Feynman's Preface 


These are the lectures in physics that I gave last year and the year before to the 
freshman and sophomore classes at Caltech. The lectures are, of course, not 
verbatim — they have been edited, sometimes extensively and sometimes less so. 
The lectures form only part of the complete course. The whole group of 180 
students gathered in a big lecture room twice a week to hear these lectures and 
then they broke up into small groups of 15 to 20 students in recitation sections 
under the guidance of a teaching assistant. In addition, there was a laboratory 
session once a week. 

The special problem we tried to get at with these lectures was to maintain the 
interest of the very enthusiastic and rather smart students coming out of the high 
schools and into Caltech. They have heard a lot about how interesting and excit- 
ing physics is — the theory of relativity, quantum mechanics, and other modern 
ideas. By the end of two years of our previous course, many would be very dis- 
couraged because there were really very few grand, new, modern ideas presented 
to them. They were made to study inclined planes, electrostatics, and so forth, 
and after two years it was quite stultifying. The problem was whether or not we 
could make a course which would save the more advanced and excited student by 
maintaining his enthusiasm. 

The lectures here are not in any way meant to be a survey course, but are very 
serious. I thought to address them to the most intelligent in the class and to make 
sure, if possible, that even the most intelligent student was unable to completely 
encompass everything that was in the lectures — by putting in suggestions of appli- 
cations of the ideas and concepts in various directions outside the main line of 
attack. For this reason, though, I tried very hard to make all the statements as 
accurate as possible, to point out in every case where the equations and ideas fitted 
into the body of physics, and how — when they learned more — things would be 
modified. I also felt that for such students it is important to indicate what it is 
that they should — if they are sufficiently clever — be able to understand by deduc- 
tion from what has been said before, and what is being put in as something new. 
When new ideas came in, I would try either to deduce them if they were deducible, 
or to explain that it was a new idea which hadn't any basis in terms of things they 
had already learned and which was not supposed to be provable — but was just 
added in. 

At the start of these lectures, I assumed that the students knew something when 
they came out of high school — such things as geometrical optics, simple chemistry 
ideas, and so on. I also didn't see that there was any reason to make the lectures 
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in a definite order, in the sense that I would not be allowed to mention something 
until I was ready to discuss it in detail. There was a great deal of mention of things 
to come, without complete discussions. These more complete discussions would 
come later when the preparation became more advanced. Examples are the dis- 
cussions of inductance, and of energy levels, which are at first brought in in a 
very qualitative way and are later developed more completely. 

At the same time that I was aiming at the more active student, I also wanted 
to take care of the fellow for whom the extra fireworks and side applications are 
merely disquieting and who cannot be expected to learn most of the material in 
the lecture at all. For such students I wanted there to be at least a central core or 
backbone of material which he could get. Even if he didn't understand everything 
in a lecture, I hoped he wouldn't get nervous. I didn't expect him to understand 
everything, but only the central and most direct features. It takes, of course, a 
certain intelligence on his part to see which are the central theorems and central 
ideas, and which are the more advanced side issues and applications which he may 
understand only in later years. 

In giving these lectures there was one serious difficulty: in the way the course 
was given, there wasn't any feedback from the students to the lecturer to indicate 
how well the lectures were going over. This is indeed a very serious difficulty, 
and I don't know how good the lectures really are. The whole thing was essentially 
an experiment. And if I did it again I wouldn't do it the same way — I hope I 
don't have to do it again! I think, though, that things worked out — so far as the 
physics is concerned — quite satisfactorily in the first year. 

In the second year I was not so satisfied. In the first part of the course, dealing 
with electricity and magnetism, I couldn't think of any really unique or different 
way of doing it — of any way that would be particularly more exciting than the 
usual way of presenting it. So I don't think I did very much in the lectures on 
electricity and magnetism. At the end of the second year I had originally intended 
to go on, after the electricity and magnetism, by giving some more lectures on the 
properties of materials, but mainly to take up things like fundamental modes, 
solutions of the diffusion equation, vibrating systems, orthogonal functions,... 
developing the first stages of what are usually called "the mathematical methods of 
physics." In retrospect, I think that if I were doing it again I would go back to 
that original idea. But since it was not planned that I would be giving these lec- 
tures again, it was suggested that it might be a good idea to try to give an introduc- 
tion to the quantum mechanics — what you will find in Volume III. 

It is perfectly clear that students who will major in physics can wait until their 
third year for quantum mechanics. On the other hand, the argument was made 
that many of the students in our course study physics as a background for their 
primary interest in other fields. And the usual way of dealing with quantum 
mechanics makes that subject almost unavailable for the great majority of students 
because they have to take so long to learn it. Yet, in its real applications — espe- 
cially in its more complex applications, such as in electrical engineering and chem- 
istry — the full machinery of the differential equation approach is not actually 
used. So I tried to describe the principles of quantum mechanics in a way which 
wouldn't require that one first know the mathematics of partial differential equa- 
tions. Even for a physicist I think that is an interesting thing to try to do — to 
present quantum mechanics in this reverse fashion — for several reasons which 
may be apparent in the lectures themselves. However, I think that the experiment 
in the quantum mechanics part was not completely successful — in large part 
because I really did not have enough time at the end (I should, for instance, have 
had three or four more lectures in order to deal more completely with such matters 
as energy bands and the spatial dependence of amplitudes). Also, I had never 
presented the subject this way before, so the lack of feedback was particularly 
serious. I now believe the quantum mechanics should be given at a later time. 
Maybe I'll have a chance to do it again someday. Then I'll do it right. 

The reason there are no lectures on how to solve problems is because there were 
recitation sections. Although I did put in three lectures in the first year on how to 
solve problems, they are not included here. Also there was a lecture on inertial 
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guidance which certainly belongs after the lecture on rotating systems, but which 
was, unfortunately, omitted. The fifth and sixth lectures are actually due to 
Matthew Sands, as I was out of town. 

The question, of course, is how well this experiment has succeeded. My own 
point of view — which, however, does not seem to be shared by most of the people 
who worked with the students — is pessimistic. I don't think I did very well by the 
students. When I look at the way the majority of the students handled the problems 
on the examinations, I think that the system is a failure. Of course, my friends 
point out to me that there were one or two dozen students who — very surprisingly 
— understood almost everything in all of the lectures, and who were quite active 
in working with the material and worrying about the many points in an excited 
and interested way. These people have now, I believe, a first-rate background in 
physics — and they are, after all, the ones I was trying to get at. But then, "The 
power of instruction is seldom of much efficacy except in those happy dispositions 
where it is almost superfluous." (Gibbon) 

Still, I didn't want to leave any student completely behind, as perhaps I did. 
I think one way we could help the students more would be by putting more hard 
work into developing a set of problems which would elucidate some of the ideas 
in the lectures. Problems give a good opportunity to fill out the material of the 
lectures and make more realistic, more complete, and more settled in the mind 
the ideas that have been exposed. 

I think, however, that there isn't any solution to this problem of education 
other than to realize that the best teaching can be done only when there is a direct 
individual relationship between a student and a good teacher — a situation in which 
the student discusses the ideas, thinks about the things, and talks about the things. 
It's impossible to learn very much by simply sitting in a lecture, or even by simply 
doing problems that are assigned. But in our modern times we have so many 
students to teach that we have to try to find some substitute for the ideal. Perhaps 
my lectures can make some contribution. Perhaps in some small place where 
there are individual teachers and students, they may get some inspiration or some 
ideas from the lectures. Perhaps they will have fun thinking them through — or 
going on to develop some of the ideas further. 

RICHARD P. FEYNMAN 

June , 1963 



Foreword 


This book is based upon a course of lectures in introductory physics given by 
Prof. R. P. Feynman at the California Institute of Technology during the academic 
year 1961-62; it covers the first year of the two-year introductory course taken by 
all Caltech freshmen and sophomores, and was followed in 1962-63 by a similar 
series covering the second year. The lectures constitute a major part of a funda- 
mental revision of the introductory course, carried out over a four-year period. 

The need for a basic revision arose both from the rapid development of physics 
in recent decades and from the fact that entering freshmen have shown a steady 
increase in mathematical ability as a result of improvements in high school mathe- 
matics course content. We hoped to take advantage of this improved mathematical 
background, and also to introduce enough modern subject matter to make the 
course challenging, interesting, and more representative of present-day physics. 

In order to generate a variety of ideas on what material to include and how to 
present it, a substantial number of the physics faculty were encouraged to offer 
their ideas in the form of topical outlines for a revised course. Several of these 
were presented and were thoroughly and critically discussed. It was agreed almost 
at once that a basic revision of the course could not be accomplished either by 
merely adopting a different textbook, or even by writing one ab initio, but that 
the new course should be centered about a set of lectures, to be presented at the 
rate of two or three per week; the appropriate text material would then be produced 
as a secondary operation as the course developed, and suitable laboratory experi- 
ments would also be arranged to fit the lecture material. Accordingly, a rough 
outline of the course was established, but this was recognized as being incomplete, 
tentative, and subject to considerable modification by whoever was to bear the 
responsibility for actually preparing the lectures. 

Concerning the mechanism by which the course would finally be brought to 
life, several plans were considered. These plans were mostly rather similar, involv- 
ing a cooperative effort by N staff members who would share the total burden 
symmetrically and equally: each man would take responsibility for 1/N of the 
material, deliver the lectures, and write text material for his part. However, the 
unavailability of sufficient staff, and the difficulty of maintaining a uniform point 
of view because of differences in personality and philosophy of individual partici- 
pants, made such plans seem unworkable. 

The realization that we actually possessed the means to create not just a new 
and different physics course, but possibly a unique one, came as a happy inspira- 
tion to Professor Sands. He suggested that Professor R. P. Feynman prepare and 
deliver the lectures, and that these be tape-recorded. When transcribed and edited, 
they would then become the textbook for the new course. This is essentially the 
plan that was adopted. 

It was expected that the necessary editing would be minor, mainly consisting of 
supplying figures, and checking punctuation and grammar; it was to be done by 
one or two graduate students on a part-time basis. Unfortunately, this expectation 
was short-lived. It was, in fact, a major editorial operation to transform the ver- 
batim transcript into readable form, even without the reorganization or revision 
of The subject matter that was sometimes required. Furthermore, it was not a 
job for a technical editor or for a graduate student, but one that required the close 
attention of a professional physicist for from ten to twenty hours per lecture! 
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The difficulty of the editorial task, together with the need to place the material 
in the hands of the students as soon as possible, set a strict limit upon the amount 
of "polishing" of the material that could be accomplished, and thus we were 
forced to aim toward a preliminary but technically correct product that could be 
used immediately, rather than one that might be considered final or finished. 
Because of an urgent need for more copies for our students, and a heartening inter- 
est on the part of instructors and students at several other institutions, we decided 
to publish the material in its preliminary form rather than wait for a further major 
revision which might never occur. We have no illusions as to the completeness, 
smoothness, or logical organization of the material; in fact, we plan several minor 
modifications in the course in the immediate future, and we hope that it will not 
become static in form or content. 

In addition to the lectures, which constitute a centrally important part of the 
course, it was necessary also to provide suitable exercises to develop the students' 
experience and ability, and suitable experiments to provide first-hand contact 
with the lecture material in the laboratory. Neither of these aspects is in as ad- 
vanced a state as the lecture material, but considerable progress has been made. 
Some exercises were made up as the lectures progressed, and these were expanded 
and amplified for use in the following year. However, because we are not yet 
satisfied that the exercises provide sufficient variety and depth of application of 
the lecture material to make the student fully aware of the tremendous power 
being placed at his disposal, the exercises are published separately in a less perma- 
nent form in order to encourage frequent revision. 

A number of new experiments for the new course have been devised by Professor 
H. V. Neher. Among these are several which utilize the extremely low friction 
exhibited by a gas bearing: a novel linear air trough, with which quantitative 
measurements of one-dimensional motion, impacts, and harmonic motion can be 
made, and an air- supported, air-driven Maxwell top, with which accelerated rota- 
tional motion and gyroscopic precession and nutation can be studied. The develop- 
ment of new laboratory experiments is expected to continue for a considerable 
period of time. 

The revision program was under the direction of Professors R. B. Leighton, 
H. V. Neher, and M. Sands. Officially participating in the program were Professors 
R. P. Feynman, G. Neugebauer, R. M. Sutton, H. P. Stabler,* F. Strong, and 
R. Vogt, from the division of Physics, Mathematics and Astronomy, and Professors 
T. Caughey, M. Plesset, and C. H. Wilts from the division of Engineering Science. 
The valuable assistance of all those contributing to the revision program is grate- 
fully acknowledged. We are particularly indebted to the Ford Foundation, without 
whose financial assistance this program could not have been carried out. 


July, 1963 


ROBERT B. LEIGHTON 


* 1961-62, while on leave from Williams College, Williamstown, Mass. 
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Atoms in Motion 


1-1 Introduction 

This two-year course in physics is presented from the point of view that you, 
the reader, are going to be a physicist. This is not necessarily the case of course, 
but that is what every professor in every subject assumes! If you are going to 
be a physicist, you will have a lot to study: two hundred years of the most rapidly 
developing field of knowledge that there is. So much knowledge, in fact, that 
you might think that you cannot learn all of it in four years, and truly you cannot; 
you will have to go to graduate school too! 

Surprisingly enough, in spite of the tremendous amount of work that has been 
done for all this time it is possible to condense the enormous mass of results to 
a large extent — that is, to find laws which summarize all our knowledge. Even 
so, the laws are so hard to grasp that it is unfair to you to start exploring this 
tremendous subject without some kind of map or outline of the relationship of 
one part of the subject of science to another. Following these preliminary remarks, 
the first three chapters will therefore outline the relation of physics to the rest 
of the sciences, the relations of the sciences to each other, and the meaning of 
science, to help us develop a "feel" for the subject. 

You might ask why we cannot teach physics by just giving the basic laws on 
page one and then showing how they work in all possible circumstances, as we do 
in Euclidean geometry, where we state the axioms and then make all sorts of de- 
ductions. (So, not satisfied to learn physics in four years, you want to learn it in 
four minutes?) We cannot do it in this way for two reasons. First, we do not yet 
know all the basic laws: there is an expanding frontier of ignorance. Second, the 
correct statement of the laws of physics involves some very unfamiliar ideas 
which require advanced mathematics for their description. Therefore, one needs 
a considerable amount of preparatory training even to learn what the words 
mean. No, it is not possible to do it that way. We can only do it piece by piece. 

Each piece, or part, of the whole of nature is always merely an approximation 
to the complete truth, or the complete truth so far as we know it. In fact, every- 
thing we know is only some kind of approximation, because we know that we do 
not know all the laws as yet. Therefore, things must be learned only to be unlearned 
again or, more likely, to be corrected. 

The principle of science, the definition, almost, is the following: The test of 
all knowledge is experiment. Experiment is the sole judge of scientific "truth." 
But what is the source of knowledge? Where do the laws that are to be tested 
come from? Experiment, itself, helps to produce these laws, in the sense that it 
gives us hints. But also needed is imagination to create from these hints the great 
generalizations — to guess at the wonderful, simple, but very strange patterns be- 
neath them all, and then to experiment to check again whether we have made the 
right guess. This imagining process is so difficult that there is a division of labor 
in physics: there are theoretical physicists who imagine, deduce, and guess at new 
laws, but do not experiment; and then there are experimental physicists who ex- 
periment, imagine, deduce, and guess. 

We said that the laws of nature are approximate: that we first find the "wrong" 
ones, and then we find the "right" ones. Now, how can an experiment be "wrong" ? 
First, in a trivial way: if something is wrong with the apparatus that you did not 
notice. But these things are easily fixed, and checked back and forth. So without 
snatching at such minor things, how can the results of an experiment be wrong? 
Only by being inaccurate. For example, the mass of an object never seems to 
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change; a spinning top has the same weight as a still one. So a "law" was in- 
vented: mass is constant, independent of speed. That "law" is now found to be 
incorrect. Mass is found to increase with velocity, but appreciable increases require 
velocities near that of light. A true law is: if an object moves with a speed of 
less than one hundred miles a second the mass is constant to within one part in 
a million. In some such approximate form this is a correct law. So in practice 
one might think that the new law makes no significant difference. Well, yes and 
no. For ordinary speeds we can certainly forget it and use the simple constant- 
mass law as a good approximation. But for high speeds we are wrong, and the 
higher the speed, the more wrong we are. 

Finally, and most interesting, philosophically we are completely wrong with 
the approximate law. Our entire picture of the world has to be altered even though 
the mass changes only by a little bit. This is a very peculiar thing about the 
philosophy, or the ideas, behind the laws. Even a very small effect sometimes 
requires profound changes in our ideas. 

Now, what should we teach first? Should we teach the correct but unfamiliar 
law with its strange and difficult conceptual ideas, for example the theory of 
relativity, four-dimensional space-time, and so on? Or should we first teach the 
simple "constant-mass" law, which is only approximate, but does not involve such 
difficult ideas? The first is more exciting, more wonderful, and more fun, but the 
second is easier to get at first, and is a first step to a real understanding of the 
second idea. This point arises again and again in teaching physics. At different 
times we shall have to resolve it in different ways, but at each stage it is worth 
learning what is now known, how accurate it is, how it fits into everything else, 
and how it may be changed when we learn more. 

Let us now proceed with our outline, or general map, of our understanding 
of science today (in particular, physics, but also of other sciences on the periphery), 
so that when we later concentrate on some particular point we will have some idea 
of the background, why that particular point is interesting, and how it fits into 
the big structure. So, what is our over-all picture of the world? 



WATER MAGNIFIED ONE BILLION TIMES 


1-2 Matter is made of atoms 

If, in some cataclysm, all of scientific knowledge were to be destroyed, and only 
one sentence passed on to the next generations of creatures, what statement would 
contain the most information in the fewest words? I believe it is the atomic 
hypothesis (or the atomic fact, or whatever you wish to call it) that all things are 
made of atoms — little particles that move around in perpetual motion , attracting 
each other when they are a little distance apart, but repelling upon being squeezed 
into one another. In that one sentence, you will see, there is an enormous amount 
of information about the world, if just a little imagination and thinking are applied. 

To illustrate the power of the atomic idea, suppose that we have a drop of 
water a quarter of an inch on the side. If we look at it very closely we see nothing 
but water — smooth, continuous water. Even if we magnify it with the best optical 
microscope available — roughly two thousand times — then the water drop will be 
roughly forty feet across, about as big as a large room, and if we looked rather 
closely, we would still see relatively smooth water — but here and there small 
football-shaped things swimming back and forth. Very interesting. These are 
paramecia. You may stop at this point and get so curious about the paramecia 
with their wiggling cilia and twisting bodies that you go no further, except per- 
haps to magnify the paramecia still more and see inside. This, of course, is a subject 
for biology, but for the present we pass on and look still more closely at the water 
material itself, magnifying it two thousand times again. Now the drop of water 
extends about fifteen miles across, and if we look very closely at it we see a kind 
of teeming, something which no longer has a smooth appearance — it looks some- 
thing like a crowd at a football game as seen from a very great distance. In order 
to see what this teeming is about, we will magnify it another two hundred and 
fifty times and we will see something similar to what is shown in Fig. 1-1. This 
is a picture of water magnified a billion times, but idealized in several ways. 
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Figure 1-1 




In the first place, the particles are drawn in a simple manner with sharp edges, 
which is inaccurate. Secondly, for simplicity, they are sketched almost schemati- 
cally in a two-dimensional arrangement, but of course they are moving around in 
three dimensions. Notice that there are two kinds of "blobs" or circles to represent 
the atoms of oxygen (black) and hydrogen (white), and that each oxygen has two 
hydrogens tied to it. (Each little group of an oxygen with its two hydrogens is 
called a molecule.) The picture is idealized further in that the real particles in 
nature are continually jiggling and bouncing, turning and twisting around one 
another. You will have to imagine this as a dynamic rather than a static picture. 
Another thing that cannot be illustrated in a drawing is the fact that the particles 
are "stuck together" — that they attract each other, this one pulled by that one, 
etc. The whole group is "glued together," so to speak. On the other hand, the 
particles do not squeeze through each other. If you try to squeeze two of them too 
close together, they repel. 

The atoms are 1 or 2 X 10" 8 cm in radius. Now 10" 8 cm is called an 
angstrom (just as another name), so we say they are 1 or 2 angstroms (A) in radius. 
Another way to remember their size is this: if an apple is magnified to the size 
of the earth, then the atoms in the apple are approximately the size of the original 
apple. 

Now imagine this great drop of water with all of these jiggling particles stuck 
together and tagging along with each other. The water keeps its volume; it does 
not fall apart, because of the attraction of the molecules for each other. If the 
drop is on a slope, where it can move from one place to another, the water will 
flow, but it does not just disappear — things do not just fly apart — because of 
the molecular attraction. Now the jiggling motion is what we represent as heat: 
when we increase the temperature, we increase the motion. If we heat the water, 
the jiggling increases and the volume between the atoms increases, and if the 
heating continues there comes a time when the pull between the molecules is not 
enough to hold them together and they do fly apart and become separated from 
one another. Of course, this is how we manufacture steam out of water — by 
increasing the temperature; the particles fly apart because of the increased motion. 

In Fig. 1-2 we have a picture of steam. This picture of steam fails in one 
respect: at ordinary atmospheric pressure there might be only a few molecules in 
a whole room, and there certainly would not be as many as three in this figure. 
Most squares this size would contain none — but we accidentally have two and a 
half or three in the picture (just so it would not be completely blank). Now in 
the case of steam we see the characteristic molecules more clearly than in the case 
of water. For simplicity, the molecules are drawn so that there is a 120° angle 
between them. In actual fact the angle is 105°3', and the distance between the 
center of a hydrogen and the center of the oxygen is 0.957 A, so we know this 
molecule very well. 

Let us see what some of the properties of steam vapor or any other gas are. 
The molecules, being separated from one another, will bounce against the walls. 
Imagine a room with a number of tennis balls (a hundred or so) bouncing around 
in perpetual motion. When they bombard the wall, this pushes the wall away. 
(Of course we would have to push the wall back.) This means that the gas exerts 
a jittery force which our coarse senses (not being ourselves magnified a billion 
times) feels only as an average push . In order to confine a gas we must apply a 
pressure. Figure 1-3 shows a standard vessel for holding gases (used in all 
textbooks), a cylinder with a piston in it. Now, it makes no difference what the 
shapes of water molecules are, so for simplicity we shall draw them as tennis 
balls or little dots. These things are in perpetual motion in all directions. So many 
of them are hitting the top piston all the time that to keep it from being patiently 
knocked out of the tank by this continuous banging, we shall have to hold the 
piston down by a certain force, which we call the pressure (really, the pressure 
times the area is the force). Clearly, the force is proportional to the area, for if 
we increase the area but keep the number of molecules per cubic centimeter the 
same, we increase the number of collisions with the piston in the same proportion 
as the area was increased. 
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Now let us put twice as many molecules in this tank, so as to double the den- 
sity, and let them have the same speed, i.e., the same temperature. Then, to a 
close approximation, the number of collisions will be doubled, and since each will 
be just as "energetic" as before, the pressure is proportional to the density. If we 
consider the true nature of the forces between the atoms, we would expect a slight 
decrease in pressure because of the attraction between the atoms, and a slight 
increase because of the finite volume they occupy. Nevertheless, to an excellent 
approximation, if the density is low enough that there are not many atoms, the 
pressure is proportional to the density. 

We can also see something else: If we increase the temperature without 
changing the density of the gas, i.e., if we increase the speed of the atoms, what 
is going to happen to the pressure? Well, the atoms hit harder because they are 
moving faster, and in addition they hit more often, so the pressure increases. 
You see how simple the ideas of atomic theory are. 

Let us consider another situation. Suppose that the piston moves inward, 
so that the atoms are slowly compressed into a smaller space. What happens when 
an atom hits the moving piston? Evidently it picks up speed from the collision. 
You can try it by bouncing a ping-pong ball from a forward-moving paddle, for 
example, and you will find that it comes off with more speed than that with which 
it struck. (Special example: if an atom happens to be standing still and the piston 
hits it, it will certainly move.) So the atoms are "hotter" when they come away 
from the piston than they were before they struck it. Therefore all the atoms which 
are in the vessel will have picked up speed. This means that when we compress 
a gas slowly , the temperature of the gas increases. So, under slow compression, 
a gas will increase in temperature, and under slow expansion it will decrease in 
temperature. 

We now return to our drop of water and look in another direction. Suppose 
that we decrease the temperature of our drop of water. Suppose that the jiggling 
of the molecules of the atoms in the water is steadily decreasing. We know that 
there are forces of attraction between the atoms, so that after a while they will 
not be able to jiggle so well. What will happen at very low temperatures is in- 
dicated in Fig. 1-4: the molecules lock into a new pattern which is ice. This 
particular schematic diagram of ice is wrong because it is in two dimensions, but 
it is right qualitatively. The interesting point is that the material has a definite 
place for every atom, and you can easily appreciate that if somehow or other we 
were to hold all the atoms at one end of the drop in a certain arrangement, each 
atom in a certain place, then because of the structure of interconnections, which is 
rigid, the other end miles away (at our magnified scale) will have a definite location. 
So if we hold a needle of ice at one end, the other end resists our pushing it aside, 
unlike the case of water, in which the structure is broken down because of the 
increased jiggling so that the atoms all move around in different ways. The differ- 
ence between solids and liquids is, then, that in a solid the atoms are arranged in 
some kind of an array, called a crystalline array, and they do not have a random 
position at long distances; the position of the atoms on one side of the crystal 
is determined by that of other atoms millions of atoms away on the other side of 
the crystal. Figure 14 is an invented arrangement for ice, and although it con- 
tains many of the correct features of ice, it is not the true arrangement. One of the 
correct features is that there is a part of the symmetry that is hexagonal. You can 
see that if we turn the picture around an axis by 120°, the picture returns to itself. 
So there is a symmetry in the ice which accounts for the six-sided appearance of 
snowflakes. Another thing we can see from Fig. 14 is why ice shrinks when it 
melts. The particular crystal pattern of ice shown here has many "holes" in it, 
as does the true ice structure. When the organization breaks down, these holes 
can be occupied by molecules. Most simple substances, with the exception of 
water and type metal, expand upon melting, because the atoms are closely packed 
in the solid crystal and upon melting need more room to jiggle around, but an 
open structure collapses, as in the case of water. 

Now although ice has a "rigid" crystalline form, its temperature can change — 
ice has heat. If we wish, we can change the amount of heat. What is the heat in 
14 



the case of ice? The atoms are not standing still. They are jiggling and vibrating. 
So even though there is a definite order to the crystal — a definite structure — all 
of the atoms are vibrating "in place." As we increase the temperature, they vibrate 
with greater and greater amplitude, until they shake themselves out of place. 
We call this melting. As we decrease the temperature, the vibration decreases 
and decreases until, at absolute zero, there is a minimum amount of vibration 
that the atoms can have, but not zero. This minimum amount of motion that atoms 
can have is not enough to melt a substance, with one exception: helium. Helium 
merely decreases the atomic motions as much as it can, but even at absolute zero 
there is still enough motion to keep it from freezing. Helium, even at absolute 
zero, does not freeze, unless the pressure is made so great as to make the atoms 
squash together. If we increase the pressure, we can make it solidify. 

1-3 Atomic processes 

So much for the description of solids, liquids, and gases from the atomic 
point of view. However, the atomic hypothesis also describes processes , and so we 
shall now look at a number of processes from an atomic standpoint. The first 
process that we shall look at is associated with the surface of the water. What 
happens at the surface of the water? We shall now make the picture more com- 
plicated — and more realistic — by imagining that the surface is in air. Figure 1-5 
shows the surface of water in air. We see the water molecules as before, forming 
a body of liquid water, but now we also see the surface of the water. Above the 
surface we find a number of things: First of all there are water molecules, as in steam. 
This is water vapor , which is always found above liquid water. (There is an 
equilibrium between the steam vapor and the water which will be described later.) 
In addition we find some other molecules — here two oxygen atoms stuck together 
by themselves, forming an oxygen molecule , there two nitrogen atoms also stuck 
together to make a nitrogen molecule. Air consists almost entirely of nitrogen, 
oxygen, some water vapor, and lesser amounts of carbon dioxide, argon, and 
other things. So above the water surface is the air, a gas, containing some water 
vapor. Now what is happening in this picture? The molecules in the water are 
always jiggling around. From time to time, one on the surface happens to be hit 
a little harder than usual, and gets knocked away. It is hard to see that happening 
in the picture because it is a still picture. But we can imagine that one molecule 
near the surface has just been hit and is flying out, or perhaps another one has 
been hit and is flying out. Thus, molecule by molecule, the water disappears — 
it evaporates. But if we close the vessel above, after a while we shall find a large 
number of molecules of water amongst the air molecules. From time to time, one 
of these vapor molecules comes flying down to the water and gets stuck again. 
So we see that what looks like a dead, uninteresting thing — a glass of water with 
a cover, that has been sitting there for perhaps twenty years — really contains a 
dynamic and interesting phenomenon which is going on all the time. To our eyes, 
our crude eyes, nothing is changing, but if we could see it a billion times magni- 
fied, we would see that from its own point of view it is always changing: molecules 
are leaving the surface, molecules are coming back. 

Why do we see no change? Because just as many molecules are leaving as 
are coming back! In the long run "nothing happens." If we then take the top of 
the vessel off and blow the moist air away, replacing it with dry air, then the 
number of molecules leaving is just the same as it was before, because this depends 
on the jiggling of the water, but the number coming back is greatly reduced be- 
cause there are so many fewer water molecules above the water. Therefore there 
are more going out than coming in, and the water evaporates. Hence, if you wish 
to evaporate water turn on the fan! 

Here is something else: Which molecules leave? When a molecule leaves it 
is due to an accidental, extra accumulation of a little bit more than ordinary 
energy, which it needs if it is to break away from the attractions of its neighbors. 
Therefore, since those that leave have more energy than the average, the ones that 
are left have less average motion than they had before. So the liquid gradually 
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cools if it evaporates. Of course, when a molecule of vapor comes from the air to 
the water below there is a sudden great attraction as the molecule approaches 
the surface. This speeds up the incoming molecule and results in generation of 
heat. So when they leave they take away heat; when they come back they generate 
heat. Of course when there is no net evaporation the result is nothing — the water 
is not changing temperature. If we blow on the water so as to maintain a continuous 
preponderance in the number evaporating, then the water is cooled. Hence, 
blow on soup to cool it! 

Of course you should realize that the processes just described are more com- 
plicated than we have indicated. Not only does the water go into the air, but also, 
from time to time, one of the oxygen or nitrogen molecules will come in and "get 
lost" in the mass of water molecules, and work its way into the water. Thus the 
air dissolves in the water; oxygen and nitrogen molecules will work their way into 
the water and the water will contain air. If we suddenly take the air away from the 
vessel, then the air molecules will leave more rapidly than they come in, and in 
doing so will make bubbles. This is very bad for divers, as you may know. 

Now we go on to another process. In Fig. 1-6 we see, from an atomic point 
of view, a solid dissolving in water. If we put a crystal of salt in the water, what 
will happen? Salt is a solid, a crystal, an organized arrangement of "salt atoms." 
Figure 1-7 is an illustration of the three-dimensional structure of common salt, 
sodium chloride. Strictly speaking, the crystal is not made of atoms, but of what 
we call ions. An ion is an atom which either has a few extra electrons or has lost 
a few electrons. In a salt crystal we find chlorine ions (chlorine atoms with an 
extra electron) and sodium ions (sodium atoms with one electron missing). The 
ions all stick together by electrical attraction in the solid salt, but when we put 
them in the water we find, because of the attractions of the negative oxygen and 
positive hydrogen for the ions, that some of the ions jiggle loose. In Fig. 1-6 
we see a chlorine ion getting loose, and other atoms floating in the water in the form 
of ions. This picture was made with some care. Notice, for example, that the 
hydrogen ends of the water molecules are more likely to be near the chlorine ion, 
while near the sodium ion we are more likely to find the oxygen end, because the 
sodium is positive and the oxygen end of the water is negative, and they attract 
electrically. Can we tell from this picture whether the salt is dissolving in water or 
crystallizing out of water? Of course we cannot tell, because while some of the 
atoms are leaving the crystal other atoms are rejoining it. The process is a dynamic 
one, just as in the case of evaporation, and it depends on whether there is more or 
less salt in the water than the amount needed for equilibrium. By equilibrium we 
mean that situation in which the rate at which atoms are leaving just matches the 
rate at which they are coming back. If there is almost no salt in the water, more 
atoms leave than return, and the salt dissolves. If, on the other hand, there are 
too many "salt atoms," more return than leave, and the salt is crystallizing. 

In passing, we mention that the concept of a molecule of a substance is only 
approximate and exists only for a certain class of substances. It is clear in the 
case of water that the three atoms are actually stuck together. It is not so clear 
in the case of sodium chloride in the solid. There is just an arrangement of sodium 
and chlorine ions in a cubic pattern. There is no natural way to group them as 
"molecules of salt." 

Returning to our discussion of solution and precipitation, if we increase the 
temperature of the salt solution, then the rate at which atoms are taken away is 
increased, and so is the rate at which atoms are brought back. It turns out to be 
very difficult, in general, to predict which way it is going to go, whether more or 
less of the solid will dissolve. Most substances dissolve more, but some substances 
dissolve less, as the temperature increases. 


14 Chemical reactions 

In all of the processes which have been described so far, the atoms and the 
ions have not changed partners, but of course there are circumstances in which 
the atoms do change combinations, forming new molecules. This is illustrated in 
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Fig. 1-8. A process in which the rearrangement of the atomic partners occurs is 
what we call a chemical reaction. The other processes so far described are called 
physical processes, but there is no sharp distinction between the two. (Nature 
does not care what we call it, she just keeps on doing it.) This figure is supposed 
to represent carbon burning in oxygen. In the case of oxygen, two oxygen atoms 
stick together very strongly. (Why do not three or even four stick together? That 
is one of the very peculiar characteristics of such atomic processes. Atoms are 
very special: they like certain particular partners, certain particular directions, and 
so on. It is the job of physics to analyze why each one wants what it wants. At 
any rate, two oxygen atoms form, saturated and happy, a molecule.) 

The carbon atoms are supposed to be in a solid crystal (which could be graphite 
or diamond*). Now, for example, one of the oxygen molecules can come over to 
the carbon, and each atom can pick up a carbon atom and go flying off in a new 
combination — "carbon-oxygen" — which is a molecule of the gas called carbon 
monoxide. It is given the chemical name CO. It is very simple: the letters "CO" 
are practically a picture of that molecule. But carbon attracts oxygen much more 
than oxygen attracts oxygen or carbon attracts carbon. Therefore in this process 
the oxygen may arrive with only a little energy, but the oxygen and carbon will 
snap together with a tremendous vengeance and commotion, and everything near 
them will pick up the energy. A large amount of motion energy, kinetic energy, 
is thus generated. This of course is burning; we are getting heat from the com- 
bination of oxygen and carbon. The heat is ordinarily in the form of the molecular 
motion of the hot gas, but in certain circumstances it can be so enormous that it 
generates light. That is how one gets flames. 

In addition, the carbon monoxide is not quite satisfied. It is possible for it 
to attach another oxygen, so that we might have a much more complicated reac- 
tion in which the oxygen is combining with the carbon, while at the same time there 
happens to be a collision with a carbon monoxide molecule. One oxygen atom 
could attach itself to the CO and ultimately form a molecule, composed of one 
carbon and two oxygens, which is designated C0 2 and called carbon dioxide. 
If we burn the carbon with very little oxygen in a very rapid reaction (for example, 
in an automobile engine, where the explosion is so fast that there is not time for 
it to make carbon dioxide) a considerable amount of carbon monoxide is formed. 
In many such rearrangements, a very large amount of energy is released, forming 
explosions, flames, etc., depending on the reactions. Chemists have studied these 
arrangements of the atoms, and found that every substance is some type of arrange- 
ment of atoms. 

To illustrate this idea, let us consider another example. If we go into a field 
of small violets, we know what "that smell" is. It is some kind of molecule , or 
arrangement of atoms, that has worked its way into our noses. First of all, how 
did it work its way in? That is rather easy. If the smell is some kind of molecule 
in the air, jiggling around and being knocked every which way, it might have 
accidentally worked its way into the nose. Certainly it has no particular desire to 
get into our nose. It is merely one helpless part of a jostling crowd of molecules, 
and in its aimless wanderings this particular chunk of matter happens to find 
itself in the nose. 

Now chemists can take special molecules like the odor of violets, and analyze 
them and tell us the exact arrangement of the atoms in space. We know that the 
carbon dioxide molecule is straight and symmetrical: O — C — O. (That can be deter- 
mined easily, too, by physical methods.) However, even for the vastly more com- 
plicated arrangements of atoms that there are in chemistry, one can, by a long, 
remarkable process of detective work, find the arrangements of the atoms. Figure 
1-9 is a picture of the air in the neighborhood of a violet; again we find nitrogen 
and oxygen in the air, and water vapor. (Why is there water vapor? Because the 
violet is wet. All plants transpire.) However, we also see a "monster" composed 
of carbon atoms, hydrogen atoms, and oxygen atoms, which have picked a certain 
particular pattern in which to be arranged. It is a much more complicated arrange- 

*One can burn a diamond in air. 
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ment than that of carbon dioxide; in fact, it is an enormously complicated arrange- 
ment. Unfortunately, we cannot picture all that is really known about it chemically, 
because the precise arrangement of all the atoms is actually known in three 
dimensions, while our picture is in only two dimensions. The six carbons which 
form a ring do not form a flat ring, but a kind of "puckered" ring. All of the 
angles and distances are known. So a chemical formula is merely a picture of such 
a molecule. When the chemist writes such a thing on the blackboard, he is trying 
to "draw," roughly speaking, in two dimensions. For example, we see a "ring" 
of six carbons, and a "chain" of carbons hanging on the end, with an oxygen 
second from the end, three hydrogens tied to that carbon, two carbons and three 
hydrogens sticking up here, etc. 

How does the chemist find what the arrangement is? He mixes bottles full 
of stuff together, and if it turns red, it tells him that it consists of one hydrogen and 
two carbons tied on here; if it turns blue, on the other hand, that is not the way 
it is at all. This is one of the most fantastic pieces of detective work that has ever 
been done — organic chemistry. To discover the arrangement of the atoms in these 
enormously complicated arrays the chemist looks at what happens when he mixes 
two different substances together. The physicist could never quite believe that the 
chemist knew what he was talking about when he described the arrangement of 
the atoms. For about twenty years it has been possible, in some cases, to look at 
such molecules (not quite as complicated as this one, but some which contain 
parts of it) by a physical method, and it has been possible to locate every atom, 
not by looking at colors, but by measuring where they are. And lo and behold!, 
the chemists are almost always correct. 

It turns out, in fact, that in the odor of violets there are three slightly different 
molecules, which differ only in the arrangement of the hydrogen atoms. 

One problem of chemistry is to name a substance, so that we will know what 
it is. Find a name for this shape! Not only must the name tell the shape, but it 
must also tell that here is an oxygen atom, there a hydrogen — exactly what and 
where each atom is. So we can appreciate that the chemical names must be com- 
plex in order to be complete. You see that the name of this thing in the more com- 
plete form that will tell you the structure of it is 4-(2, 2, 3, 6 tetramethyl-5- 
cyclohexanyl)-3-buten-2-one, and that tells you that this is the arrangement. We 
can appreciate the difficulties that the chemists have, and also appreciate the reason 
for such long names. It is not that they wish to be obscure, but they have an 
extremely difficult problem in trying to describe the molecules in words! 

How do we know that there are atoms? By one of the tricks mentioned earlier: 
we make the hypothesis that there are atoms, and one after the other results come 
out the way we predict, as they ought to if things are made of atoms. There is 
also somewhat more direct evidence, a good example of which is the following: 
The atoms are so small that you cannot see them with a light microscope — in 
fact, not even with an electron microscope. (With a light microscope you can only 
see things which are much bigger.) Now if the atoms are always in motion, say in 
water, and we put a big ball of something in the water, a ball much bigger than the 
atoms, the ball will jiggle around — much as in a push ball game, where a great 
big ball is pushed around by a lot of people. The people are pushing in various 
directions, and the ball moves around the field in an irregular fashion. So, in the 
same way, the "large ball" will move because of the inequalities of the collisions 
on one side to the other, from one moment to the next. Therefore, if we look at 
very tiny particles (colloids) in water through an excellent microscope, we see 
a perpetual jiggling of the particles, which is the result of the bombardment of the 
atoms. This is called the Brownian motion. 

We can see further evidence for atoms in the structure of crystals. In many 
cases the structures deduced by x-ray analysis agree in their spatial "shapes" with 
the forms actually exhibited by crystals as they occur in nature. The angles be- 
tween the various "faces" of a crystal agree, within seconds of arc, with angles 
deduced on the assumption that a crystal is made of many "layers" of atoms. 

Everything is made of atoms. That is the key hypothesis. The most important 
hypothesis in all of biology, for example, is that everything that animals do, atoms 
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do. In other words, there is nothing that living things do that cannot be understood 
from the point of view that they are made of atoms acting according to the laws 
of physics. This was not known from the beginning: it took some experimenting 
and theorizing to suggest this hypothesis, but now it is accepted, and it is the most 
useful theory for producing new ideas in the field of biology. 

If a piece of steel or a piece of salt, consisting of atoms one next to the other, 
can have such interesting properties; if water — which is nothing but these little 
blobs, mile upon mile of the same thing over the earth — can form waves and foam, 
and make rushing noises and strange patterns as it runs over cement; if all of 
this, all the life of a stream of water, can be nothing but a pile of atoms, how much 
more is possible? If instead of arranging the atoms in some definite pattern, 
again and again repeated, on and on, or even forming little lumps of complexity 
like the odor of violets, we make an arrangement which is always different from 
place to place, with different kinds of atoms arranged in many ways, continually 
changing, not repeating, how much more marvelously is it possible that this thing 
might behave? Is it possible that that "thing" walking back and forth in front of 
you, talking to you, is a great glob of these atoms in a very complex arrangement, 
such that the sheer complexity of it staggers the imagination as to what it can do? 
When we say we are a pile of atoms, we do not mean we are merely a pile of atoms, 
because a pile of atoms which is not repeated from one to the other might well 
have the possibilities which you see before you in the mirror. 


1-9 



2 


Basic Physics 


2-1 Introduction 

In this chapter, we shall examine the most fundamental ideas that we have 
about physics — the nature of things as we see them at the present time. We shall 
not discuss the history of how we know that all these ideas are true; you will 
learn these details in due time. 

The things with which we concern ourselves in science appear in myriad forms, 
and with a multitude of attributes. For example, if we stand on the shore and look 
at the sea, we see the water, the waves breaking, the foam, the sloshing motion 
of the water, the sound, the air, the winds and the clouds, the sun and the blue 
sky, and light; there is sand and there are rocks of various hardness and perma- 
nence, color and texture. There are animals and seaweed, hunger and disease, and 
the observer on the beach; there may be even happiness and thought. Any other 
spot in nature has a similar variety of things and influences. It is always as com- 
plicated as that, no matter where it is. Curiosity demands that we ask questions, 
that we try to put things together and try to understand this multitude of aspects 
as perhaps resulting from the action of a relatively small number of elemental 
things and forces acting in an infinite variety of combinations. 

For example: Is the sand other than the rocks? That is, is the sand perhaps 
nothing but a great number of very tiny stones? Is the moon a great rock? If we 
understood rocks, would we also understand the sand and the moon? Is the wind 
a sloshing of the air analogous to the sloshing motion of the water in the sea? 
What common features do different movements have? What is common to dif- 
ferent kinds of sound? How many different colors are there? And so on. In this 
way we try gradually to analyze all things, to put together things which at first 
sight look different, with the hope that we may be able to reduce the number of 
different things and thereby understand them better. 

A few hundred years ago, a method was devised to find partial answers to 
such questions. Observation, reason, and experiment make up what we call the 
scientific method. We shall have to limit ourselves to a bare description of our 
basic view of what is sometimes called fundamental physics, or fundamental ideas 
which have arisen from the application of the scientific method. 

What do we mean by "understanding" something? We can imagine that this 
complicated array of moving things which constitutes "the world" is something 
like a great chess game being played by the gods, and we are observers of the game. 
We do not know what the rules of the game are; all we are allowed to do is to 
watch the playing. Of course, if we watch long enough, we may eventually catch 
on to a few of the rules. The rules of the game are what we mean by fundamental 
physics. Even if we knew every rule, however, we might not be able to under- 
stand why a particular move is made in the game, merely because it is too com- 
plicated and our minds are limited. If you play chess you must know that it is 
easy to learn all the rules, and yet it is often very hard to select the best move or 
to understand why a player moves as he does. So it is in nature, only much more 
so; but we may be able at least to find all the rules. Actually, we do not have all 
the rules now. (Every once in a while something like castling is going on that we 
still do not understand.) Aside from not knowing all of the rules, what we really 
can explain in terms of those rules is very limited, because almost all situations are 
so enormously complicated that we cannot follow the plays of the game using the 
rules, much less tell what is going to happen next. We must, therefore, limit our- 
selves to the more basic question of the rules of the game. If we know the rules, 
we consider that we "understand" the world. 
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How can we tell whether the rules which we "guess" at are really right if we 
cannot analyze the game very well? There are, roughly speaking, three ways. First, 
there may be situations where nature has arranged, or we arrange nature, to be 
simple and to have so few parts that we can predict exactly what will happen, 
and thus we can check how our rules work. (In one corner of the board there may 
be only a few chess pieces at work, and that we can figure out exactly.) 

A second good way to check rules is in terms of less specific rules derived 
from them. For example, the rule on the move of a bishop on a chessboard is 
that it moves only on the diagonal. One can deduce, no matter how many moves 
may be made, that a certain bishop will always be on a red square. So, without 
being able to follow the details, we can always check our idea about the bishop's 
motion by finding out whether it is always on a red square. Of course it will be, 
for a long time, until all of a sudden we find that it is on a black square (what 
happened of course, is that in the meantime it was captured, another pawn crossed 
for queening, and it turned into a bishop on a black square). That is the way it is 
in physics. For a long time we will have a rule that works excellently in an over-all 
way, even when we cannot follow the details, and then some time we may discover 
a new rule. From the point of view of basic physics, the most interesting phenomena 
are of course in the new places, the places where the rules do not work — not the 
places where they do work! That is the way in which we discover new rules. 

The third way to tell whether our ideas are right is relatively crude but prob- 
ably the most powerful of them all. That is, by rough approximation. While we 
may not be able to tell why Alekhine moves this particular piece, perhaps we can 
roughly understand that he is gathering his pieces around the king to protect it, 
more or less, since that is the sensible thing to do in the circumstances. In the 
same way, we can often understand nature, more or less, without being able to see 
what every little piece is doing, in terms of our understanding of the game. 

At first the phenomena of nature were roughly divided into classes, like heat, 
electricity, mechanics, magnetism, properties of substances, chemical phenomena, 
light or optics, x-rays, nuclear physics, gravitation, meson phenomena, etc. How- 
ever, the aim is to see complete nature as different aspects of one set of phenomena. 
That is the problem in basic theoretical physics, today — to find the laws behind 
experiment; to amalgamate these classes. Historically, we have always been able 
to amalgamate them, but as time goes on new things are found. We were amalga- 
mating very well, when all of a sudden x-rays were found. Then we amalgamated 
some more, and mesons were found. Therefore, at any stage of the game, it always 
looks rather messy. A great deal is amalgamated, but there are always many wires 
or threads hanging out in all directions. That is the situation today, which we shall 
try to describe. 

Some historic examples of amalgamation are the following. First, take heat 
and mechanics. When atoms are in motion, the more motion, the more heat the 
system contains, and so heat and all temperature effects can be represented by the 
laws of mechanics. Another tremendous amalgamation was the discovery of the 
relation between electricity, magnetism, and light, which were found to be dif- 
ferent aspects of the same thing, which we call today the electromagnetic field. 
Another amalgamation is the unification of chemical phenomena, the various 
properties of various substances, and the behavior of atomic particles, which is in 
the quantum mechanics of chemistry. 

The question is, of course, is it going to be possible to amalgamate everything, 
and merely discover that this world represents different aspects of one thing? 
Nobody knows. All we know is that as we go along, we find that we can amalga- 
mate pieces, and then we find some pieces that do not fit, and we keep trying to 
put the jigsaw puzzle together. Whether there are a finite number of pieces, and 
whether there is even a border to the puzzle, is of course unknown. It will never 
be known until we finish the picture, if ever. What we wish to do here is to see to 
what extent this amalgamation process has gone on, and what the situation is at 
present, in understanding basic phenomena in terms of the smallest set of principles. 
To express it in a simple manner, what are things made of and how few elements 
are there ? 
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2-2 Physics before 1920 

It is a little difficult to begin at once with the present view, so we shall first 
see how things looked in about 1920 and then take a few things out of that picture. 
Before 1920, our world picture was something like this: The "stage" on which 
the universe goes is the three-dimensional space of geometry, as described by 
Euclid, and things change in a medium called time. The elements on the stage are 
particles, for example the atoms, which have some properties. First, the property 
of inertia: if a particle is moving it keeps on going in the same direction unless 
forces act upon it. The second element, then, is forces, which were then thought 
to be of two varieties: First, an enormously complicated, detailed kind of inter- 
action force which held the various atoms in different combinations in a com- 
plicated way, which determined whether salt would dissolve faster or slower when 
we raise the temperature. The other force that was known was a long-range 
interaction — a smooth and quiet attraction — which varied inversely as the square 
of the distance, and was called gravitation. This law was known and was very 
simple. Why things remain in motion when they are moving, or why there is a 
law of gravitation was, of course, not known. 

A description of nature is what we are concerned with here. From this point 
of view, then, a gas, and indeed all matter, is a myriad of moving particles. Thus 
many of the things we saw while standing at the seashore can immediately be 
connected. First the pressure: this comes from the collisions of the atoms with 
the walls or whatever; the drift of the atoms, if they are all moving in one direc- 
tion on the average, is wind; the random internal motions are the heat. There are 
waves of excess density, where too many particles have collected, and so as they 
Tush off they push up piles of particles farther out, and so on. This wave of excess 
density is sound. It is a tremendous achievement to be able to understand so much. 
Some of these things were described in the previous chapter. 

What kinds of particles are there? There were considered to be 92 at that time: 
92 different kinds of atoms were ultimately discovered. They had different names 
associated with their chemical properties. 

The next part of the problem was, what are the short-range forces ? Why 
does carbon attract one oxygen or perhaps two oxygens, but not three oxygens? 
What is the machinery of interaction between atoms? Is it gravitation? The answer 
is no. Gravity is entirely too weak. But imagine a force analogous to gravity, 
varying inversely with the square of the distance, but enormously more powerful 
and having one difference. In gravity everything attracts everything else, but now 
imagine that there are two kinds of "things," and that this new force (which is 
the electrical force, of course) has the property that likes repel but unlikes attract. 
The "thing" that carries this strong interaction is called charge. 

Then what do we have? Suppose that we have two unlikes that attract each 
other, a plus and a minus, and that they stick very close together. Suppose we 
have another charge some distance away. Would it feel any attraction? It would 
feel practically none, because if the first two are equal in size, the attraction for 
the one and the repulsion for the other balance out. Therefore there is very little 
force at any appreciable distance. On the other hand, if we get very close with the 
extra charge, attraction arises, because the repulsion of likes and attraction of 
unlikes will tend to bring unlikes closer together and push likes farther apart. 
Then the repulsion will be less than the attraction. This is the reason why the atoms, 
which are constituted out of plus and minus electric charges, feel very little force 
when they are separated by appreciable distance (aside from gravity). When they 
come close together, they can "see inside" each other and rearrange their charges, 
with the result that they have a very strong interaction. The ultimate basis of 
an interaction between the atoms is electrical. Since this force is so enormous, all 
the plusses and all minuses will normally come together in as intimate a combina- 
tion as they can. All things, even ourselves, are made of fine-grained, enormously 
strongly interacting plus and minus parts, all neatly balanced out. Once in a while, 
by accident, we may rub off a few minuses or a few plusses (usually it is easier 
to rub off minuses), and in those circumstances we find the force of electricity 
unbalanced, and we can then see the effects of these electrical attractions. 
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To give an idea of how much stronger electricity is than gravitation, consider 
two grains of sand, a millimeter across, thirty meters apart. If the force between 
them were not balanced, if everything attracted everything else instead of likes 
repelling, so that there were no cancellation, how much force would there be? 
There would be a force of three million tons between the two! You see, there is 
very, very little excess or deficit of the number of negative or positive charges 
necessary to produce appreciable electrical effects. This is, of course, the reason 
why you cannot see the difference between an electrically charged or uncharged 
thing — so few particles are involved that they hardly make a difference in the weight 
or size of an object. 

With this picture the atoms were easier to understand. They were thought to 
have a "nucleus" at the center, which is positively electrically charged and very 
massive, and the nucleus is surrounded by a certain number of "electrons" which 
are very light and negatively charged. Now we go a little ahead in our story to 
remark that in the nucleus itself there were found two kinds of particles, protons 
and neutrons, almost of the same weight and very heavy. The protons are elec- 
trically charged and the neutrons are neutral. If we have an atom with six protons 
inside its nucleus, and this is surrounded by six electrons (the negative particles in 
the ordinary world of matter are all electrons, and these are very light compared 
with the protons and neutrons which make nuclei), this would be atom number 
six in the chemical table, and it is called carbon. Atom number eight is called 
oxygen, etc., because the chemical properties depend upon the electrons on the 
outside, and in fact only upon how many electrons there are. So the chemical 
properties of a substance depend only on a number, the number of electrons. (The 
whole list of elements of the chemists really could have been called 1, 2, 3, 4, 5, 
etc. Instead of saying "carbon," we could say "element six," meaning six electrons, 
but of course, when the elements were first discovered, it was not known that they 
could be numbered that way, and secondly, it would make everything look rather 
complicated. It is better to have names and symbols for these things, rather than 
to call everything by number.) 

More was discovered about the electrical force. The natural interpretation 
of electrical interaction is that two objects simply attract each other: plus against 
minus. However, this was discovered to be an inadequate idea to represent it. 
A more adequate representation of the situation is to say that the existence of the 
positive charge, in some sense, distorts, or creates a "condition" in space, so that 
when we put the negative charge in, it feels a force. This potentiality for produc- 
ing a force is called an electric field. When we put an electron in an electric field, 
we say it is "pulled." We then have two rules: (a) charges make a field, and 
(b) charges in fields have forces on them and move. The reason for this will be- 
come clear when we discuss the following phenomena: If we were to charge a body, 
say a comb, electrically, and then place a charged piece of paper at a distance and 
move the comb back and forth, the paper will respond by always pointing to the 
comb. If we shake it faster, it will be discovered that the paper is a little behind, 
there is a delay in the action. (At the first stage, when we move the comb rather 
slowly, we find a complication which is magnetism. Magnetic influences have to 
do with charges in relative motion, so magnetic forces and electric forces can really 
be attributed to one field, as two different aspects of exactly the same thing. A 
changing electric field cannot exist without magnetism.) If we move the charged 
paper farther out, the delay is greater. Then an interesting thing is observed. 
Although the forces between two charged objects should go inversely as the 
square of the distance, it is found, when we shake a charge, that the influence 
extends very much farther out than we would guess at first sight. That is, the effect 
falls off more slowly than the inverse square. 

Here is an analogy: If we are in a pool of water and there is a floating cork 
very close by, we can move it "directly" by pushing the water with another cork. 
If you looked only at the two corks, all you would see would be that one moved 
immediately in response to the motion of the other — there is some kind of "inter- 
action" between them. Of course, what we really do is to disturb the water; the 
water then disturbs the other cork. We could make up a "law" that if you pushed 
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the water a little bit, an object close by in the water would move. If it were farther 
away, of course, the second cork would scarcely move, for we move the water 
locally. On the other hand, if we jiggle the cork a new phenomenon is involved, 
in which the motion of the water moves the water there, etc., and waves travel 
away, so that by jiggling, there is an influence wry much farther out, an oscillatory 
influence, that cannot be understood from the direct interaction. Therefore the- 
idea of direct interaction must be replaced with the existence of the water, or-in- 
the electrical case, with what we call the electromagnetic field. 

The electromagnetic field can carry waves; some of these waves are light, 
others are used in radio broadcasts, but the general name is electromagnetic waves. 
These oscillatory waves can have various frequencies. The only thing that is really 
different from one wave to another is the frequency of oscillation. If we shake a 
charge back and forth more and more rapidly, and look at the effects, we get a 
whole series of different kinds of effects, which are all unified by specifying but 
one number, the number of oscillations per second. The usual "pickup" that we 
get from electric currents in the circuits in the walls of a building have a frequency 
of about one hundred cycles per second. If we increase the frequency to 500 or 
1000 kilocycles (1 kilocycle = 1000 cycles) per second, we are "on the air," for 
this is the frequency range which is used for radio broadcasts. (Of course it has 
nothing to do with the air! We can have radio broadcasts without any air.) If 
we again increase the frequency, we come into the range that is used for FM and 
TV. Going still further, we use certain short waves, for example for radar. Still 
higher, and we do not need an instrument to "see" the stuff, we can see it with the 
human eye. In the range of frequency from 5 X 10 14 to 5 X 10 15 cycles per 
second our eyes would see the oscillation of the charged comb, if we could shake it 
that fast, as red, blue, or violet light, depending on the frequency. Frequencies 
below this range are called infrared, and above it, ultraviolet. The fact that we 
can see in a particular frequency range makes that part of the electromagnetic 
spectrum no more impressive than the other parts from a physicist's standpoint, 
but from a human standpoint, of course, it is more interesting. If we go up even 
higher in frequency, we get x-rays. X-rays are nothing but very high-frequency 
light. If we go still higher, we get gamma rays. These two terms, x-rays and gamma 
rays, are used almost synonymously. Usually electromagnetic rays coming from 
nuclei are called gamma rays, while those of high energy from atoms are called 
x-rays, but at the same frequency they are indistinguishable physically, no matter 
what their source. If we go to still higher frequencies, say to 10 24 cycles per 
second, we find that we can make those waves artificially, for example with the 
synchrotron here at Caltech. We can find electromagnetic waves with stupendously 
high frequencies — with even a thousand times more rapid oscillation — in the waves 
found in cosmic rays. These waves cannot be controlled by us. 


Table 2-1 

The Electromagnetic Spectrum 


Frequency in 
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Light 


10 18 

A 

X-rays 
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> Particle 

10 24 

7-rays, “artificial” 


10 27 

7-rays, in cosmic rays , 
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2-3 Quantum physics 

Having described the idea of the electromagnetic field, and that this field 
can carry waves, we soon learn that these waves actually behave in a strange way 
which seems very unwavelike. At higher frequencies they behave much more like 
particles! It is quantum mechanics , discovered just after 1920, which explains this 
strange behavior. In the years before 1920, the picture of space as a three-dimen- 
sional space, and of time as a separate thing, was changed by Einstein, first into 
a combination which we call space-time, and then still further into a curved 
space-time to represent gravitation. So the "stage" is changed into space-time, 
and gravitation is presumably a modification of space-time. Then it was also found 
that the rules for the motions of particles were incorrect. The mechanical rules of 
"inertia" and "forces" are wrong — Newton's laws are wrong — in the world of atoms. 
Instead, it was discovered that things on a small scale behave nothing like things on 
a large scale. That is what makes physics difficult — and very interesting. It is hard 
because the way things behave on a small scale is so "unnatural"; we have no 
direct experience with it. Here things behave like nothing we know of, so that it is 
impossible to describe this behavior in any other than analytic ways. It is difficult, 
and takes a lot of imagination. 

Quantum mechanics has many aspects. In the first place, the idea that a 
particle has a definite location and a definite speed is no longer allowed; that is 
wrong. To give an example of how wrong classical physics is, there is a rule in 
quantum mechanics that says that one cannot know both where something is and 
how fast it is moving. The uncertainty of the momentum and the uncertainty of 
the position are complementary, and the product of the two is constant. We can 
write the law like this: Dx Dp 3 h/2p, but we shall explain it in more detail later. 
This rule is the explanation of a very mysterious paradox: if the atoms are made 
out of plus and minus charges, why don't the minus charges simply sit on top of 
the plus charges (they attract each other) and get so close as to completely cancel 
them out? Why are atoms so big? Why is the nucleus at the center with the 
electrons around it? It was first thought that this was because the nucleus was so 
big; but no, the nucleus is very small An atom has a diameter of about 10" 8 cm. 
The nucleus has a diameter of about 10" 13 cm. If we had an atom and wished to 
see the nucleus, we would have to magnify it until the whole atom was the size of 
a large room, and then the nucleus would be a bare speck which you could just 
about make out with the eye, but very nearly all the weight of the atom is in that 
infinitesimal nucleus. What keeps the electrons from simply falling in? This 
principle: If they were in the nucleus, we would know their position precisely, and 
the uncertainty principle would then require that they have a very large (but 
uncertain) momentum, i.e., a very large kinetic energy. With this energy they 
would break away from the nucleus. They make a compromise: they leave them- 
selves a little room for this uncertainty and then jiggle with a certain amount of 
minimum motion in accordance with this rule. (Remember that when a crystal 
is cooled to absolute zero, we said that the atoms do not stop moving, they still 
jiggle. Why? If they stopped moving, we would know where they were and that 
they had zero motion, and that is against the uncertainty principle. We cannot 
know where they are and how fast they are moving, so they must be continually 
wiggling in there!) 

Another most interesting change in the ideas and philosophy of science 
brought about by quantum mechanics is this: it is not possible to predict exactly 
what will happen in any circumstance. For example, it is possible to arrange an 
atom which is ready to emit light, and we can measure when it has emitted light 
by picking up a photon particle, which we shall describe shortly. We cannot, 
however, predict when it is going to emit the light or, with several atoms, which 
one is going to. You may say that this is because there are some internal "wheels" 
which we have not looked at closely enough. No, there are no internal wheels; 
nature, as we understand it today, behaves in such a way that it is fundamentally 
impossible to make a precise prediction of exactly what will happen in a given 
experiment. This is a horrible thing; in fact, philosophers have said before that 
one of the fundamental requisites of science is that whenever you set up the same 
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conditions, the same thing must happen. This is simply not true, it is not a funda- 
mental condition of science. The fact is that the same thing does not happen, 
that we can find only an average, statistically, as to what happens. Nevertheless, 
science has not completely collapsed. Philosophers, incidentally, say a great deal 
about what is absolutely necessary for science, and it is always, so far as one can 
see, rather naive, and probably wrong. For example, some philosopher or other 
said it is fundamental to the scientific effort that if an experiment is performed in, 
say, Stockholm, and then the same experiment is done in, say, Quito, the same 
results must occur. That is quite false. It is not necessary that science do that; 
it may be a fact of experience, but it is not necessary. For example, if one of the 
experiments is to look out at the sky and see the aurora borealis in Stockholm, 
you do not see it in Quito; that is a different phenomenon. "But," you say, "that 
is something that has to do with the outside; can you close yourself up in a box 
in Stockholm and pull down the shade and get any difference?" Surely. If we take 
a pendulum on a universal joint, and pull it out and let go, then the pendulum 
will swing almost in a plane, but not quite. Slowly the plane keeps changing in 
Stockholm, but not in Quito. The blinds are down, too. The fact that this happened 
does not bring on the destruction of science. What is the fundamental hypothesis of 
science, the fundamental philosophy? We stated it in the first chapter: the sole 
test of the validity of any idea is experiment. If it turns out that most experiments 
work out the same in Quito as they do in Stockholm, then those "most experi- 
ments" will be used to formulate some general law, and those experiments which 
do not come out the same we will say were a result of the environment near 
Stockholm. We will invent some way to summarize the results of the experiment, 
and we do not have to be told ahead of time what this way will look like. If we 
are told that the same experiment will always produce the same result, that is all 
very well, but if when we try it, it does not, then it does not. We just have to take 
what we see, and then formulate all the rest of our ideas in terms of our actual 
experience. 

Returning again to quantum mechanics and fundamental physics, we cannot 
go into details of the quantum-mechanical principles at this time, of course, be- 
cause these are rather difficult to understand. We shall assume that they are there, 
and go on to describe what some of the consequences are. One of the consequences 
is that things which we used to consider as waves also behave like particles, and 
particles behave like waves; in fact everything behaves the same way. There is no 
distinction between a wave and a particle. So quantum mechanics unifies the idea 
of the field and its waves, and the particles, all into one. Now it is true that when 
the frequency is low, the field aspect of the phenomenon is more evident, or more 
useful as an approximate description in terms of everyday experiences. But as the 
frequency increases, the particle aspects of the phenomenon become more evident 
with the equipment with which we usually make the measurements. In fact, al- 
though we mentioned many frequencies, no phenomenon directly involving a fre- 
quency has yet been detected above approximately 10 12 cycles per second. We 
only deduce the higher frequencies from the energy of the particles, by a rule which 
assumes that the particle-wave idea of quantum mechanics is valid. 

Thus we have a new view of electromagnetic interaction. We have a new kind 
of particle to add to the electron, the proton, and the neutron. That new particle 
is called a photon. The new view of the interaction of electrons and protons that 
is electromagnetic theory, but with everything quantum-mechanically correct, is 
called quantum electrodynamics. This fundamental theory of the interaction of 
light and matter, or electric field and charges, is our greatest success so far in 
physics. In this one theory we have the basic rules for all ordinary phenomena 
except for gravitation and nuclear processes. For example, out of quantum electro- 
dynamics come all known electrical, mechanical, and chemical laws: the laws for 
the collision of billiard balls, the motions of wires in magnetic fields, the specific 
heat of carbon monoxide, the color of neon signs, the density of salt, and the 
reactions of hydrogen and oxygen to make water are all consequences of this one 
law. All these details can be worked out if the situation is simple enough for us to 
make an approximation, which is almost never, but often we can understand more 
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or less what is happening. At the present time no exceptions are found to the 
quantum-electrodynamic laws outside the nucleus, and there we do not know 
whether there is an exception because we simply do not know what is going on in 
the nucleus. 

In principle, then, quantum electrodynamics is the theory of all chemistry, 
and of life, if life is ultimately reduced to chemistry and therefore just to physics 
because chemistry is already reduced (the part of physics which is involved in 
chemistry being already known). Furthermore, the same quantum electrodynamics, 
this great thing, predicts a lot of new things. In the first place, it tells the properties 
of very high-energy photons, gamma rays, etc. It predicted another very re- 
markable thing: besides the electron, there should be another particle of the 
same mass, but of opposite charge, called a positron, and these two, coming to- 
gether, could annihilate each other with the emission of light or gamma rays. 
(After all, light and gamma rays are all the same, they are just different points on 
a frequency scale.) The generalization of this, that for each particle there is an 
antiparticle, turns out to be true. In the case of electrons, the antiparticle has 
another name — it is called a positron, but for most other particles, it is called anti- 
so-and-so, like antiproton or antineutron. In quantum electrodynamics, two 
numbers are put in and most of the other numbers in the world are supposed to 
come out. The two numbers that are put in are called the mass of the electron and 
the charge of the electron. Actually, that is not quite true, for we have a whole 
set of numbers for chemistry which tells how heavy the nuclei are. That leads us 
to the next part. 

2-4 Nuclei and particles 

What are the nuclei made of, and how are they held together? It is found 
that the nuclei are held together by enormous forces. When these are released, 
the energy released is tremendous compared with chemical energy, in the same 
ratio as the atomic bomb explosion is to a TNT explosion, because, of course, 
the atomic bomb has to do with changes inside the nucleus, while the explosion 
of TNT has to do with the changes of the electrons on the outside of the atoms. 
The question is, what are the forces which hold the protons and neutrons together 
in the nucleus? Just as the electrical interaction can be connected to a particle, 
a photon, Yukawa suggested that the forces between neutrons and protons also 
have a field of some kind, and that when this field jiggles it behaves like a particle. 
Thus there could be some other particles in the world besides protons and neutrons, 
and he was able to deduce the properties of these particles from the already known 
characteristics of nuclear forces. For example, he predicted they should have a 
mass of two or three hundred times that of an electron; and lo and behold, in 
cosmic rays there was discovered a particle of the right mass! But it later turned 
out to be the wrong particle. It was called a m-meson, or muon. 

However, a little while later, in 1947 or 1948, another particle was found, 
the p-meson, or pion, which satisfied Yukawa's criterion. Besides the proton and 
the neutron, then, in order to get nuclear forces we must add the pion. Now, 
you say, "Oh great!, with this theory we make quantum nucleodynamics using 
the pions just like Yukawa wanted to do, and see if it works, and everything will 
be explained." Bad luck. It turns out that the calculations that are involved in 
this theory are so difficult that no one has ever been able to figure out what the 
consequences of the theory are, or to check it against experiment, and this has 
been going on now for almost twenty years! 

So we are stuck with a theory, and we do not know whether it is right or wrong, 
but we do know that it is a little wrong, or at least incomplete. While we have 
been dawdling around theoretically, trying to calculate the consequences of this 
theory, the experimentalists have been discovering some things. For example, 
they had already discovered this m-meson or muon, and we do not yet know where 
it fits. Also, in cosmic rays, a large number of other "extra" particles were found. 
It turns out that today we have approximately thirty particles, and it is very 
difficult to understand the relationships of all these particles, and what nature 
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, wants them for, or what the connections are from one to another. We do not today 
understand these various particles as different aspects of the same thing, and the 
fact that we have so many unconnected particles is a representation of the fact 
that we have so much unconnected information without a good theory. After 
the great successes of quantum electrodynamics, there is a certain amount of 
knowledge of nuclear physics which is rough knowledge, sort of half experience 
and half theory, assuming a type of force between protons and neutrons and see- 
ing what will happen, but not really understanding where the force comes from. 
Aside from that, we have made very little progress. We have collected an enor- 
mous number of chemical elements. In the chemical case, there suddenly appeared 
a relationship among these elements which was unexpected, and which is embodied 
in the periodic table of Mendeleev. For example, sodium and potassium are 
about the same in their chemical properties and are found in the same column 
in the Mendeleev chart. We have been seeking a Mendeleev-type chart for the 
new particles. One such chart of the new particles was made independently by 
Gell-Mann in the U.S.A. and Nishijima in Japan. The basis of their classification 
is a new number, like the electric charge, which can be assigned to each particle, 
called its "strangeness," S. This number is conserved, like the electric charge, in 
reactions which take place by nuclear forces. 

In Table 2-2 are listed all the particles. We cannot discuss them much at 
this stage, but the table will at least show you how much we do not know. Under- 
neath each particle its mass is given in a certain unit, called the Mev. One Mev 
is equal to 1.782 X 10 -27 gram. The reason this unit was chosen is historical, 
and we shall not go into it now. More massive particles are put higher up on the 
chart; we see that a neutron and a proton have almost the same mass. In vertical 
columns we have put the particles with the same electrical charge, all neutral 
objects in one column, all positively charged ones to the right of this one, and all 
negatively charged objects to the left. 

Particles are shown with a solid line and "resonances" with a dashed one. 
Several particles have been omitted from the table. These include the important 
zero-mass, zero-charge particles, the photon and the graviton, which do not fall into 
the baryon-meson-lepton classification scheme, and also some of the newer 
resonances (K*, <p, ri). The antiparticles of the mesons are listed in the table, but 
the antiparticles of the leptons and baryons would have to be listed in another 
table which would look exactly like this one reflected on the zero-charge column. 
Although all of the particles except the electron, neutrino, photon, graviton, and 
proton are unstable, decay products have been shown only for the resonances. 
Strangeness assignments are not applicable for leptons, since they do not interact 
strongly with nuclei. 

All particles which are together with the neutrons and protons are called 
baryons , and the following ones exist: There is a "lambda," with a mass of 1154 
Mev, and three others, called sigmas, minus, neutral, and plus, with several masses 
almost the same. There are groups or multiplets with almost the same mass, within 
one or two percent. Each particle in a multiplet has the same strangeness. The 
first multiplet is the proton-neutron doublet, and then there is a singlet (the 
lambda) then the sigma triplet, and finally the xi doublet. Very recently, in 1961, 
even a few more particles were found. Or are they particles? They live so short a 
time, they disintegrate almost instantaneously, as soon as they are formed, that 
we do not know whether they should be considered as new particles, or some kind 
of "resonance" interaction of a certain definite energy between the A and T products 
into which they disintegrate. 

In addition to the baryons the other particles which are involved in the nuclear 
interaction are called mesons. There are first the pions, which come in three varie- 
ties, positive, negative, and neutral; they form another multiplet. We have also 
found some new things called A'-mesons, and they occur as a doublet, K + and 
K°. Also, every particle has its antiparticle, unless a particle is its own antiparticle. 
For example, the i t~ and the 7r + are antiparticles, but the 7r° is its own antiparticle. 
The K~ and K + are antiparticles, and the K° and K°. In addition, in 1961 we also 
found some more mesons or maybe mesons which disintegrate almost immediately. 
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A thing called w which goes into three pions has a mass 780 on this scale, and 
somewhat less certain is an object which disintegrates into two pions. These parti- 
cles, called mesons and baryons, and the antiparticles of the mesons are on the 
same chart, but the antiparticles of the baryons must be put on another chart, 
"reflected" through the charge-zero column. 

Just as Mendeleev's chart was very good, except for the fact that there were 
a number of rare earth elements which were hanging out loose from it, so we have 
a number of things hanging out loose from this chart — particles which do not 
interact strongly in nuclei, have nothing to do with a nuclear interaction, and do 
not have a strong interaction (I mean the powerful kind of interaction of nuclear 
energy). These are called leptons, and they are the following: there is the electron, 
which has a very small mass on this scale, only 0.510 Mev. Then there is that 
other, the fx- meson, the muon, which has a mass much higher, 206 times as heavy 
as an electron. So far as we can tell, by all experiments so far, the difference 
between the electron and the muon is nothing but the mass. Everything works 
exactly the same for the muon as for the electron, except that one is heavier than 
the other. Why is there another one heavier; what is the use for it? We do not 
know. In addition, there is a lepton which is neutral, called a neutrino, and this 
particle has zero mass. In fact, it is now known that there are two different kinds 
of neutrinos, one related to electrons and the other related to muons. 

Finally, we have two other particles which do not interact strongly with the 
nuclear ones: one is a photon, and perhaps, if the field of gravity also has a quan- 
tum-mechanical analog (a quantum theory of gravitation has not yet been worked 
out), then there will be a particle, a graviton, which will have zero mass. 

What is this "zero mass"? The masses given here are the masses of the 
particles at rest . The fact that a particle has zero mass means, in a way, that it 
cannot be at rest . A photon is never at rest, it is always moving at 186,000 miles a 
second. We will understand more what mass means when we understand the theory 
of relativity, which will come in due time. 

Thus we are confronted with a large number of particles, which together seem 
to be the fundamental constituents of matter. Fortunately, these particles are 
not all different in their interactions with one another. In fact, there seem to be 
just four kinds of interaction between particles which, in the order of decreasing 
strength, are the nuclear force, electrical interactions, the beta-decay interaction, ^ 
and gravity. The photon is coupled to all charged particles and the strength of 
the interaction is measured by some number, which is 1/137. The detailed law 
of this coupling is known, that is quantum electrodynamics. • Gravity is coupled 
to all energy , but its coupling is extremely weak, much weaker than that of elec- 
tricity. This law is also known. Then there are the so-called weak decays — 
beta decay, which causes the neutron to disintegrate into proton, electron, and 
neutrino, relatively slowly. This law is only partly known. The so-called strong 
interaction, the meson-baryon interaction, has a strength of 1 in this scale, and the 
law is completely unknown, although there are a number of known rules, such 
as that the number of baryons does not change in any reaction. 


Table 2-3. Elementary Interactions 


Coupling 

Strength* 

Law 

Photon to charged particles 

~10“ 2 

Law known 

Gravity to all energy 

~10“ 40 

Law known 

Weak decays 

~10 -5 

Law partly known 

Mesons to baryons 

~1 

Law unknown (some rules known) 


* The "strength" is a dimensionless measure of the coupling constant involved in each 
interaction (~ means "approximately"). 
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This then, is the horrible condition of our physics today. To summarize it, 
I would say this: outside the nucleus, we seem to know all; inside it, quantum 
mechanics is valid — the principles of quantum mechanics have not been found to 
fail. The stage on which we put all of our knowledge, we would say, is relativistic 
space-time; perhaps gravity is involved in space-time. We do not know how the 
universe got started, and we have never made experiments which check our ideas 
of space and time accurately, below some tiny distance, so we only know that 
our ideas work above that distance. We should also add that the rules of the game 
are the quantum-mechanical principles, and those principles apply, so far as we 
can tell, to the new particles as well as to the old. The origin of the forces in 
nuclei leads us to new particles, but unfortunately they appear in great profusion 
and we lack a complete understanding of their interrelationship, although we 
already know that there are some very surprising relationships among them. 
We seem gradually to be groping toward an understanding of the world of sub- 
atomic particles, but we really do not know how far we have yet to go in this task. 
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3 


The Relation of Physics to Other Sciences 


3-1 Introduction 

Physics is the most fundamental and all-inclusive of the sciences, and has 
had a profound effect on all scientific development. In fact, physics is the present- 
day equivalent of what used to be called natural philosophy, from which most of 
our modern sciences arose. Students of many fields find themselves studying 
physics because of the basic role it plays in all phenomena. In this chapter we 
shall try to explain what the fundamental problems in the other sciences are, 
but of course it is impossible in so small a space really to deal with the complex, 
subtle, beautiful matters in these other fields. Lack of space also prevents our 
discussing the relation of physics to engineering, industry, society, and war, or 
even the most remarkable relationship between mathematics and physics. (Mathe- 
matics is not a science from our point of view, in the sense that it is not a natural 
science. The test of its validity is not experiment.) We must, incidentally, make it 
clear from the beginning that if a thing is not a science, it is not necessarily bad. 
For example, love is not a science. So, if something is said not to be a science, 
it does not mean that there is something wrong with it; it just means that it is not 
a science. 


3-1 Introduction 

3-2 Chemistry 

3-3 Biology 

3-4 Astronomy 

3-5 Geology 

3-6 Psychology 

3-7 How did it get that way? 


3-2 Chemistry 

The science which is perhaps the most deeply affected by physics is chemistry. 
Historically, the early days of chemistry dealt almost entirely with what we now call 
inorganic chemistry, the chemistry of substances which are not associated with 
living things. Considerable analysis was required to discover the existence of the 
many elements and their relationships — how they make the various relatively 
simple compounds found in rocks, earth, etc. This early chemistry was very 
important for physics. The interaction between the two sciences was very great 
because the theory of atoms was substantiated to a large extent by experiments 
in chemistry. The theory of chemistry, i.e., of the reactions themselves, was 
summarized to a large extent in the periodic chart of Mendeleev, which brings out 
many strange relationships among the various elements, and it was the collection 
of rules as to which substance is combined with which, and how, that constituted 
inorganic chemistry. All these rules were ultimately explained in principle by 
quantum mechanics, so that theoretical chemistry is in fact physics. On the 
other hand, it must be emphasized that this explanation is in principle. We have 
already discussed the difference between knowing the rules of the game of chess, 
and being able to play. So it is that we may know the rules, but we cannot play 
very well. It turns out to be very difficult to predict precisely what will happen in 
a given chemical reaction; nevertheless, the deepest part of theoretical chemistry 
must end up in quantum mechanics. 

There is also a branch of physics and chemistry which was developed by both 
sciences together, and which is extremely important. This is the method of 
statistics applied in a situation in which there are mechanical laws, which is aptly 
called statistical mechanics. In any chemical situation a large number of atoms are 
involved, and we have seen that the atoms are all jiggling around in a very random 
and complicated way. If we could analyze each collision, and be able to follow 
in detail the motion of each molecule, we might hope to figure out what would 
happen, but the many numbers needed to keep track of all these molecules ex- 
ceeds so enormously the capacity of any computer, and certainly the capacity of 
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the mind, that it was important to develop a method for dealing with such com- 
plicated situations. Statistical mechanics, then, is the science of the phenomena 
of heat, or thermodynamics. Inorganic chemistry is, as a science, now reduced 
essentially to what are called physical chemistry and quantum chemistry; physical 
chemistry to study the rates at which reactions occur and what is happening in 
detail (How do the molecules hit? Which pieces fly off first?, etc.), and quantum 
chemistry to help us understand what happens in terms of the physical laws. 

The other branch of chemistry is organic chemistry , the chemistry of the 
substances which are associated with living things. For a time it was believed 
that the substances which are associated with living things were so marvelous 
that they could not be made by hand, from inorganic materials. This is not at 
all true — they are just the same as the substances made in inorganic chemistry, 
but more complicated arrangements of atoms are involved. Organic chemistry 
obviously has a very close relationship to the biology which supplies its substances, 
and to industry, and furthermore, much physical chemistry and quantum mechanics 
can be applied to organic as well as to inorganic compounds. However, the main 
problems of organic chemistry are not in these aspects, but rather in the analysis 
and synthesis of the substances which are formed in biological systems, in living 
things. This leads imperceptibly, in steps, toward biochemistry, and then into 
biology itself, or molecular biology. 

3-3 Biology 

Thus we come to the science of biology, which is the study of living things. 
In the early days of biology, the biologists had to deal with the purely descriptive 
problem of finding out what living things there were, and so they just had to 
count such things as the hairs of the limbs of fleas. After these matters were worked 
out with a great deal of interest, the biologists went into the machinery inside the 
living bodies, first from a gross standpoint, naturally, because it takes some effort 
to get into the finer details. 

There was an interesting early relationship between physics and biology in 
which biology helped physics in the discovery of the conservation of energy, which 
was first demonstrated by Mayer in connection with the amount of heat taken in 
and given out by a living creature. 

If we look at the processes of biology of living animals more closely, we see 
many physical phenomena: the circulation of blood, pumps, pressure, etc. There 
are nerves: we know what is happening when we step on a sharp stone, and that 
somehow or other the information goes from the leg up. It is interesting how that 
happens. In their study of nerves, the biologists have come to the conclusion that 
nerves are very fine tubes with a complex wall which is very thin; through this 
wall the cell pumps ions, so that there are positive ions on the outside and nega- 
tive ions on the inside, like a capacitor. Now this membrane has an interesting 
property; if it "discharges" in one place, i.e., if some of the ions were able to move 
through one place, so that the electric voltage is reduced there, that electrical 
influence makes itself felt on the ions in the neighborhood, and it affects the 
membrane in such a way that it lets the ions through at neighboring points also. 
This in turn affects it farther along, etc., and so there is a wave of "penetrability" 
of the membrane which runs down the fiber when it is "excited" at one end by 
stepping on the sharp stone. This wave is somewhat analogous to a long sequence 
of vertical dominoes; if the end one is pushed over, that one pushes the next, 
etc. Of course this will transmit only one message unless the dominoes are set 
up again; and similarly in the nerve cell, there are processes which pump the ions 
slowly out again, to get the nerve ready for the next impulse. So it is that we know 
what we are doing (or at least where we are). Of course the electrical effects 
associated with this nerve impulse can be picked up with electrical instruments, 
and because there are electrical effects, obviously the physics of electrical effects 
has had a great deal of influence on understanding the phenomenon. 

The opposite effect is that, from somewhere in the brain, a message is sent 
out along a nerve. What happens at the end of the nerve? There the nerve branches 
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out into fine little things, connected to a structure near a muscle, called an end- 
plate. For reasons which are not exactly understood, when the impulse reaches 
the end of the nerve, little packets of a chemical called acetylcholine are shot off 
(five or ten molecules at a time) and they affect the muscle fiber and make it con- 
tract — how simple! What makes a muscle contract? A muscle is a very large num- 
ber of fibers close together, containing two different substances, myosin and 
actomyosin, but the machinery by which the chemical reaction induced by acetyl- 
choline can modify the dimensions of the molecule is not yet known. Thus the 
fundamental processes in the muscle that make mechanical motions are not known. 

Biology is such an enormously wide field that there are hosts of other problems 
that we cannot mention at all — problems on how vision works (what the light does 
in the eye), how hearing works, etc. (The way in which thinking works we shall 
discuss later under psychology.) Now, these things concerning biology which 
we have just discussed are, from a biological standpoint, really not fundamental, 
at the bottom of life, in the sense that even if we understood them we still would 
not understand life itself. To illustrate: the men who study nerves feel their work 
is very important, because after all you cannot have animals without nerves. 
But you can have life without nerves. Plants have neither nerves nor muscles, 
but they are working, they are alive, just the same. So for the fundamental prob- 
lems of biology we must look deeper; when we do, we discover that all living 
things have a great many characteristics in common. The most common feature 
is that they are made of cells , within each of which is complex machinery for doing 
things chemically. In plant cells, for example, there is machinery for picking up 
light and generating sucrose, which is consumed in the dark to keep the plant 
alive. When the plant is eaten the sucrose itself generates in the animal a series 
of chemical reactions very closely related to photosynthesis (and its opposite 
effect in the dark) in plants. 

In the cells of living systems there are many elaborate chemical reactions, 
in which one compound is changed into another and another. To give some im- 
pression of the enormous efforts that have gone into the study of biochemistry, 
the chart in Fig. 3-1 summarizes our knowledge to date on just one small part of 
the many series of reactions which occur in cells, perhaps a percent or so of it. 

Here we see a whole series of molecules which change from one to another 
in a sequence or cycle of rather small steps. It is called the Krebs cycle, the respira- 
tory cycle. Each of the chemicals and each of the steps is fairly simple, in terms 
of what change is made in the molecule, but — and this is a centrally important 
discovery in biochemistry — these changes are relatively difficult to accomplish in a 
laboratory. If we have one substance and another very similar substance, the one 
does not just turn into the other, because the two forms are usually separated by 
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Fig. 3-1 . The Krebs cycle. 
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an energy barrier or "hill." Consider this analogy: If we wanted to take an object 
from one place to another, at the same level but on the other side of a hill, we could 
push it over the top, but to do so requires the addition of some energy. Thus 
most chemical reactions do not occur, because there is what is called an activa- 
tion energy in the way. In order to add an extra atom to our chemical requires 
that we get it close enough that some rearrangement can occur; then it will stick. 
But if we cannot give it enough energy to get it close enough, it will not go to com- 
pletion, it will just go part way up the "hill" and back down again. However, 
if we could literally take the molecules in our hands and push and pull the atoms 
around in such a way as to open a hole to let the new atom in, and then let it snap 
back, we would have found another way, around the hill, which would not require 
extra energy, and the reaction would go easily. Now there actually are , in the cells, 
very large molecules, much larger than the ones whose changes we have been de- 
scribing, which in some complicated way hold the smaller molecules just right, so 
that the reaction can occur easily. These very large and complicated things are 
called enzymes. (They were first called ferments, because they were originally 
discovered in the fermentation of sugar. In fact, some of the first reactions in 
the cycle were discovered there.) In the presence of an enzyme the reaction will go. 

An enzyme is made of another substance called protein. Enzymes are very 
big and complicated, and each one is different, each being built to control a certain 
special reaction. The names of the enzymes are written in Fig. 3-1 at each reaction. 
(Sometimes the same enzyme may control two reactions.) We emphasize that the 
enzymes themselves are not involved in the reaction directly. They do not change; 
they merely let an atom go from one place to another. Having done so, the enzyme 
is ready to do it to the next molecule, like a machine in a factory. Of course, there 
must be a supply of certain atoms and a way of disposing of other atoms. Take 
hydrogen, for example: there are enzymes which have special units on them which 
carry the hydrogen for all chemical reactions. For example, there are three or four 
hydrogen-reducing enzymes which are used all over our cycle in different places. 
It is interesting that the machinery which liberates some hydrogen at one place 
will take that hydrogen and use it somewhere else. 

The most important feature of the cycle of Fig. 3-1 is the transformation 
from GDP to GTP (guanadine-di-phosphate to guanadine-tri-phosphate) because 
the one substance has much more energy in it than the other. Just as there is a 
"box" in certain enzymes for carrying hydrogen atoms around, there are special 
energy-carrying "boxes" which involve the triphosphate group. So, GTP has more 
energy than GDP and if the cycle is going one way, we are producing molecules 
which have extra energy and which can go drive some other cycle which requires 
energy, for example the contraction of muscle. The muscle will not contract 
unless there is GTP. We can take muscle fiber, put it in water, and add GTP, 
and the fibers contract, changing GTP to GDP if the right enzymes are present. 
So the real system is in the GDP-GTP transformation; in the dark the GTP 
which has been stored up during the day is used to run the whole cycle around the 
other way. An enzyme you see, does not care in which direction the reaction goes, 
for if it did it would violate one of the laws of physics. 

Physics is of great importance in biology and other sciences for still another 
reason, that has to do with experimental techniques. In fact, if it were not for the 
great development of experimental physics, these biochemistry charts would not 
be known today. The reason is that the most useful tool of all for analyzing this 
fantastically complex system is to label the atoms which are used in the reactions. 
Thus, if we could introduce into the cycle some carbon dioxide which has a 
"green mark" on it, and then measure after three seconds where the green mark 
is, and again measure after ten seconds, etc., we could trace out the course of the 
reactions. What are the "green marks"? They are different isotopes. We recall 
that the chemical properties of atoms are determined by the number of electrcns , 
not by the mass of the nucleus. But there can be, for example in carbon, six 
neutrons or seven neutrons, together with the six protons which all carbon nuclei 
have. Chemically, the two atoms C12 and C13 are the same, but they differ in 
weight and they have different nuclear properties, and so they are distinguishable. 
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By using these isotopes of different weights, or even radioactive isotopes like C14, 
which provide a more sensitive means for tracing very small quantities, it is pos- 
sible to trace the reactions. 

Now, we return to the description of enzymes and proteins. All proteins are 
not enzymes, but all enzymes are proteins. There are many proteins, such as the 
proteins in muscle, the structural proteins which are, for example, in cartilage and 
hair, skin, etc., that are not themselves enzymes. However, proteins are a very 
characteristic substance of life: first of all they make up all the enzymes, and 
second, they make up much of the rest of living material. Proteins have a very 
interesting and simple structure. They are a series, or chain, of different ammo 
acids. There are twenty different amino acids, and they all can combine with 
each other to form chains in which the backbone is CO-NH, etc. Proteins are 
nothing but chains of various ones of these twenty amino acids. Each of the amino 
acids probably serves some special purpose. Some, for example, have a sulphur 
atom at a certain place; when two sulphur atoms are in the same protein, they 
form a bond, that is, they tie the chain together at two points and form a loop. 
Another has extra oxygen atoms which make it an acidic substance, another has 
a basic characteristic. Some of them have big groups hanging out to one side, so - 
that they take up a lot of space. One of the amino acids, called prolene, is not 
really an amino acid, but imino acid. There is a slight difference, with the result 
that when prolene is in the chain, there is a kink in the chain. If we wished to 
manufacture a particular protein, we would give these instructions: put one of 
those sulphur hooks here; next, add something to take up space; then attach some- 
thing to put a kink in the chain. In this way, we will get a complicated-looking 
chain, hooked together and having some complex structure; this is presumably 
just the manner in which all the various enzymes are made. One of the great tri- 
umphs in recent times (since 1960), was at last to discover the exact spatial atomic 
arrangement of certain proteins, which involve some fifty-six or sixty amino acids 
in a row. Over a thousand atoms (more nearly two thousand, if we count the 
hydrogen atoms) have been located in a complex pattern in two proteins. The 
first was hemoglobin. One of the sad aspects of this discovery is that we cannot see 
anything from the pattern; we do not understand why it works the way it does. 
Of course, that is the next problem to be attacked. 

Another problem is how do the enzymes know what to be? A red-eyed fly 
makes a red-eyed fly baby, and so the information for the whole pattern of enzymes 
to make red pigment must be passed from one fly to the next. This is done by a 
substance in the nucleus of the cell, not a protein, called DNA (short for des- 
oxyribose nucleic acid). This is the key substance which is passed from one cell 
to another (for instance, sperm cells consist mostly of DNA) and carries the 
information as to how to make the enzymes. DNA is the "blueprint." What does 
the blueprint look like and how does it work? First, the blueprint must be able 
to reproduce itself. Secondly, it must be able to instruct the protein. Concerning 
the reproduction, we might think that this proceeds like cell reproduction. Cells 
simply grow bigger and then divide in half. Must it be thus with DNA molecules, 
then, that they too grow bigger and divide in half? Every atom certainly does not 
grow bigger and divide in half! No, it is impossible to reproduce a molecule 
except by some more clever way. 

The structure of the substance DNA was studied for a long time, first chemi- 
cally to find the composition, and then with x-rays to find the pattern in space. 
The result was the following remarkable discovery: The DNA molecule is a pair 
of chains, twisted upon each other. The backbone of each of these chains, which 
are analogous to the chains of proteins but chemically quite different, is a series 
of sugar and phosphate groups, as shown in Fig. 3-2. Now we see how the chain 
can contain instructions, for if we could split this chain down the middle, we would 
have a series BAADC . . . and every living thing could have a different series. 
Thus perhaps, in some way, the specific instructions for the manufacture of pro- 
teins are contained in the specific series of the DNA. 

Attached to each sugar along the line, and linking the two chains together, are 
certain pairs of cross-links. However, they are not all of the same kind; there are 
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four kinds, called adenine, thymine, cytosine, and guanine, but let us call them 
A, B, C, and D. The interesting thing is that only certain pairs can sit opposite 
each other, for example A with B and C with D. These pairs are put on the two 
chains in such a way that they "fit together," and have a strong energy of interac- 
tion. However, C will not fit with A, and B will not fit with C; they will only fit 
in pairs, A against B and C against D. Therefore if one is C, the other must be 
D, etc. Whatever the letters may be in one chain, each one must have its specific 
complementary letter on the other chain. 

What then about reproduction? Suppose we split this chain in two. How 
can we make another one just like it? If, in the substances of the cells, there is a 
manufacturing department which brings up phosphate, sugar, and A, B, C, D 
units not connected in a chain, the only ones which will attach to our split chain 
will be the correct ones, the complements of BAADC . . ., namely, ABBCD ... 
Thus what happens is that the chain splits down the middle during cell division, 
one half ultimately to go with one cell, the other half to end up in the other cell; 
when separated, a new complementary chain is made by each half-chain. 

Next comes the question, precisely how does the order of the A, B, C, D units 
determine the arrangement of the amino acids in the protein? This is the central 
unsolved problem in biology today. The first clues, or pieces of information, 
however, are these: There are in the cell tiny particles called microsomes, and 
it is now known that that is the place where proteins are made. But the micro- 
somes are not in the nucleus, where the DNA and its instructions are. Something 
seems to be the matter. However, it is also known that little molecule pieces come 
off the DNA — not as long as the big DNA molecule that carries all the informa- 
tion itself, but like a small section of it. This is called RNA, but that is not essential. 
It is a kind of copy of the DNA, a short copy. The RNA, which somehow carries 
a message as to what kind of protein to make goes over to the microsome; that 
is known. When it gets there, protein is synthesized at the microsome. That is 
also known. However, the details of how the amino acids come in and are arranged 
in accordance with a code that is on the RNA are, as yet, still unknown. We do 
not know how to read it. If we knew, for example, the "lineup" A, B, C, C, A , 
we could not tell you what protein is to be made. 

Certainly no subject or field is making more progress on so many fronts at 
the present moment, than biology, and if we were to name the most powerful 
assumption of all, which leads one on and on in an attempt to understand life, 
it is that all things are made of atoms, and that everything that living things do can 
be understood in terms of the jigglings and wigglings of atoms. 


3-4 Astronomy 

In this rapid-fire explanation of the whole world, we must now turn to 
astronomy. Astronomy is older than physics. In fact, it got physics started by 
showing the beautiful simplicity of the motion of the stars and planets, the under- 
standing of which was the beginning of physics. But the most remarkable discovery 
in all of astronomy is that the stars are made of atoms of the same kind as those on 
the earth* How was this done? Atoms liberate light which has definite fre- 

* How I'm rushing through this! How much each sentence in this brief story contains. 
"The stars are made of the same atoms as the earth." I usually pick one small topic like 
this to give a lecture on. Poets say science takes away from the beauty of the stars — mere 
globs of gas atoms. Nothing is "mere." I too can see the stars on a desert night, and 
feel them. But do I see less or more ? The vastness of the heavens stretches my imagina- 
tion — stuck on this carousel my little eye can catch one-million-year-old light. A vast 
pattern — of which I am a part — perhaps my stuff was belched from some forgotten 
star, as one is belching there. Or see them with the greater eye of Palomar, rushing all 
apart from some common starting point when they were perhaps all together. What 
is the pattern, or the meaning, or the why ? It does not do harm to the mystery to know 
a little about it. For far more marvelous is the truth than any artists of the past imagined! 
Why do the poets of the present not speak of it ? What men are poets who can speak of 
Jupiter if he were like a man, but if he is an immense spinning sphere of methane and 
ammonia must be silent? 
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quencies, something like the timbre of a musical instrument, which has definite 
pitches or frequencies of sound. When we are listening to several different tones 
we can tell them apart, but when we look with our eyes at a mixture of colors we 
cannot tell the parts from which it was made, because the eye is nowhere near as 
discerning as the ear in this connection. However, with a spectroscope we can 
analyze the frequencies of the light waves and in this way we can see the very tunes 
of the atoms that are in the different stars. As a matter of fact, two of the chemical 
elements were discovered on a star before they were discovered on the earth. 
Helium was discovered on the sun, whence its name, and technetium was dis- 
covered in certain cool stars. This, of course, permits us to make headway in 
understanding the stars, because they are made of the same kinds of atoms which 
are on the earth. Now we know a great deal about the atoms, especially con- 
cerning their behavior under conditions of high temperature but not very great 
density, so that we can analyze by statistical mechanics the behavior of the stellar 
substance. Even though we cannot reproduce the conditions on the earth, using 
the basic physical laws we often can tell precisely, or very closely, what will happen. 
So it is that physics aids astronomy. Strange as it may seem, we understand the 
distribution of matter in the interior of the sun far better than we understand the 
interior of the earth. What goes on inside a star is better understood than one might 
guess from the difficulty of having to look at a little dot of light through a telescope, 
because we can calculate what the atoms in the stars should do in most circum- 
stances. 

One of the most impressive discoveries was the origin of the energy of the 
stars, that makes them continue to burn. One of the men who discovered this was 
out with his girl friend the night after he realized that nuclear reactions must be 
going on in the stars in order to make them shine. She said "Look at how pretty 
the stars shine!" He said "Yes, and right now I am the only man in the world 
who knows why they shine." She merely laughed at him. She was not impressed 
with being out with the only man who, at that moment, knew why stars shine. 
Well, it is sad to be alone, but that is the way it is in this world. 

It is the nuclear "burning" of hydrogen which supplies the energy of the sun; 
the hydrogen is converted into helium. Furthermore, ultimately, the manufacture 
of various chemical elements proceeds in the centers of the stars, from hydrogen. 
The stuff of which we are made, was "cooked" once, in a star, and spit out. How 
do we know? Because there is a clue. The proportion of the different isotopes — 
how much C 12 , how much C 13 , etc., is something which is never changed by 
chemical reactions, because the chemical reactions are so much the same for the 
two. The proportions are purely the result of nuclear reactions. By looking at the 
proportions of the isotopes in the cold, dead ember which we are, we can discover 
what the furnace was like in which the stuff of which we are made was formed. 
That furnace was like the stars, and so it is very likely that our elements were 
"made" in the stars and spit out in the explosions which we call novae and super- 
novae. Astronomy is so close to physics that we shall study many astronomical 
things as we go along. 

3-5 Geology 

We turn now to what are called earth sciences , or geology. First, meteorology 
and the weather. Of course the instruments of meteorology are physical instru- 
ments, and the development of experimental physics made these instruments 
possible, as was explained before. However, the theory of meteorology has never 
been satisfactorily worked out by the physicist. "Well," you say, "there is nothing 
but air, and we know the equations of the motions of air." Yes we do. "So if 
we know the condition of air today, why can't we figure out the condition of the 
air tomorrow?" First, we do not really know what the condition is today, because 
the air is swirling and twisting everywhere. It turns out to be very sensitive, and 
even unstable. If you have ever seen water run smoothly over a dam, and then 
turn into a large number of blobs and drops as it falls, you will understand what I 
mean by unstable. You know the condition of the water before it goes over the 
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spillway; it is perfectly smooth; but the moment it begins to fall, where do the 
drops begin? What determines how big the lumps are going to be and where they 
will be? That is not known, because the water is unstable. Even a smooth moving 
mass of air, in going over a mountain turns into complex whirlpools and eddies. 
In many fields we find this situation of turbulent flow that we cannot analyze today. 
Quickly we leave the subject of weather, and discuss geology! 

The question basic to geology is, what makes the earth the way it is? The 
most obvious processes are in front of your very eyes, the erosion processes of 
the rivers, the winds, etc. It is easy enough to understand these, but for every bit 
of erosion there is an equal amount of something else going on. Mountains are 
no lower today, on the average, than they were in the past. There must be moun- 
tsim-forming processes. You will find, if you study geology, that there are 
mountain-forming processes and vulcanism, which nobody understands but which 
is half of geology. The phenomenon of volcanoes is really not understood. What 
makes an earthquake is, ultimately, not understood. It is understood that if 
something is pushing something else, it snaps and will slide — that is all right. 
But what pushes, and why? The theory is that there are currents inside the earth — 
circulating currents, due to the difference in temperature inside and outside — 
which, in their motion, push the surface slightly. Thus if there are two opposite 
circulations next to each other, the matter will collect in the region where they 
meet and make belts of mountains which are in unhappy stressed conditions, and 
so produce volcanoes and earthquakes. 

What about the inside of the earth? A great deal is known about the speed of 
earthquake waves through the earth and the density of distribution of the earth. 
However, physicists have been unable to get a good theory as to how dense a 
substance should be at the pressures that would be expected at the center of the 
earth. In other words, we cannot figure out the properties of matter very well in 
these circumstances. We do much less well with the earth than we do with the 
conditions of matter in the stars. The mathematics involved seems a little too 
difficult, so far, but perhaps it will not be too long before someone realizes that 
it is an important problem, and really work it out. The other aspect, of course, is 
that even if we did know the density, we cannot figure out the circulating currents. 
Nor can we really work out the properties of rocks at high pressure. We cannot 
tell how fast the rocks should "give"; that must all be worked out by experiment. 

3-6 Psychology 

Next, we consider the science of psychology. Incidentally, psychoanalysis is 
not a science: it is at best a medical process, and perhaps even more like witch- 
doctoring. It has a theory as to what causes disease — lots of different "spirits," 
etc. The witch doctor has a theory that a disease like malaria is caused by a spirit 
which comes into the air; it is not cured by shaking a snake over it, but quinine 
does help malaria. So, if you are sick, I would advise that you go to the witch 
doctor because he is the man in the tribe who knows the most about the disease; 
on the other hand, his knowledge is not science. Psychoanalysis has not been 
checked carefully by experiment, and there is no way to find a list of the number 
of cases in which it works, the number of cases in which it does not work, etc. 

The other branches of psychology, which involve things like the physiology 
of sensation — what happens in the eye, and what happens in the brain — are, if 
you wish, less interesting. But some small but real progress has been made in 
studying them. One of the most interesting technical problems may or may not 
be called psychology. The central problem of the mind, if you will, or the nervous 
system, is this: when an animal learns something, it can do something different 
than it could before, and its brain cell must have changed too, if it is made out of 
atoms. In what way is it different ? We do not know where to look, or what to 
look for, when something is memorized. We do not know what it means, or what 
change there is in the nervous system, when a fact is learned. This is a very impor- 
tant problem which has not been solved at all. Assuming, however, that there is 
some kind of memory thing, the brain is such an enormous mass of interconnect- 
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ing wires and nerves that it probably cannot be analyzed in a straightforward 
manner. There is an analog of this to computing machines and computing ele- 
ments, in that they also have a lot of lines, and they have some kind of element, 
analogous, perhaps, to the synapse, or connection of one nerve to another. This 
is a very interesting subject which we have not the time to discuss further — the 
relationship between thinking and computing machines. It must be appreciated, 
of course, that this subject will tell us very little about the real complexities of 
ordinary human behavior. All human beings are so different. It will be a long 
time before we get there. We must start much further back. If we could even figure 
out how a dog works, we would have gone pretty far. Dogs are easier to under- 
stand, but nobody yet knows how dogs work. 

3-7 How did it get that way? 

In order for physics to be useful to other sciences in a theoretical way, other 
than in the invention of instruments, the science in question must supply to the 
physicist a description of the object in a physicist's language. They can say "why 
does a frogjump?," and the physicist cannot answer. If they tell him what a frog 
is, that there are so many molecules, there is a nerve here, etc., that is different. 
If they will tell us, more or less, what the earth or the stars are like, then we can 
figure it out. In order for physical theory to be of any use, we must know where 
the atoms are located. In order to understand the chemistry, we must know 
exactly what atoms are present, for otherwise we cannot analyze it. That is but 
one limitation, of course. 

There is another kind of problem in the sister sciences which does not exist 
in physics; we might call it, for lack of a better term, the historical question. 
How did it get that way? If we understand all about biology, we will want to 
know how all the things which are on the earth got there. There is the theory of 
evolution, an important part of biology. In geology, we not only want to know 
how the mountains are forming, but how the entire earth was formed in the be- 
ginning, the origin of the solar system, etc. That, of course, leads us to want to 
know what kind of matter there was in the world. How did the stars evolve? 
What were the initial conditions? That is the problem of astronomical history. 
A great deal has been found out about the formation of stars, the formation of 
elements from which we were made, and even a little about the origin of the 
universe. 

There is no historical question being studied in physics at the present time. 
We do not have a question, "Here are the laws of physics, how did they get that 
way?" We do not imagine, at the moment, that the laws of physics are somehow 
changing with time, that they were different in the past than they are at present. 
Of course they may be, and the moment we find they are , the historical question 
of physics will be wrapped up with the rest of the history of the universe, and then 
the physicist will be talking about the same problems as astronomers, geologists, 
and biologists. 

Finally, there is a physical problem that is common to many fields, that is 
very old, and that has not been solved. It is not the problem of finding new funda- 
mental particles, but something left over from a long time ago — over a hundred 
years. Nobody in physics has really been able to analyze it mathematically 
satisfactorily in spite of its importance to the sister sciences. It is the analysis of 
circulating or turbulent fluids. If we watch the evolution of a star, there comes a 
point where we can deduce that it is going to start convection, and thereafter we 
can no longer deduce what should happen. A few million years later the star 
explodes, but we cannot figure out the reason. We cannot analyze the weather. 
We do not know the patterns of motions that there should be inside the earth. 
The simplest form of the problem is to take a pipe that is very long and push water 
through it at high speed. We ask: to push a given amount of water through that 
pipe, how much pressure is needed? No one can analyze it from first principles 
and the properties of water. If the water flows very slowly, or if we use a thick 
goo like honey, then we can do it nicely. You will find that in your textbook. 
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What we really cannot do is deal with actual, wet water running through a pipe. 
That is the central problem which we ought to solve some day, and we have not. 

A poet once said, "The whole universe is in a glass of wine." We will probably 
never know in what sense he meant that, for poets do not write to be understood. 
But it is true that if we look at a glass of wine closely enough we see the entire 
universe. There are the things of physics: the twisting liquid which evaporates 
depending on the wind and weather, the reflections in the glass, and our imagi- 
nation adds the atoms. The glass is a distillation of the earth's rocks, and in its 
composition we see the secrets of the universe's age, and the evolution of stars. 
What strange array of chemicals are in the wine? How did they come to be? 
There are the ferments, the enzymes, the substrates, and the products. There in 
wine is found the great generalization: all life is fermentation. Nobody can 
discover the chemistry of wine without discovering, as did Louis Pasteur, the cause 
of much disease. How vivid is the claret, pressing its existence into the conscious- 
ness that watches it! If our small minds, for some convenience, divide this glass 
of wine, this universe, into parts — physics, biology, geology, astronomy, psy- 
chology, and so on — remember that nature does not know it! So let us put it all 
back together, not forgetting ultimately what it is for. Let it give us one more final 
pleasure: drink it and forget it all! 
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4 


Conservation of Energy 


4-1 What is energy? 

In this chapter, we begin our more detailed study of the different aspects of 
physics, having finished our description of things in general. To illustrate the ideas 
and the kind of reasoning that might be used in theoretical physics, we shall now 
examine one of the most basic laws of physics, the conservation of energy. 

There is a fact, or if you wish, a law , governing all natural phenomena that 
are known to date. There is no known exception to this law — it is exact so far as 
we know. The law is called the conservation of energy. It states that there is a 
certain quantity, which we call energy, that does not change in the manifold 
changes which nature undergoes. That is a most abstract idea, because it is a 
mathematical principle; it says that there is a numerical quantity which does not 
change when something happens. It is not a description of a mechanism, or any- 
thing concrete; it is just a strange fact that we can calculate some number and when 
we finish watching nature go through her tricks and calculate the number again, 
it is the same. (Something like the bishop on a red square, and after a number of 
moves — details unknown — it is still on some red square. It is a law of this nature.) 
Since it is an abstract idea, we shall illustrate the meaning of it by an analogy. 

Imagine a child, perhaps "Dennis the Menace," who has blocks which are 
absolutely indestructible, and cannot be divided into pieces. Each is the same as 
the other. Let us suppose that he has 28 blocks. His mother puts him with his 
28 blocks into a room at the beginning of the day. At the end of the day, being 
curious, she counts the blocks very carefully, and discovers a phenomenal law — 
no matter what he does with the blocks, there are always 28 remaining! This 
continues for a number of days, until one day there are only 27 blocks, but a little 
investigating shows that there is one under the rug — she must look everywhere 
to be sure that the number of blocks has not changed. One day, however, the 
number appears to change — there are only 26 blocks. Careful investigation in- 
dicates that the window was open, and upon looking outside, the other two blocks 
are found. Another day, careful count indicates that there are 30 blocks! This 
causes considerable consternation, until it is realized that Bruce came to visit, 
bringing his blocks with him, and he left a few at Dennis' house. After she has 
disposed of the extra blocks, she closes the window, does not let Bruce in, and then 
everything is going along all right, until one time she counts and finds only 25 
blocks. However, there is a box in the room, a toy box, and the mother goes to 
open the toy box, but the boy says "No, do not open my toy box," and screams. 
Mother is not allowed to open the toy box. Being extremely curious, and somewhat 
ingenious, she invents a scheme! She knows that a block weighs three ounces, 
so she weighs the box at a time when she sees 28 blocks, and it weighs 16 ounces. 
The next time she wishes to check, she weighs the box again, subtracts sixteen 
ounces and divides by three. She discovers the following: 


/ number of \ (weig ht of box) — 16 ounces 

\ blocks seen / 3 ounces 


= constant. 


(4.1) 


4-1 What is energy? 

4-2 Gravitational potential energy 

4-3 Kinetic energy 

4-4 Other forms of energy 


There then appear to be some new deviations, but careful study indicates that the 
dirty water in the bathtub is changing its level. The child is throwing blocks into 
the water, and she cannot see them because it is so dirty, but she can find out how 
many blocks are in the water by adding another term to her formula. Since the 
original height of the water was 6 inches and each block raises the water a quarter 
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of an inch, this new formula would be: 


/ number of \ (weight of box) - 16 ounces 

\blocks seen/ 3 ounces 

(height of water) — 6 inches 


+ 


1/4 inch 


= constant. 


(4.2) 


In the gradual increase in the complexity of her world, she finds a whole series of 
terms representing ways of calculating how many blocks are in places where she 
is not allowed to look. As a result, she finds a complex formula, a quantity which 
has to be computed, which always stays the same in her situation. 

What is the analogy of this to the conservation of energy? The most re- 
markable aspect that must be abstracted from this picture is that there are no blocks. 
Take away the first terms in (4.1) and (4.2) and we find ourselves calculating more 
or less abstract things. The analogy has the following points. First, when we are 
calculating the energy, sometimes some of it leaves the system and goes away, 
or sometimes some comes in. In order to verify the conservation of energy, we 
must be careful that we have not put any in or taken any out. Second, the energy 
has a large number of different forms, and there is a formula for each one. These 
are: gravitational energy, kinetic energy, heat energy, elastic energy, electrical 
energy, chemical energy, radiant energy, nuclear energy, mass energy. If we total 
up the formulas for each of these contributions, it will not change except for energy 
going in and out. 

It is important to realize that in physics today, we have no knowledge of what 
energy is. We do not have a picture that energy comes in little blobs of a definite 
amount. It is not that way. However, there are formulas for calculating some 
numerical quantity, and when we add it all together it gives u 28 >5 — always the 
same number. It is an abstract thing in that it does not tell us the mechanism or 
the reasons for the various formulas. 



Fig. 4-1 . Simple weight-lifting machine. 


4-2 Gravitational potential energy 

Conservation of energy can be understood only if we have the formula for 
all of its forms. I wish to discuss the formula for gravitational energy near the 
surface of the Earth, and I wish to derive this formula in a way which has nothing 
to do with history but is simply a line of reasoning invented for this particular 
lecture to give you an illustration of the remarkable fact that a great deal about 
nature can be extracted from a few facts and close reasoning. It is an illustration 
of the kind of work theoretical physicists become involved in. It is patterned 
after a most excellent argument by Mr. Carnot on the efficiency of steam engines.* 

Consider weight-lifting machines — machines which have the property that 
they lift one weight by lowering another. Let us also make a hypothesis: that 
there is no such thing as perpetual motion with these weight-lifting machines. 
(In fact, that there is no perpetual motion at all is a general statement of the law 
of conservation of energy.) We must be careful to define perpetual motion. 
First, let us do it for weight-lifting machines. If, when we have lifted and lowered 
a lot of weights and restored the machine to the original condition, we find that 
the net result is to have lifted a weight, then we have a perpetual motion machine 
because we can use that lifted weight to run something else. That is, provided the 
machine which lifted the weight is brought back to its exact original condition, 
and furthermore that it is completely self-contained — that it has not received the 
energy to lift that weight from some external source — like Bruce's blocks. 

A very simple weight-lifting machine is shown in Fig. 4-1. This machine lifts 
weights three units "strong." We place three units on one balance pan, and one 
unit on the other. However, in order to get it actually to work, we must lift a 
little weight off the left pan. On the other hand, we could lift a one-unit weight 


* Our point here is not so much the result, (4.3), which in fact you may already know, 
as the possibility of arriving at it by theoretical reasoning. 
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by lowering the three-unit weight, if we cheat a little by lifting a little weight off 
the other pan. Of course, we realize that with any actual lifting machine, we must 
add a little extra to get it to run. This we disregard, temporarily. Ideal machines, 
although they do not exist, do not require anything extra. A machine that we 
actually use can be, in a sense, almost reversible: that is, if it will lift the weight of 
three by lowering a weight of one, then it will also lift nearly the weight of one the 
same amount by lowering the weight of three. 

We imagine that there are two classes of machines, those that are not re- 
versible, which includes all real machines, and those that are reversible, which of 
course are actually not attainable no matter how careful we may be in our design 
of bearings, levers, etc. We suppose, however, that there is such a thing — a 
reversible machine — which lowers one unit of weight (a pound or any other unit) 
by one unit of distance, and at the same time lifts a three-unit weight. Call this 
reversible machine, Machine A. Suppose this particular reversible machine lifts 
the three-unit weight a distance X. Then suppose we have another machine, Ma- 
chine B ' which is not necessarily reversible, which also lowers a unit weight a 
unit distance, but which lifts three units a distance Y. We can now prove that Y 
is not higher than X; that is, it is impossible to build a machine that will lift a 
weight any higher than it will be lifted by a reversible machine. Let us see why. 
Let us suppose that Y were higher than X. We take a one-unit weight and lower 
it one unit height with Machine B, and that lifts the three-unit weight up a distance 
Y. Then we could lower the weight from Y to X, obtaining free power , and use 
the reversible Machine A , running backwards, to lower the three-unit weight a 
distance X and lift the one-unit weight by one unit height. This will put the 
one-unit weight back where it was before, and leave both machines ready to be 
used again! We would therefore have perpetual motion if Y were higher than X , 
which we assumed was impossible. With those assumptions, we thus deduce that 
Y is not higher than X, so that of all machines that can be designed, the reversible 
machine is the best. 

We can also see that all reversible machines must lift to exactly the same height. 
Suppose that B were really reversible also. The argument that Y is not higher than 
X is, of course, just as good as it was before, but we can also make our argument 
the other way around, using the machines in the opposite order, and prove that 
X is not higher than Y. This, then, is a very remarkable observation because it 
permits us to analyze the height to which different machines are going to lift 
something without looking at the interior mechanism. We know at once that if 
somebody makes an enormously elaborate series of levers that lift three units a 
certain distance by lowering one unit by one unit distance, and we compare it 
with a simple lever which does the same thing and is fundamentally reversible, 
his machine will lift it no higher, but perhaps less high. If his machine is re- 
versible, we also know exactly how high it will lift. To summarize: every reversible 
machine, no matter how it operates, which drops one pound one foot and lifts 
a three-pound weight always lifts it the same distance, X. This is clearly a universal 
law of great utility. The next question is, of course, what is XI 

Suppose we have a reversible machine which is going to lift this distance X , 
three for one. We set up three balls in a rack which does not move, as shown in 
Fig. 4-2. One ball is held on a stage at a distance one foot above the ground. The 
machine can lift three balls, lowering one by a distance 1. Now, we have arranged 
that the platform which holds three balls has a floor and two shelves, exactly spaced 
at distance X, and further, that the rack which holds the balls is spaced at distance 
X, (a). First we roll the balls horizontally from the rack to the shelves, (b), and 
we suppose that this takes no energy because we do not change the height. The 
reversible machine then operates: it lowers the single ball to the floor, and it lifts 
the rack a distance X, (c). Now we have ingeniously arranged the rack so that 
these balls are again even with the platforms. Thus we unload the balls onto the 
rack, (d); having unloaded the balls, we can restore the machine to its original 
condition. Now we have three balls on the upper three shelves and one at the 
bottom. But the strange thing is that, in a certain way of speaking, we have not 
lifted two of them at all because, after all, there were balls on shelves 2 and. 3 
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Fig. 4-3. Inclined plane. 


before. The resulting effect has been to lift one ball a distance 3X. Now, if 3X 
exceeds one foot, then we can lower the ball to return the machine to the initial 
condition, (f), and we can run the apparatus again. Therefore 3 X cannot exceed 
one foot, for if 3 X exceeds one foot we can make perpetual motion. Likewise, 
we can prove that one foot cannot exceed 3X, by making the whole machine run 
the opposite way, since it is a reversible machine. Therefore 3X is neither greater 
nor less than afoot , and we discover then, by argument alone, the law that 
X = § foot. The generalization is clear: one pound falls a certain distance in 
operating a reversible machine; then the machine can lift p pounds this distance 
divided by p. Another way of putting the result is that three pounds times the 
height lifted, which in our problem was X, is equal to one pound times the distance 
lowered, which is one foot in this case. If we take all the weights and multiply 
them by the heights at which they are now, above the floor, let the machine operate, 
and then multiply all the weights by all the heights again, there will be no change. 
(We have to generalize the example where we moved only one weight to the case 
where when we lower one we lift several different ones — but that is easy.) 

We call the sum of the weights times the heights gravitational potential 
energy — the energy which an object has because of its relationship in space, rela- 
tive to the earth. The formula for gravitational energy, then, so long as we are 
not too far from the earth (the force weakens as we go higher) is 

/ gravitational \ 

I potential energy J = (weight) X (height). (4.3) 

\for one object / 

It is a very beautiful line of reasoning. The only problem is that perhaps it is not 
true. (After all, nature does not have to go along with our reasoning.) ,For example, 
perhaps perpetual motion is, in fact, possible. Some of the assumptions may be 

wrong, or we may have made a mistake in reasoning, so it is always necessary to 

check. It turns out experimentally , in fact, to be true. 

The general name of energy which has to do with location relative to some- 
thing else is called potential energy. In this particular case, of course, we call it 
gravitational potential energy. If it is a question of electrical forces against which 
we are working, instead of gravitational forces, if we are "lifting" charges away 
from other charges with a lot of levers, then the energy content is called electrical 
potential energy. The general principle is that the change in the energy is the force 
times the distance that the force is pushed, and that this is a change in energy in 
general: 

( change in\ . w /distance force\ .. .. 

energj, J * < for “> X U» throueh ) ■ (4 4) 

We will return to many of these other kinds of energy as we continue the course. 

The principle of the conservation of energy is very useful for deducing what 
will happen in a number of circumstances. In high school we learned a lot of laws 
about pulleys and levers used in different ways. We can now see that these "laws" 
are all the same thing , and that we did not have to memorize 75 rules to figure it out. 
A simple example is a smooth inclined plane which is, happily, a three-four-five 
triangle (Fig. 4 - 3 ). We hang a one-pound weight on the inclined plane with a 
pulley, and on the other side of the pulley, a weight W. We want to know how 
heavy W must be to balance the one pound on the plane. How can we figure that 
out? If we say it is just balanced, it is reversible and so can move up and down, 
and we can consider the following situation. In the initial circumstance, (a), 
the one pound weight is at the bottom and weight W is at the top. When W has 
slipped down in a reversible way, we have a one-pound weight at the top and the 
weight W the slant distance, (b), or five feet, from the plane in which it was before. 
We lifted the one-pound weight only three feet and we lowered W pounds by 
five feet. Therefore W = f of a pound. Note that we deduced this from the 
conservation of energy, and not from force components. Cleverness, however, is 
relative. It can be deduced in a way which is even more brilliant, discovered by 
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Stevinus and inscribed on his tombstone. Figure 4-4 explains that it has to be 
f of a pound, because the chain does not go around. It is evident that the lower 
part of the chain is balanced by itself, so that the pull of the five weights on one 
side must balance the pull of three weights on the other, or whatever the ratio of 
the legs. You see, by looking at this diagram, that W must be § of a pound. 
(If you get an epitaph like that on your gravestone, you are doing fine.) 

Let us now illustrate the energy principle with a more complicated problem, 
the screw jack shown in Fig. 4-5. A handle 20 inches long is used to turn the screw, 
which has 10 threads to the inch. We would like to know how much force would 
be needed at the handle to lift one ton (2000 pounds). If we want to lift the ton 
one inch, say, then we must turn the handle around ten times. When it goes around 
once it goes approximately 126 inches. The handle must thus travel 1260 inches, 
and if we used various pulleys, etc., we would be lifting our one ton with an un- 
known smaller weight W applied to the end of the handle. So we find out that W 
is about 1.6 pounds. This is a result of the conservation of energy. 

Take now the somewhat more complicated example shown in Fig. 4-6. A rod 
or bar, 8 feet long, is supported at one end. In the middle of the bar is a weight 
of 60 pounds, and at a distance of two feet from the support there is a weight of 
100 pounds. How hard do we have to lift the end of the bar in order to keep 
it balanced, disregarding the weight of the bar? Suppose we put a pulley at one 
end and hang a weight on the pulley. How big would the weight W have to be 
in order for it to balance? We imagine that the weight falls any arbitrary dis- 
tance — to make it easy for ourselves suppose it goes down 4 inches — how high 
would-the two load weights rise? The center rises 2 inches, and the point a quarter 
of the way from the fixed end lifts 1 inch. Therefore, the principle that the sum of 
the heights times the weights does not change tells us that the weight W times 
4 inches down, plus 60 pounds times 2 inches up, plus 100 pounds times 1 inch 
has to add up to nothing: 




-AW + (2)(60) + (1)(100) = 0, W = 551b. (4.5) 


Fig. 4-5. A screw jack. 


Thus we must have a 55-pound weight to balance the bar. In this way we can work 
out the laws of "balance" — the statics of complicated bridge arrangements, and so 
on. This approach is called the principle of virtual work , because in order to apply 
this argument we had to imagine that the structure moves a little — even though 
it is not really moving or even movable. We use the very small imagined motion 
to apply the principle of conservation of energy. 
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4-3 Kinetic energy 

To illustrate another type of energy we consider a pendulum (Fig. 4-7). 
If we pull the mass aside and release it, it swings back and forth. In its motion, 
it loses height in going from either end to the center. Where does the potential 
energy go? Gravitational energy disappears when it is down at the bottom; 
nevertheless, it will climb up again. The gravitational energy must have gone into 
another form. Evidently it is by virtue of its motion that it is able to climb up again, 
so we have the conversion of gravitational energy into some other form when it 
reaches the bottom. 

We must get a formula for the energy of motion. Now, recalling our arguments 
about reversible machines, we can easily see that in the motion at the bottom 
must be a quantity of energy which permits it to rise a certain height, and which 
has nothing to do with the machinery by which it comes up or the path by which 
it comes up. So we have an equivalence formula something like the one we wrote 
for the child's blocks. We have another form to represent the energy. It is easy to 
say what it is. The kinetic energy at the bottom equals the weight times the height 
that it could go, corresponding to its velocity: K.E. = WH. What we need is 
the formula which tells us the height by some rule that has to do with the motion 
of objects. If we start something out with a certain velocity, say straight up, it 
will reach a certain height; we do not know what it is yet, but it depends on the 
velocity — there is a formula for that. Then to find the formula for kinetic energy 
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Fig. 4-6. Weighted rod supported 
on one end. 



Fig. 4-7. Pendulum. 


for an object moving with velocity V, we must calculate the height that it could 
reach, and multiply by the weight. We shall soon find that we can write it this way: 

K.E. = WV 2 /2g. (4.6) 

Of course, the fact that motion has energy has nothing to do with the fact that 
we are in a gravitational field. It makes no difference where the motion came from. 
This is a general formula for various velocities. Both (4.3) and (4.6) are approxi- 
mate formulas, the first because it is incorrect when the heights are great, i.e., 
when the heights are so high that gravity is weakening; the second, because of the 
relativistic correction at high speeds. However, when we do finally get the exact 
formula for the energy, then the law of conservation of energy is correct. 

4-4 Other forms of energy 

We can continue in this way to illustrate the existence of energy in other forms. 
First, consider elastic energy. If we pull down on a spring, we must do some work, 
for when we have it down, we can lift weights with it. Therefore in its stretched 
condition it has a possibility of doing some work. If we were to evaluate the sums 
of weights times heights, it would not check out — we must add something else 
to account for the fact that the spring is under tension. Elastic energy is the 
formula for a spring when it is stretched. How much energy is it? If we let go, 
the elastic energy, as the spring passes through the equilibrium point, is converted 
to kinetic energy and it goes back and forth between compressing or stretching 
the spring and kinetic energy of motion. (There is also some gravitational energy 
going in and out, but we can do this experiment "sideways" if we like.) It keeps 
going until the losses — Aha! We have cheated all the way through by putting 
on little weights to move things or saying that the machines are reversible, or that 
they go on forever, but we can see that things do stop, eventually. Where is the 
energy when the spring has finished moving up and down? This brings in another 
form of energy: heat energy. 

Inside a spring or a lever there are crystals which are made up of lots of atoms, 
and with great care and delicacy in the arrangement of the parts one can try to 
adjust things so that as something rolls on something else, none of the atoms do 
any jiggling at all. But one must be very careful. Ordinarily when things roll, 
there is bumping and jiggling because of the irregularities of the material, and the 
atoms start to wiggle inside. So we lose track of that energy; we find the atoms are 
wiggling inside in a random and confused manner after the motion slows down. 
There is still kinetic energy, all right, but it is not associated with visible motion. 
What a dream! How do we know there is still kinetic energy? It turns out that 
with thermometers you can find out that, in fact, the spring or the lever is warmer, 
and that there is really an increase of kinetic energy by a definite amount. We call 
this form of energy heat energy, but we know that it is not really a new form, it 
is just kinetic energy — internal motion. (One of the difficulties with all these 
experiments with matter that we do on a large scale is that we cannot really 
demonstrate the conservation of energy and we cannot really make our reversible 
machines, because every time we move a large clump of stuff, the atoms do not 
remain absolutely undisturbed, and so a certain amount of random motion goes 
into the atomic system. We cannot see it, but we can measure it with thermom- 
eters, etc.) 

There are many other forms of energy, and of course we cannot describe them 
in any more detail just now. There is electrical energy, which has to do with push- 
ing and pulling by electric charges. There is radiant energy, the energy of light, 
which we know is a form of electrical energy because light can be represented as 
wigglings in the electromagnetic field. There is chemical energy, the energy which 
is released in chemical reactions. Actually, elastic energy is, to a certain extent, 
like chemical energy, because chemical energy is the energy of the attraction of 
the atoms, one for the other, and so is elastic energy. Our modern understanding 
is the following: chemical energy has two parts, kinetic energy of the electrons 
inside the atoms, so part of it is kinetic, and electrical energy of interaction of the 
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electrons and the protons — the rest of it, therefore, is electrical. Next we come to 
nuclear energy, the energy which is involved with the arrangement of particles 
inside the nucleus, and we have formulas for that, but we do not have the funda- 
mental laws. We know that it is not electrical, not gravitational, and not purely 
chemical, but we do not know what it is. It seems to be an additional form of 
energy. Finally, associated with the relativity theory, there is a modification of 
the laws of kinetic energy, or whatever you wish to call it, so that kinetic energy 
is combined with another thing called mass energy. An object has energy from its 
sheer existence. If I have a positron and an electron, standing still doing nothing 
— never mind gravity, never mind anything — and they come together and dis- 
appear, radiant energy will be liberated, in a definite amount, and the amount 
can be calculated. All we need know is the mass of the object. It does not depend 
on what it is — we make two things disappear, and we get a certain amount of 
energy. The formula was first found by Einstein; it is E = me 2 . 

It is obvious from our discussion that the law of conservation of energy is 
enormously useful in making analyses, as we have illustrated in a few examples 
without knowing all the formulas. If we had all the formulas for all kinds of 
energy, we could analyze how many processes should work without having to go 
into the details. Therefore conservation laws are very interesting. The question 
naturally arises as to what other conservation laws there are in physics. There 
are two other conservation laws which are analogous to the conservation of 
energy. One is called the conservation of linear momentum. The other is called 
the conservation of angular momentum. We will find out more about these later. 
In the last analysis, we do not-understand the conservation laws deeply. We do 
not understand the conservation of energy. We do not understand energy as a 
certain number of little blobs. You may have heard that photons come out in 
blobs and that the energy of a photon is Planck's constant times the frequency. 
That is true, but since the frequency of light can be anything, there is no law that 
says that energy has to be a certain definite amount. Unlike Dennis' blocks, there 
can be any amount of energy, at least as presently understood. So we do not under- 
stand this energy as counting something at the moment, but just as a mathematical 
quantity, which is an abstract and rather peculiar circumstance. In quantum 
mechanics it turns out that the conservation of energy is very closely related to 
another important property of the world, things do not depend on the absolute 
time. We can set up an experiment at a given moment and try it out, and then do 
the same experiment at a later moment, and it will behave in exactly the same 
way. Whether this is strictly true or not, we do not know. If we assume that it 
is true, and add the principles of quantum mechanics, then we can deduce Jhe 
principle of the conservation of energy. It is a rather subtle and interesting thing, 
and it is not easy to explain. The other conservation laws are also linked together. 
The conservation of momentum is associated in quantum mechanics with the 
proposition that it makes no difference where you do the experiment, the results 
will always be the same. As independence in space has to do with the conserva- 
tion of momentum, independence of time has to do with the conservation of 
energy, and finally, if we turn our apparatus, this too makes no difference, and so 
the invariance of the world to angular orientation is related to the conservation 
of angular momentum. Besides these, there are three other conservation laws, 
that are exact so far as we can tell today, which are much simpler to understand 
because they are in the nature of counting blocks. 

The first of the three is the conservation of charge , and that merely means 
that you count how many positive, minus how many negative electrical charges 
you have, and the number is never changed. You may get rid of a positive with 
a negative, but you do not create any net excess of positives over negatives. Two 
other laws are analogous to this one — one is called the conservation of baryons. 
There are a number of strange particles, a neutron and a proton are examples, 
which are called baryons. In any reaction whatever in nature, if we count how 
many baryons are coming into a process, the number of baryons* which come out 


* Counting antibaryons as — 1 baryon. 


4-7 



will be exactly the same. There is another law, the conservation of leptons. We 
can say that the group of particles called leptons are: electron, mu meson, and 
neutrino. There is an antielectron which is a positron, that is, a — 1 lepton. 
Counting the total number of leptons in a reaction reveals that the number in and 
out never changes, at least so far as we know at present. 

These are the six conservation laws, three of them subtle, involving space and 
time, and three of them simple, in the sense of counting something. 

With regard to the conservation of energy, we should note that available 
energy is another matter — there is a lot of jiggling around in the atoms of the 
water of the sea, because the sea has a certain temperature, but it is impossible 
to get them herded into a definite motion without taking energy from somewhere 
else. That is, although we know for a fact that energy is conserved, the energy 
available for human utility is not conserved so easily. The laws which govern 
how much energy is available are called the laws of thermodynamics and involve 
a concept called entropy for irreversible thermodynamic processes. 

Finally, we remark on the question of where we can get our supplies of energy 
today. Our supplies of energy are from the sun, rain, coal, uranium, and hydrogen. 
The sun makes the rain, and the coal also, so that all these are from the sun. 
Although energy is conserved, nature does not seem to be interested in it; she 
liberates a lot of energy from the sun, but only one part in two billion falls on the 
earth. Nature has conservation of energy, but does not really care; she spends 
a lot of it in all directions. We have already obtained energy from uranium; 
we can also get energy from hydrogen, but at present only in an explosive and 
dangerous condition. If it can be controlled in thermonuclear reactions, it turns 
out that the energy that can be obtained from 10 quarts of water per second is equal 
to all of the electrical power generated in the United States. With 150 gallons of 
running water a minute, you have enough fuel to supply all the energy which is 
used in the United States today! Therefore it is up to the physicist to figure out 
how to liberate us from the need for having energy. It can be done. 
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5 


Time and Distance 


5-1 Motion 

In this chapter we shall consider some aspects of the concepts of time and 
distance. It has been emphasized earlier that physics, as do all the sciences, de- 
pends on observation. One might also say that the development of the physical 
sciences to their present form has depended to a large extent on the emphasis 
which has been placed on the making of quantitative observations. Only with 
quantitative observations can one arrive at quantitative relationships, which are 
the heart of physics. 

Many people would like to place the beginnings of physics with the work 
done 350 years ago by Galileo, and to call him the first physicist. Until that time, 
the study of motion had been a philosophical one based on arguments that could 
be thought up in one's head. Most of the arguments had been presented by 
Aristotle and other Greek philosophers, and were taken as "proven." Galileo 
was skeptical, and did an experiment on motion which was essentially this: He 
allowed a ball to roll down an inclined trough and observed the motion. He did 
not, however, just look; he measured how far the ball went in how long a time. 

The way to measure a distance was well known long before Galileo, but there 
were no accurate ways of measuring time, particularly short times. Although he 
later devised more satisfactory clocks (though not like the ones we know), Galileo's 
first experiments on motion were done by using his pulse to count off equal in- 
tervals of time. Let us do the same. 

We may count offbeats of a pulse as the ball rolls down the track: "one .. . 
two ... three .. . four .. . five ... six ... seven . . . eight..." We ask a friend to 
make a small mark at the location of the ball at each count; we can then measure 
the distance the ball travelled from the point of release in one, or two, or three, 
etc., equal intervals of time. Galileo expressed the result of his observations in 
this way: if the location of the ball is marked at 1, 2, 3, 4,... units of time from 
the instant oflts release, those marks are distant from the starting point in propor- 
tion to the numbers 1, 4, 9, 16, ... Today we would say the distance is propor- 
tional to the square of the time: 


D oc t 2 . 

The study of motion, which is basic to all of physics, treats with the questions: 
where? and when? 


5-2 Time 

Let us consider first what we mean by time. What is time? It would be nice 
if we could find a good definition of time. Webster defines "a time" as "a period," 
and the latter as "a time," which doesn't seem to be very useful. Perhaps we should 
say: "Time is what happens when nothing else happens." Which also doesn't 
get us very far. Maybe it is just as well if we face the fact that time is one of the 
things we probably cannot define (in the dictionary sense), and just say that it 
is what we already know it to be: it is how long we wait! 

What really matters anyway is not how we define time, but how we measure 
it. One way of measuring time is to utilize something which happens over and 
over again in a regular fashion — something which is periodic. For example, a 
day. A day seems to happen over and over again. But when you begin to think 
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Fig. 5-1. A ball rolls down an 
dined track. 



about it, you might well ask: "Are days periodic; are they regular? Are all days 
the same length?" One certainly has the impression that days in summer are longer 
than days in winter. Of course, some of the days in winter seem to get awfully 
long if one is very bored. You have certainly heard someone say, "My, but this 
has been a long day!" 

It does seem, however, that days are about the same length on the average. 
Is there any way we can test whether the days are the same length — either from 
one day to the next, or at least on the average? One way is to make a comparison 
with some other periodic phenomenon. Let us see how such a comparison might 
be made with an hour glass. With an hour glass, we can "create" a periodic 
occurrence if we have someone standing by it day and night to turn it over when- 
ever the last grain of sand runs out. 

We could then count the turnings of the glass from each morning to the next. 
We would find, this time, that the number of "hours" (i.e., turnings of the glass) 
was not the same each "day." We should distrust the sun, or the glass, or both. 
After some thought, it might occur to us to count the "hours" from noon to noon. 
(Noon is here defined not as 12:00 o'clock, but that instant when the sun is at its 
highest point.) We would find, this time, that the number of "hours" each day 
is the same. 

We now have some confidence that both the "hour" and the "day" have a 
regular periodicity, i.e., mark off successive equal intervals of time, although we 
have not proved that either one is "really" periodic. Someone might question 
whether there might not be some omnipotent being who would slow down the 
flow of sand every night and speed it up during the day. Our experiment does not, 
of course, give us an answer to this sort of question. All we can say is that we find 
that a regularity of one kind fits together with a regularity of another kind. We 
can just say that we base our definition of time on the repetition of some apparently 
periodic event. 

5-3 Short times 

We should now notice that in the process of checking on the reproducibility 
of the day, we have received an important by-product. We have found a way of 
measuring, more accurately, fractions of a day. We have found a way of counting 
time in smaller pieces. Can we carry the process further, and leam to measure 
even smaller intervals of time? 

Galileo decided that a given pendulum always swings back and forth in equal 
intervals of time so long as the size of the swing is kept small. A test comparing 
the number of swings of a pendulum in one "hour" shows that such is indeed the 
case. We can in this way mark fractions of an hour. If we use a mechanical device 
to count the swings — and to keep them going — we have the pendulum clock of 
our grandfathers. 

Let us agree that if our pendulum oscillates 3600 times in one hour (and if 
there are 24 such hours in a day), we shall call each period of the pendulum one 
"second." We have then divided our original unit of time into approximately 
105 parts. We can apply the same principles to divide the second into smaller and 
smaller intervals. It is, you will realize, not practical to make mechanical pen- 
dulums which go arbitrarily fast, but we can now make electrical pendulums, 
called oscillators, which can provide a periodic occurrence with a very short 
period of swing. In these electronic oscillators it is an electrical current which 
swings to and fro, in a manner analogous to the swinging of the bob of the pendulum. 

We can make a series of such electronic oscillators, each with a period 10 
times shorter than the previous one. We may "calibrate" each oscillator against 
the next slower one by counting the number of swings it makes for one swing of 
the slower oscillator. When the period of oscillation of our clock is shorter than 
a fraction of a second, we cannot count the oscillations without the help of some 
device which extends our powers of observation. One such device is the electron- 
beam oscilloscope, which acts as a sort of microscope for short times. This device 
plots on a fluorescent screen a graph of electrical current (or voltage) versus time. 
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By connecting the oscilloscope to two of our oscillators in sequence, so that it 
plots a graph first of the current in one of our oscillators and then of the current 
in the other, we get two graphs like those shown in Fig. 5-2. We can readily 
determine the number of periods of the faster oscillator in one period of the 
slower oscillator. 

With modern electronic techniques, oscillators have been built with periods 
as short as about 10“ 12 second, and they have been calibrated (by comparison 
methods such as we have described) in terms of our standard unit of time, the 
second. With the invention and perfection of the "laser," or light amplifier, in 
the past few years, it has become possible to make oscillators with even shorter 
periods than 10“ 12 second, but it has not yet been possible to calibrate them by 
the methods which have been described, although it will no doubt soon be possible. 

Times shorter than 10 _1 2 second have been measured, but by a different tech- 
nique. In effect, a different definition of "time" has been used. One way has been 
to observe the distance between two happenings on a moving object. If, for 
example, the headlights of a moving automobile are turned on and then off, 
we can figure out how long the lights were on if we know where they were turned 
on and off and how fast the car was moving. The time is the distance over which 
the lights were on divided by the speed. 

Within the past few years, just such a technique was used to measure the 
lifetime of the 7T°-meson. By observing in a microscope the minute tracks left in 
a photographic emulsion in which 7t°-mesons had been created one saw that a 
7i°-meson (known to be travelling at a certain speed nearly that of light) went a 
distance of about 10" 7 meter, on the average, before disintegrating. It lived for 
only about 10" 16 sec. It should be emphasized that we have here used a some- 
what different definition of "time" than before. So long as there are no inconsist- 
encies in our understanding, however, we feel fairly confident that our definitions 
are sufficiently equivalent. 

By extending our techniques — and if necessary our definitions — still further 
we can infer the time duration of still faster physical events. We can speak of the 
period of a nuclear vibration. We can speak of the lifetime of the newly discovered 
strange resonances (particles) mentioned in Chapter 2. Their complete life occupies 
a time span of only 10" 24 second, approximately the time it would take light 
(which moves at the fastest known speed) to cross the nucleus of hydrogen (the 
smallest known object). 

What about still smaller times? Does "time" exist on a still smaller scale? 
Does it make any sense to speak of smaller times if we cannot measure — or 
perhaps even think sensibly about — something which happens in a shorter time? 
Perhaps not. These are some of the open questions which you will be asking and 
perhaps answering in the next twenty or thirty years. 


54 Long times 

Let us now consider times longer than one day. Measurement of longer times 
is easy; we just count the days — so long as there is someone around to do the- 
counting. First we find that there is another natural periodicity: the year, about 
365 days. We have also discovered that nature has sometimes provided a counter 
for the years, in the form of tree rings or river-bottom sediments. In some cases 
we can use these natural time markers to determine the time which has passed 
since some early event. 

When we cannot count the years for the measurement of long times, we must 
look for other ways to measure. One of the most successful is the use of radio- 
active material as a "clock." In this case we do not have a periodic occurrence, 
as for the day or the pendulum, but a new kind of "regularity." We find that the 
radioactivity of a particular sample of material decreases by the same fraction 
for successive equal increases in its age. If we plot a graph of the radioactivity 
observed as a function of time (say in days), we obtain a curve like that shown in 
Fig. 5-3. We observe that if the radioactivity decreases to one-half in T days 
(called the "half-life"), then it decreases to one-quarter in another T days, and so 

5-3 



(b) 


Fig. 5-2. Two views of an oscilloscope 
screen. In (a) the oscilloscope is connected 
to one oscillator, in (b) it is connected to an 
oscillator with a period one-tenth as long. 
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Fig. 5-3. The decrease with time of 
radioactivity. The activity decreases by 
one-half in each "half-life," T. 
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on. In an arbitrary time interval t there are t/T "half-lives," and the fraction left 
after this time t is (b) tjT - 

If we knew that a piece of material, say a piece of wood, had contained an 
amount A of radioactive material when it was formed, and we found out by a direct 
measurement that it now contains the amount B, we could compute the age of 
the object, t, by solving the equation 

{\f T = B/A. 

There are, fortunately, cases in which we can know the amount of radioactivity 
that was in an object when it was formed. We know, for example, that the carbon 
dioxide in the air contains a certain small fraction of the radioactive carbon 
isotope C 14 (replenished continuously by the action of cosmic rays). If we measure 
the total carbon content of an object, we know that a certain fraction of that amount 
was originally the radioactive C 14 ; we know, therefore, the starting amount A 
to use in the formula above. Carbon- 14 has a half-life of 5000 years. By careful 
measurements we can measure the amount left after 20 half-lives or so and can 
therefore "date" organic objects which grew as long as 100,000 years ago. 

We would like to know, and we think we do know, the life of still older things. 
Much of our knowledge is based on the measurements of other radioactive iso- 
topes which have different half-lives. If we make measurements with an isotope 
with a longer half-life, then we are able to measure longer times. Uranium, for 
example, has an isotope whose half-life is about 10 9 years, so that if some material 
was formed with uranium in it 10 9 years ago, only half the uranium would remain 
today. When the uranium disintegrates, it changes into lead. Consider a piece 
of rock which was formed a long time ago in some chemical process. Lead, being 
of a chemical nature different from uranium, would appear in one part of the rock 
and uranium would appear in another part of the rock. The uranium and lead 
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would be separate. If we look at that piece of rock today, where there should only 
be uranium we will how find a certain fraction of uranium and a certain fraction 
of lead. By comparing these fractions, we can tell what percent of the uranium 
disappeared and changed into lead. By this method, the age of certain rocks has 
been determined to be several billion years. An extension of this method, not 
using particular rocks but looking at the uranium and lead in the oceans and using 
averages over the earth, has been used to determine (within the past few years) 
that the age of the earth itself is approximately 5.5 billion years. 

It is encouraging that the age of the earth is found to be the same as the age 
of the meteorites which land on the earth, as determined by the uranium method. 
It appears that the earth was formed out of rocks floating in space, and that the 
meteorites are, quite likely, some of that material left over. At some time more than 
five billion years ago, the universe started. It is now believed that at least our part 
of the universe had its beginning about ten or twelve billion years ago. We do 
not know what happened before then. In fact, we may well ask again: Does the 
question make any sense? Does an earlier time have any meaning? 

5-5 Units and standards of time 

We have implied that it is convenient if we start with some standard unit of 
time, say a day or a second, and refer all other times to some multiple or fraction 
of this unit. What shall we take as our basic standard of time? Shall we take the 
human pulse? If we compare pulses, we find that they seem to vary a lot. On 
comparing two clocks, one finds they do not vary so much. You might then say, 
well, let us take a clock. But whose clock? There is a story of a Swiss boy who 
wanted all of the clocks in his town to ring noon at the same time. So he went 
around trying to convince everyone of the value of this. Everyone thought it was 
a marvelous idea so long as all of the other clocks rang noon when his did! It is 
rather difficult to decide whose clock we should take as a standard. Fortunately, 
we all share one clock — the earth. For a long time the rotational period of the 
earth has been taken as the basic standard of time. As measurements have been 
made more and more precise, however, it has been found that the rotation of the 
earth is not exactly periodic, when measured in terms of the best clocks. These 
"best" clocks are those which we have reason to believe are accurate because they 
agree with each other. We now believe that, for various reasons, some days are 
longer than others, some days are shorter, and on the average the period of the 
earth becomes a little longer as the centuries pass. 

Until very recently we had found nothing much better than the earth's 
period, so all clocks have been related to the length of the day, and the second 
has been defined as 1/86400 of an average day. Recently we have been gaining 
experience with some natural oscillators which we now believe would provide a 
more constant time reference than the earth, and which are also based on a natural 
phenomenon available to everyone. These are the so-called "atomic clocks." 
Their basic internal period is that of an atomic vibration which is very insensitive 
to the temperature or any other external effects. These clocks keep time to an 
accuracy of one part in Kb or better. Within the past two years an improved 
atomic clock which operates on the vibration of the hydrogen atom has been de- 
signed and built by Professor Norman Ramsey at Harvard University. He believes 
that this clock might be 100 times more accurate still. Measurements now in 
progress will show whether this is true or not. 

We may expect that since it has been possible to build clocks much more 
accurate than astronomical time, there will soon be an agreement among scientists 
to define the unit of time in terms of one of the atomic clock standards. 

5-6 Large distances 

Let us now turn to the question of distance . How far, or how big, are things? 
Everybody knows that the way you measure distance is to start with a stick and 
count. Or start with a thumb and count. You begin with a unit and count. How 
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Fig. 5-4. The height of a Sputnik is 
determined by triangulation 
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Fig. 5-5. The distance of nearby 
stars can be measured by triangulation, 
using the diameter of the earth's orbit as 
a baseline. 


does one measure smaller things? How does one subdivide distance? In the same 
way that we subdivided time: we take a smaller unit and count the number of 
such units it takes to make up the longer unit. So we can measure smaller and 
smaller lengths. 

But we do not always mean by distance what one gets by counting off with a 
meter stick. It would be difficult to measure the horizontal distance between two 
mountain tops using only a meter stick. We have found by experience that dis- 
tance can be measured in another fashion: by triangulation. Although this means 
that we are really using a different definition of distance, when they can both be 
used they agree with each other. Space is more or less what Euclid thought it 
was, so the two types of definitions of distance agree. Since they do agree on the 
earth it gives us some confidence in using triangulation for still larger distances. 
For example, we were able to use triangulation to measure the height of the first 
Sputnik. We found that it was roughly 5 X 10 5 meters high. By more careful 
measurements the distance to the moon can be measured in the same way. Two 
telescopes at different places on the earth can give us the two angles we need. 
It has been found in this way that the moon is 4 X 10 8 meters away. 

We cannot do the same with the sun, or at least no one has been able to yet. 
The accuracy with which one can focus on a given point on the sun and with which 
one can measure angles is not good enough to permit us to measure the distance 
to the sun. Then how can we measure the distance to the sun? We must invent 
an extension of the idea of triangulation. We measure the relative distances of 
all the planets by astronomical observations of where the planets appear to be, 
and we get a picture of the solar system with the proper relative distances of every- 
thing, but with no absolute distance. One absolute measurement is then required, 
which has been obtained in a number of ways. One of the ways, which was believed 
until recently to be the most accurate, was to measure the distance from the earth 
to Eros, one of the small planetoids which passes near the earth every now and then. 
By triangulation on this little object, one could get the one required scale measure- 
ment. Knowing the relative distances of the rest, we can then tell the distance, for 
example, from the earth to the sun, or from the earth to Pluto. 

Within the past year there has been a big improvement in our knowledge of 
the scale of the solar system. At the Jet Propulsion Laboratory the distance from 
the earth to Venus was measured quite accurately by a direct radar observation. 
This, of course, is a still different type of inferred distance. We say we know 
the speed at which light travels (and therefore, at which radar waves travel), 
and we assume that it is the same speed everywhere between the earth and Venus. 
We send the radio wave out, and count the time until the reflected wave comes 
back. From the time we infer a distance , assuming we know the speed. We have 
really another definition of a measurement of distance. 

How do we measure the distance to a star, which is much farther away? 
Fortunately, we can go back to our triangulation method, because the earth moving 
around the sun gives us a large baseline for measurements of objects outside the 
solar system. If we focus a telescope on a star in summer and in winter, we might 
hope to determine these two angles accurately enough to be able to measure the 
distance to a star. 

What if the stars are too far away for us to use triangulation? Astronomers 
are always inventing new ways of measuring distance. They find, for example, 
that they can estimate the size and brightness of a star by its color. The color 
and brightness of many nearby stars — whose distances are known by triangula- 
tion — have been measured, and it is found that there is a smooth relationship 
between the color and the intrinsic brightness of stars (in most cases). If one now 
measures the color of a distant star, one may use the color-brightness relationship 
to determine the intrinsic brightness of the star. By measuring how bright the star 
appears to us at the earth (or perhaps we should say how dim it appears), we can 
compute how far away it is. (For a given intrinsic brightness, the apparent bright- 
ness decreases with the square of the distance.) A nice confirmation of the correct- 
ness of this method of measuring stellar distances is given by the results obtained 
for groups of stars known as globular clusters. A photograph of such a group is 
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Fig. 5-6. A cluster of stars near the center of our galaxy. Their distance from the 
earth is 30,000 light-years, or about 3 X 1 0 20 meters. 

shown in Fig. 5-6. Just from looking at the photograph one is convinced that these 
stars are all together. The same result is obtained from distance measurements 
by the color-brightness method. 

A study of many globular clusters gives another important bit of informa- 
tion. It is found that there is a high concentration of such clusters in a certain 
part of the sky and that most of them are about the same distance from us. Cou- 
pling this information with other evidence, we conclude that this concentration of 
clusters marks the center of our galaxy. We then know the distance to the center 
of the galaxy — about 10 20 meters. 

Knowing the size of our own galaxy, we have a key to the measurement of 
still larger distances — the distances to other galaxies. Figure 5-7 is a photograph 
of a galaxy, which has much the same shape as our own. Probably it is the same 
size, too. (Other evidence supports the idea that galaxies are all about the same 
size.) If it is the same size as ours, we can tell its distance. We measure the angle 
it subtends in the sky; we know its diameter, and we compute its distance — 
tri angulation again! 



Fig. 5-7. A spiral galaxy like our own. Presuming that its diameter is similar to 
that of our own galaxy, we may compute its distance from its apparent size. It is 30 
million light-years (3 X 1 0 23 meters) from the earth. 
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Fig. 5-8. The most distant object, 3C295 in BOOTES (indicated by the arrow), 
measured by the 200-inch telescope to date (1960). 

Photographs of exceedingly distant galaxies have recently been obtained with 
the giant Palomar telescope. One is shown in Fig. 5-8. It is now believed that 
some of these galaxies are about halfway to the limit of the universe — IO 26 meters 
away — the largest distance we can contemplate! 

5-7 Short distances 

Now let's think about smaller distances. Subdividing the meter is easy. With- 
out much difficulty we can mark off one thousand equal spaces which add up to 
one meter. With somewhat more difficulty, but in a similar way (using a good 
microscope), we can mark off a thousand equal subdivisions of the millimeter to 
make a scale of microns (millionths of a meter). It is difficult to continue to smaller 
scales, because we cannot "see" objects smaller than the wavelength of visible 
light (about 5 X 10~ 7 meter). 

We need not stop, however, at what we can see. With an electron microscope, 
we can continue the process by making photographs on a still smaller scale, say 
down to 10~ 8 meter (Fig. 5-9). By indirect measurements — by a kind oftriangula- 
tion on a microscopic scale — we can continue to measure to smaller and smaller 
scales. First, from an observation of the way light of short wavelength (x-radiation) 
is reflected from a pattern of marks of known separation, we determine the wave- 
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length of the light vibrations. Then, from the pattern of the scattering of the same 
light from a crystal, we can determine the relative location of the atoms in the 
crystal, obtaining results which agree with the atomic spacings also determined 
by chemical means. We find in this way that atoms have a diameter of about 
lO-io meter. 

There is a large "gap" in physical sizes between the typical atomic dimension 
of about 10“ 10 meter and the nuclear dimensions 10“ 15 meter, 10~ 5 times smaller. 
For nuclear sizes, a different way of measuring size becomes convenient. We meas- 
ure the apparent area, a, called the effective cross section. If we wish the radius, 
we can obtain it from a == 7 rr 2 , since nuclei are nearly spherical. 

Measurement of a nuclear cross section can be made by passing a beam of 
high-energy particles through a thin slab of material and observing the number 
of particles which do not get through. These high-energy particles will plow right 
through the thin cloud of electrons and will be stopped or deflected only if they 
hit the concentrated weight of a nucleus. Suppose we have a piece of material 
1 centimeter thick. There will be about 10 8 atomic layers. But the nuclei are so 
small that there is little chance that any nucleus will lie behind another. We might 
imagine that a highly magnified view of the situation — looking along the particle 
beam — would look like Fig. 5-10. 

The chance that a very small particle will hit a nucleus on the trip through 
is just the total area covered by the profiles of the nuclei divided by the total 
area in the picture. Suppose that we know that in an area A of our slab of material 
there are Af atoms (each with one nucleus, of course). Then the total area "covered" 
by the nuclei is Na/A . Now let the number of particles of our beam which arrive 
at the slab be n x and the number which come out the other side be /z 2 * The frac- 
tion which do not get through is (ni -n 2 )/n u which should just equal the 
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Fig. 5-10. Imagined view through a 
block of carbon 1 cm thick if only the 
nuclei were observed. 



fraction of the area covered. We can obtain the radius of the nucleus from the 
equation* 

_„2 _ A n x - n 2 

TTT = <T = -rz. 

N n x 

From such an experiment we find that the radii of the nuclei are from about 1 to 6 
times 10- 15 meter. The length unit 10 15 meter is called fhefermi, in honor of 
Enrico Fermi (1901-1958). 

What do we find if we go to smaller distances? Can we measure smaller 
distances ? Such questions are not yet answerable. It has been suggested that the 
still unsolved mystery of nuclear forces may be unravelled only by some modifica- 
tion of our idea of space, or measurement, at such small distances. 

It might be thought that it would be a good idea to use some natural length 
as our unit of length — say the radius of the earth or some fraction of it. The 
meter was originally intended to be such a unit and was defined to be (71/2) X 10 -7 
times the earth's radius. It is neither convenient nor very accurate to determine 
the unit of length in this way. For a long time it has been agreed internationally 
that the meter would be defined as the distance between two scratches on a bar 
kept in a special laboratory in France. More recently, it has been realized that 
this definition is neither as precise as would be useful, nor as permanent or universal 
as one would like. It is currently being considered that a new definition be adopted, 
an agreed-upon (arbitrary) number of wavelengths of a chosen spectral line. 


Measurements of distance and of time give results which depend on the ob- 
server. Two observers moving with respect to each other will not measure the same 
distances and times when measuring what appear to be the same things. Distances 
and time intervals have different magnitudes, depending on the coordinate system 
(or "frame of reference") used for making the measurements. We shall study this 
subject in more detail in a later chapter. 

Perfectly precise measurements of distances or times are not permitted by the 
laws of nature. We have mentioned earlier that the errors in a measurement of 
the position of an object must be at least as large as 

Ax = h/Ap , 

where h is a small quantity called "Planck's constant" and Ap is the error in 
our knowledge of the momentum (mass times velocity) of the object whose posi- 
tion we are measuring. It was also mentioned that the uncertainty in position 
measurements is related to the wave nature of particles. 

The relativity of space and time implies that time measurements have also a 
minimum error, given in fact by 


At = h/AE, 

where A E is the error in our knowledge of the energy of the process whose time 
period we are measuring. If we wish to know more precisely when something 
happened we must know less about what happened, because our knowledge of 
the energy involved will be less. The time uncertainty is also related to the wave 
nature of matter. 

* This equation is right only if the area covered by the nuclei is a small fraction of the 
total, i.e., if («i — « 2 )/«i is much less than 1. Otherwise we must make a correction 
for the fact that some nuclei will be partly obscured by the nuclei in front of them. 
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Probability 


“The true logic of this world is in the calculus of probabilities.” 

— James Clerk Maxwell 


6-1 Chance and likelihood 

“Chance” is a word which is in common use in everyday living. The radio 
reports speaking of tomorrow’s weather may say: “There is a sixty percent chance 
of rain.” You might say: “There is a small chance that I shall live to be one 
hundred years old.” Scientists also use the word chance. A seismologist may be 
interested in the question: “What is the chance that there will be an earthquake 
of a certain size in Southern California next year?” A physicist might ask the ques- 
tion: “What is the chance that a particular geiger counter will register twenty 
counts in the next ten seconds?” A politician or statesman might be interested 
in the question: “What is the chance that there will be a nuclear war within the 
next ten years?” You may be interested in the chance that you will learn some- 
thing from this chapter. 

By chance , we mean something like a guess. Why do we make guesses? 
We make guesses when we wish to make a judgment but have incomplete infor- 
mation or uncertain knowledge. We want to make a guess as tp what things are, 
or what things are likely to happen. Often we wish to make a guess because we 
have to make a decision. For example: Shall I take my raincoat with me tomorrow? 
For what earth movement should I design a new building? Shall I build myself 
a fallout shelter? Shall I change my stand in international negotiations? Shall 
I go to class today? 

Sometimes we make guesses because we wish, with our limited knowledge, 
to say as much as we can about some situation. Really, any generalization is in 
the nature of a guess. Any physical theory is a kind of guesswork. There are good 
guesses and there are bad guesses. The theory of probability is a system for making 
better guesses. The language of probability allows us to speak quantitatively 
about some situation which may be highly variable, but which does have some 
consistent average behavior. 

Let us consider the flipping of a coin. If the toss — and the coin — are “honest,” 
we have no way of knowing what to expect for the outcome of any particular toss. 
Yet we would feel that in a large number of tosses there should be about equal 
numbers of heads and tails. We say: “The probability that a toss will land heads 
is 0.5.” 

We speak of probability only for observations that we contemplate being made 
in the future. By the “ probability ” of a particular outcome of an observation we 
mean our estimate for the most likely fraction of a number of repeated observa- 
tions that will yield that particular outcome . If we imagine repeating an observa- 
tion — such as looking at a freshly tossed coin — N times, and jf we call N A our 
estimate of the most likely number of our observations that will give some specified 
result A , say the result “heads,” then by P(A), the probability of observing A, 
we mean 

P(A) = Na/N . (6.1) 

Our definition requires several comments. First of all, we may speak of a 
probability of something happening only if the occurrence is a possible outcome 
of some repeatable observation. It is not clear that it would make any sense to 
ask: “What is the probability that there is a ghost in that house?” 
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You may object that no situation is exactly repeatable. That is right. Every 
different observation must at least be at a different time or place. All we can say 
is that the “repeated” observations should, for our intended purposes, appear 
to be equivalent. We should assume, at least, that each observation was made 
from an equivalently prepared situation, and especially with the same degree of 
ignorance at the start. (If we sneak a look at an opponent’s hand in a card game, 
our estimate of our chances of winning are different than if we do not’) 

We should emphasize that N and N a in Eq. (6.1) are not intended to represent 
numbers based on actual observations. Na is our best estimate of what would 
occur in N imagined observations. Probability depends, therefore, on our knowledge 
and on our ability to make estimates. In effect, on our common sense! Fortunately, 
there is a certain amount of agreement in the common sense of many things, so 
that different people will make the same estimate. Probabilities need not, however, 
be “absolute” numbers. Since they depend on our ignorance, they may become 
different if our knowledge changes. 

You may have noticed another rather “subjective” aspect of our definition 
of probability. We have referred to N A as “oUr estimate of the most likely num- 
ber ...” We do not mean that we expect to observe exactly Na, but that we expect 
a number near Na, and that the number N A is more likely than any other number 
in the vicinity. If we toss a coin, say, 30 times, we should expect that the number 
of heads would not be very likely to be exactly 15, but rather only some number 
near to 15, say 12, 13, 14, 15, 16, or 17. However, if we must choose, we would 
decide that 15 heads is more likely than any other number. We would write 
P(heads) = 0.5. 

Why did we choose 15 as more likely than any other number? We must 
have argued with ourselves in the following manner: If the most likely number of 
heads is N H in a total number of tosses N, then the most likely number of tails 
Nt is (N — Nff). (We are assuming that every toss gives either heads or tails, 
and no “other” result!) But if the coin is “honest,” there is no preference for heads 
or tails. Until we have some reason to think the coin (or toss) is dishonest, we must 
give equal likelihoods for heads and tails. So we must set N T = N H . It follows 
that N t ~ Nh = N/ 2, or P(H) = p(T) = 0.5. 

We can generalize our reasoning to any situation in which there are m dif- 
ferent but “equivalent” (that is, equally likely) possible results of an observation. 
If an observation can yield m different results, and we have reason to believe that 
any one of them is as likely as any other, then the probability of a particular 
outcome A is P(A) = l/m. 

If there are seven different-colored balls in an opaque box and we pick one 
out “at random” (that is, without looking), the probability of getting a ball of a 
particular color is -f. The probability that a “blind draw” from a shuffled deck 
of 52 cards will show the ten of hearts is The probability of throwing a double- 
one with dice is 

In Chapter 5 we described the size of a nucleus in terms of its apparent area, or 
“cross section.” When we did so we were really talking about probabilities. When we 
shoot a high-energy particle at a thin slab of material, there is some chance that it will 
pass right through and some chance that it will hit a nucleus. (Since the nucleus is so 
small that we cannot see it, we cannot aim right at a nucleus. We must “shoot blind.”) 
If there are n atoms in our slab and the nucleus of each atom has a cross-sectional area 
a, then the total area “shadowed” by the nuclei is na . In a large number N of random 
shots, we expect that the number of hits N c of some nucleus will be in the ratio to N as 
the shadowed area is to the total area of the slab : 

NcjN = na/A. (6.2) 

We may say, therefore, that the probability that any one projectile particle will suffer 
a collision in passing through the slab is 

_ n 

Pc = - o', 

A 

where n/A is the number of atoms per unit area in our slab. 
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6-2 Fluctuations 
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We would like now to use our ideas about probability to consider in some 
greater detail the question: “How many heads do I really expect to get if I toss 
a coin N times?” Before answering the question, however, let us look at what 
does happen in such an “experiment.” Figure 6-1 shows the results obtained in 
the first three “runs” of such an experiment in which N = 30. The sequences of 
“heads” and “tails” are shown just as they were obtained. The first game gave 
11 heads; the second also 11 ; the third 16. In three trials we did not once get 15 
heads. Should we begin to suspect the coin? Or were we wrong in thinking that 
the most likely number of “heads” in such a game is 15? Ninety-seven more runs 
were made to obtain a total of 100 experiments of 30 tosses each. The results 
of the experiment are given in Table 6-1.* 
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Fig. 6-1. Observed sequences of 
heads and tails in three games of 30 
tosses each. 


Table 6-1 

Number of heads in successive trials of 30 tosses of a coin. 
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Fig. 6-2. Summary of the results of 100 games of 30 tosses each. The vertical 
bars show the number of games in which a score of k heads was obtained. The dashed 
curve shows the expected numbers of games wirh the score k obtained by a proba- 
bility computation. 


Looking at the numbers in Table 6-1, we see that most of the results are 
“near” 15, in that they are between 12 and 18. We can get a better feeling for the 
details of these results if we plot a graph of the distribution of the results. We 
count the number of games in which a score of k was obtained, and plot this 
number for each k. Such a graph is shown in Fig. 6-2. A score of 15 heads was 
obtained in 13 games. A score of 14 heads was also obtained 13 times. Scores of 
16 and 17 were each obtained more than 13 times. Are we to conclude that there 
is some bias toward heads? Was our “best estimate” not good enough? Should 


* After the first three games, the experiment was actually done by shaking 30 pennies 
violently in a box and then counting the number of heads that showed. 
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Fig. 6-3. A diagram for showing the 
number of ways a score of 0, 1 , 2, or 3 
heads can be obtained in a game of 3 
tosses. 


SCORE 



Fig. 6-4. A diagram like that of Fig. 
6-3, for a game of 6 tosses. 


we conclude now that the “most likely” score for a run of 30 tosses is really 16 
heads? But wait! In all the games taken together, there were 3000 tosses. And the 
total number of heads obtained was 1492. The fraction of tosses that gave heads 
is 0.497, very nearly, but slightly less than half. We should certainly not assume 
that the probability of throwing heads is greater than 0.5! The fact that one par- 
ticular set of observations gave 16 heads most often, is a fluctuation. We still expect 
that the most likely number of heads is 15. 

We may ask the question: “What is the probability that a game of 30 tosses 
will yield 15 heads — or 16, or any other number?” We have said that in a game 
of one toss, the probability of obtaining one head is 0.5, and the probability of 
obtaining no head is 0.5. In a game of two tosses there are four possible outcomes: 
HH , HT , TH , TT. Since each of these sequences is equally likely, we conclude 
that (a) the probability of a score of two heads is £, (b) the probability of a score 
of one head is f, (c) the probability of a zero score is £. There are two ways of 
obtaining one head, but only one of obtaining either zero or two heads. 

Consider now a game of 3 tosses. The third toss is equally likely to be heads 
or tails. There is only one way to obtain 3 heads: we must have obtained 2 heads 
on the first two tosses, and then heads on the last. There are, however, three ways 
of obtaining 2 heads. We could throw tails after having thrown two heads (one 
way) or we could throw heads after throwing only one head in the first two tosses 
(two ways). So for scores of 3 -//, 2-/7, 1-//, 0-/7 we have that the number of equally 
likely ways is 1, 3, 3, 1, with a total of 8 different possible sequences. The prob- 
abilities are £, f , f , 

The argument we have been making can be summarized by a diagram like that 
in Fig. 6-3. It is clear how the diagram should be continued for games with a larger 
number of tosses. Figure 6-4 shows such a diagram for a game of 6 tosses. The 
number of “ways” to any point on the diagram is just the number of different 
“paths” (sequences of heads and tails) which can be taken from the starting point. 
The vertical position gives us the total number of heads thrown. The set of num- 
bers which appears in such a diagram is known as Pascal's triangle. The numbers 
are also known as the binomial coefficients , because they also appear in the ex- 
pansion of (a + b) n . If we call n the number of tosses and k the number of heads 
thrown, then the numbers in the diagram are usually designated by the symbol 
(J). We may remark in passing that the binomial coefficients can also be com- 
puted from 

O-HSHTiT- M 

where n!, called “^-factorial,” represents the product (n)(n — l)(n — 2) ... (3) 

( 2 )( 1 ). 

We are now ready to compute the probability P(k , n) of throwing k heads in 
n tosses, using our definition Eq. (6.1). The total number of possible sequences 
is 2 n (since there are 2 outcomes for each toss), and the number of ways of obtain- 
ing k heads is (*), all equally likely, so we have 


P(k,n) = 


© 

2 » 


(6.5) 


Since P(k , n) is the fraction of games which we expect to yield k heads, then 
in 100 games we should expect to find k heads 100 • P(k , n) times. The dashed 
curve in Fig. 6-2 passes through the points computed from 100 ■ P(k> 30). We 
see that we expect to obtain a score of 15 heads in 14 or 15 games, whereas this 
score was observed in 13 games. We expect a score of 16 in 13 or 14 games, but 
we obtained that score in 16 games. Such fluctuations are “part of the game.” 

The method we have just used can be applied to the most general situation 
in which there are only two possible outcomes of a single observation. Let us 
designate the two outcomes by W (for “win”) and L (for “lose”). In the general 
case, the probability of W or L in a single event need not be equal. Let p be the 
probability of obtaining the result W. Then q , the probability of L, is necessarily 
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(1 — p). In a set of n trials, the probability P(k 9 n) that W will be obtained k 
times is 

P(k,n) = (j.)p k q n ~ k . ( 6 . 6 ) 

This probability function is called the Bernoulli or, also, the binomial probability. 


6-3 The random walk 

There is another interesting problem in which the idea of probability is re- 
quired. It is the problem of the “random walk.” In its simplest version, we imagine 
a “game” in which a “player” starts at the point jc = 0 and at each “move” 
is required to take a step either forward (toward +x) or backward (toward — x). 
The choice is to be made randomly , determined, for example, by the toss of a coin. 
How shall we describe the resulting motion? In its general form the problem is 
related to the motion of atoms (or other particles) in a gas — called Brownian 
motion — and also to the combination of errors in measurements. You will see 
that the random-walk problem is closely related to the coin-tossing problem we 
have already discussed. 

First, let us look at a few examples of a random walk. We may characterize 
the walker’s progress by the net distance D N traveled in N steps. We show in the 
graph of Fig. 6-5 three examples of the path of a random walker. (We have used 
for the random sequence of choices the results of the coin tosses shown in Fig. 
6 - 1 .) 



Fig. 6-5. The progress made in a random walk. The horizontal coordinate N 
is the total number of steps taken; the vertical coordinate D(N) is the net distance 
moved from the starting position. 

What can we say about such a motion? We might first ask: “How far does 
he get on the average?” We must expect that his average progress will be zero, 
since he is equally likely to go either forward or backward. But we have the feel- 
ing that as N increases, he is more likely to have strayed farther from the starting 
point. We might, therefore, ask what is his average distance travelled in absolute 
value , that is, what is the average of |D|. It is, however, more convenient to deal 
with another measure of “progress,” the square of the distance: D 2 is positive 
for either positive or negative motion, and is therefore a reasonable measure of 
such random wandering. 

We can show that the expected value of D§ is just N , the number of steps 
taken. By “expected value” we mean the probable value (our best guess), which 
we can think of as the expected average behavior in many repeated sequences. 

We represent such an expected value by { D §), and may refer to it also as the “mean 
square distance.” After one step, D 2 is always +1, so we have certainly (D\) — 1. 

(All distances will be measured in terms of a unit of one step. We shall not continue 
to write the units of distance.) 
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The expected value of D# for N > 1 can be obtained from D^-i. If, after 
(N — 1) steps, we have D^-u then after N steps we have D N = Z^-i + 1 or 
D n — D n ~i — 1. For the squares, 


d% = < 


Dn- i + 2 Z>jv_i + I, 
or 

l D%—x — 2 D n _i + 1. 


(6.7) 


In a number of independent sequences, we expect to obtain each value one-half 
of the time, so our average expectation is just the average of the two possible 
values. The expected value of D N is then Av-i + 1* In general , we should 
expect for D^-i its “expected value” (D^_ x ) (by definition!). So 

(D%) = (Dn- i) + 1* (6.8) 

We have already shown that (D\) = 1 ; it follows then that 

D% = N, (6.9) 

a particularly simple result! 

If we wish a number like a distance, rather than a distance squared, to repre- 
sent the “progress made away from the origin” in a random walk, we can use the 
“root-mean-square distance” D r ms : 

D rme = VW) = Vn- (6.10) 


We have pointed out that the random walk is closely similar in its mathe- 
matics to the coin-tossing game we considered at the beginning of the chapter. 
If we imagine the direction of each step to be in correspondence with the appearance 
of heads or tails in a coin toss, then D is just N H — N T , the difference in the num- 
ber of heads and tails. Since N H + N T = N, the total number of steps (and tosses), 
we have D = 2N H — N. We have derived earlier an expression for the expected 
distribution of N H (also called k) and obtained the result of Eq. (6.5). Since 
N is just a constant, we have the corresponding distribution for D. (Since for 
every head more than N/2 there is a tail “missing,” we have the factor of 2 between 
N h and D.) The graphs of Fig. 6-2 represent the distribution of distances we might 
get in 30 random steps (where A; = 15 is to be read D = 0;A: = 16, D = 2; etc.). 

The variation of N H from its expected value N/2 is 

( 6 . 11 ) 

The rms deviation is 

(n h - //) = WN. (6.12) 

\ Z /rms 

According to our result for Z> rmg , we expect that the “typical” distance in 
30 steps ought to be \/30 = 5.5, or a typical k should be about 5.5/2 = 2.8 
units from 15. We see that the “width” of the curve in Fig. 6-2, measured from 
the center, is just about 3 units, in agreement with this result. 

We are now in a position to consider a question we have avoided until now. 
How shall we tell whether a coin is “honest” or “loaded”? We can give now 
at least a partial answer. For an honest coin, we expect the fraction of the times 
heads appears to be 0.5, that is, 


= 0.5. (6.13) 

We also expect an actual Nh to deviate from N/2 by about y/N/2, or the fraction 
to deviate by 

Vn i_ 

ivT' 2Vn' 

The larger N is, the closer we expect the fraction N H /N to be to one-half. 
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Fig. 6-6. The fraction of the tosses 
that gave heads in a particular sequence 
of N tosses of a penny. 



In Fig. 6-6 we have plotted the fraction N H /N for the coin tosses reported 
earlier in this chapter. We see the tendency for the fraction of heads to approach 
0.5 for large N . Unfortunately, for any given run or combination of runs there is 
no guarantee that the observed deviation will be even near the expected deviation. 
There is always the finite chance that a large fluctuation — a long string of heads 
or tails — will give an arbitrarily large deviation. All we can say is that if the 
deviation is near the expected 1/2%/N (say within a factor of 2 or 3), we have no 
reason to suspect the honesty of the coin. If it is much larger, we may be suspi- 
cious, but cannot prove, that the coin is loaded (or that the tosser is clever!). 

We have also not considered how we should treat the case of a “coin” or 
some similar “chancy” object (say a stone that always lands in either of two posi- 
tions) that we have good reason to believe should have a different probability for 
heads and tails. We have defined P(H) = {N H )/N. How shall we know what to 
expect for In some cases, the best we can do is to observe the number of 

heads obtained in large numbers of tosses. For want of anything better, we must 
set(ATff) = N h (observed). (How could we expect anything else?) We must under- 
stand, however, that in such a case a different experiment, or a different observer, 
might conclude that P(H) was different. We would expect , however, that the various 
answers should agree within the deviation l/2\/A[if P{H) is near one-half]. An 
experimental physicist usually says that an “experimentally determined” probability 
has an “error,” and writes 


- tF * 57s < 614 > 

There is an implication in such an expression that there is a “true” or “correct” 
probability which could be computed if we knew enough, and that the observation 
may be in “error” due to a fluctuation. There is, however, no way to make such 
thinking logically consistent. It is probably better to realize that the probability 
concept is in a sense subjective, that it is always based on uncertain knowledge, 
and that its quantitative evaluation is subject to change as we obtain more 
information. 


6-4 A probability distribution 

Let us return now to the random walk and consider a modification of it. 
Suppose that in addition to a random choice of the direction (+ or — ) of each 
step, the length of each step also varied in some unpredictable way, the only condi- 
tion being that on ihe average the step length was one unit. This case is more 
representative of something like the thermal motion of a molecule in a gas. If 
we call the length of a step S 9 then S may have any value at all, but most often 
will be “near” 1. To be specific, we shall let ( S 2 ) = 1 or, equivalently, S Tm8 = 1. 
Our derivation for {D 2 } would proceed as before except that Eq. (6.8) would be 
changed now to read 

(&n) ~ {D%~ i) + (S 2 ) = (Dn- i) + 1. (6.15) 

We have, as before, that 

{D%) = N. (6.16) 
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D(x ) 



Fig. 6-8. The probability that the 
distance D traveled in a random walk is 
between xi and X 2 is the area under the 
curve of p(x) from xi to x 2 . 



Fig. 6-7. The probability density for ending up at the distance D from the starting 
place in a random walk of N steps. (0 is measured in units of the rms step length.) 


What would we expect now for the distribution of distances D1 What is, 
for example, the probability that D ~ 0 after 30 steps? The answer is zero! 
The probability is zero that D will be any particular value, since there is no chance 
at all that the sum of the backward steps (of varying lengths) would exactly equal 
the sum of forward steps. We cannot plot a graph like that of Fig. 6-2. 

We can, however, obtain a representation similar to that of Fig. 6-2, if we 
ask, not what is the probability of obtaining D exactly equal to 0, 1, or 2, but 
instead what is the probability of obtaining D near 0, 1, or 2. Let us define P(x, Ax) 
as the probability that D will lie in the interval Ax located at x (say from x to 
x + Ax ). We expect that for small Ax the chance of D landing in the interval 
is proportional to Ax, the width of the interval. So we can write 

P(x, Ax) — p(x) Ax. (6.17) 

The function p(x) is called the probability density. 

The form of p(x) will depend on N, the number of steps taken, and also on 
the distribution of individual step lengths. We cannot demonstrate the proofs 
here, but for large N, p(x) is the same for all reasonable distributions in individual 
step lengths, and depends only on N. We plot p(x) for three values of N in Fig. 
6-7. You will notice that the “half-widths” (typical spread from x = 0) of these 
curves is \/N, as we have shown it should be. 

You may notice also that the value of p(x) near zero is inversely proportional 
to \/N. This comes about because the curves are all of a similar shape and their 
areas under the curves must all be equal. Since p(x) Ax is the probability of find- 
ing D in Ax when Ax is small, we can determine the chance of finding D somewhere 
inside an arbitrary interval from Xi to x 2 , by cutting the interval in a number of 
small increments Ax and evaluating the sum of the terms p(x) Ax for each incre- 
ment. The probability that D lands somewhere between xi and x 2 , which we may 
write P(xi < D < x 2 ), is equal to the shaded area in Fig. 6-8. The smaller we 
take the increments Ax, the more correct is our result. We can write, therefore, 

P(x x < D < x 2 ) = L p(x)Ax = [ X2 p(x)dx. (6.18) 

Jx 1 

The area under the whole curve is the probability that D lands somewhere 
(that is, has some value between x = — 00 and x = + 00 ). That probability is 
surely 1. We must have that 


j — 00 


dx 


1 . 


(6.19) 
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Since the curves in Fig. 6-7 get wider in proportion to y/N, their heights must be 
proportional to 1 /y/N to maintain the total area equal to 1. 

The probability density function we have been describing is one that is en- 
countered most commonly. It is known as the normal or gaussian probability 
density. It has the mathematical form 

p(x) = -4= (6.20) 


where <r is called the standard deviation and is given, in our case, by a = y/N 
or, if the rms step size is different from 1, by <r — y/N S Tm8 . 

We remarked earlier that the motion of a molecule, or of any particle, in a gas 
is like a random walk. Suppose we open a bottle of an organic compound and let 
some of its vapor escape into the air. If there are air currents, so that the air is 
circulating, the currents will also carry the vapor with them. But even in perfectly 
still air , the vapor will gradually spread out — will diffuse — until it has penetrated 
throughout the room. We might detect it by its color or odor. The individual 
molecules of the organic vapor spread out in still air because of the molecular 
motions caused by collisions with other molecules. If we know the average “step” 
size, and the number of steps taken per second, we can find the probability that 
one, or several, molecules will be found at some distance from their starting point 
after any particular passage of time. As time passes, more steps are taken and the 
gas spreads out as in the successive curves of Fig. 6-7. In a later chapter, we shall 
find out how the step sizes and step frequencies are related to the temperature and 
pressure of a gas. 

Earlier, we said that the pressure of a gas is due to the molecules bouncing 
against the walls of the container. When we come later to make a more quantita- 
tive description, we will wish to know how fast the molecules are going when they 
bounce, since the impact they make will depend on that speed. We cannot, how- 
ever, speak of the speed of the molecules. It is necessary to use a probability 
description. A molecule may have any speed, but some speeds are more likely 
than others. We describe what is going on by saying that the probability that any 
particular molecule will have a speed between v and v + Av is piy) Av, where 
p(v ), a probability density, is a given function of the speed v. We shall see later 
how Maxwell, using common sense and the ideas of probability, was able to 
find a mathematical expression for p(v ). The form* of the function p(y) is shown 
in Fig. 6-9. Velocities may have any value, but are most likely to be near the 
most probable or expected value (v). 

We often think of the curve of Fig. 6-9 in a somewhat different way. If we 
consider the molecules in a typical container (with a volume of, say, one liter), 
then there are a very large number N of molecules present ( N « 10 2 2 ), Since 
p(y) Av is the probability that one molecule will have its velocity in Av , by our 
definition of probability we mean that the expected number (AN) to be found with 
a velocity in the interval Av is given by 



Fig. 6-9. The distribution of velocities 
of the molecules in a gas. 


(AN) = Np(v) Av. 


( 6 . 21 ) 


We call Np(v) the “distribution in velocity.” The area under the curve between 
two velocities v x and v 2 , for example the shaded area in Fig. 6-9, represents 
[for the curve Np(v)] the expected number of molecules with velocities between 
v i and v 2 . Since with a gas we are usually dealing with large numbers of molecules, 
we expect the deviations from the expected numbers to be small (like 1 /y/N), so 
we often neglect to say the “expected” number, and say instead: “The number of 
molecules with velocities between v\ and v 2 is the area under the curve.” We 
should remember, however, that such statements are always about probable 
numbers. 


* Maxwell’s expression is p{v) = Cv 2 e~ av2 , where a is a constant related to the tem- 
perature and C is chosen so that the total probability is one. 
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Fig. 6-10. Probability densities for 
observation of the position and velocity 
of a particle. 


6-5 The uncertainty principle 

The ideas of probability are certainly useful in describing the behavior of 
the 10 2 2 or so molecules in a sample of a gas, for it is clearly impractical even 
to attempt to write down the position or velocity of each molecule. When prob- 
ability was first applied to such problems, it was considered to be a convenience — 
a way of dealing with very complex situations. We now believe that the ideas of 
probability are essential to a description of atomic happenings. According to 
quantum mechanics, the mathematical theory of particles, there is always some 
uncertainty in the specification of positions and velocities. We can, at best, say 
that there is a certain probability that any particle will have a position near some 
coordinate x. 

We can give a probability density pi(x\ such that pi(x) Ax is the probability 
that the particle will be found between x and x + Ax. If the particle is reasonably 
well localized, say near xi, the function pi(x) might be given by the graph of 
Fig, 6-1 0(a). Similarly, we must specify the velocity of the particle by means of 
a probability density p 2 (y), with Ay the probability that the velocity will 
be found between v and v + Av. 

It is one of the fundamental results of quantum mechanics that the two func- 
tions /?i(x) and p 2 {v) cannot be chosen independently and, in particular, cannot 
both be made arbitrarily narrow. If we call the typical “width” of the />i(x) 
curve [Ax], and that of the curve [Ay] (as shown in the figure), nature demands 
that the product of the two widths be at least as big as the number h/m y where m 
is the mass of the particle and A is a fundamental physical constant called Planck's 
constant. We may write this basic relationship as 

[Ax] • [Ay] > h/m. (6.22) 

This equation is a statement of the Heisenberg uncertainty principle that we 
mentioned earlier. 

Since the right-hand side of Eq. (6.22) is a constant, this equation says that 
if we try to “pin down” a particle by forcing it to be at a particular place, it ends 
up by having a high speed. Or if we try to force it to go very slowly, or at a 
precise velocity, it “spreads out” so that we do not know very well just where 
it is. Particles behave in a funny way! 

The uncertainty principle describes an inherent fuzziness that must exist in 
any attempt to describe nature. Our most precise description of nature must 
be in terms of probabilities. There are some people who do not like this way of 
describing nature. They feel somehow that if they could only tell what is really 
going on with a particle, they could know its speed and position simultaneously. 
In the early days of the development of quantum mechanics, Einstein was quite 
worried about this problem. He used to shake his head and say, “But, surely God 
does not throw dice in determining how electrons should go!” He worried about 
that problem for a long time and he probably never really reconciled himself to 
the fact that this is the best description of nature that one can give. There are 
still one or two physicists who are working on the problem who have an intuitive 
conviction that it is possible somehow to describe the world in a different way 
and that all of this uncertainty about the way things are can be removed. No one 
has yet been successful. 

The necessary uncertainty in our specification of the position of a particle 
becomes most important when we wish to describe the structure of atoms. In 
the hydrogen atom, which has a nucleus of one proton with one electron outside 
of the nucleus, the uncertainty in the position of the electron is as large as the atom 
itself! We cannot, therefore, properly speak of the electron moving in some “orbit” 
around the proton. The most we can say is that there is a certain chance p{r) AV t 
of observing the electron in an element of volume AV at the distance r from the 
proton. The probability density p(r) is given by quantum mechanics. For an 
undisturbed hydrogen atom p{r) = Ae~ r2/a2 t which is a bell-shaped function like 
that in Fig. 6-8. The number a is the “typical” radius, where the function is decreas- 
ing rapidly. Since there is a small probability of finding the electron at distances 
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Fig. 6-1 1 . A way of visualizing a hy- 
drogen atom. The density (whiteness) of 
the cloud represents the probability 
density for observing the electron. 



from the nucleus much greater than a, we may think of a as “the radius of the 
atom,” about 10~ 10 meter. 

We can form an image of the hydrogen atom by imagining a “cloud” whose 
density is proportional to the probability density for observing the electron. 
A sample of such a cloud is shown in Fig. 6-11. Thus our best “picture” of a 
hydrogen atom is a nucleus surrounded by an “electron cloud” (although we really 
mean a “probability cloud”). The electron is there somewhere, but nature per- 
mits us to know only the chance of finding it at any particular place. 

In its efforts to learn as much as possible about nature, modern physics has 
found that certain things can never be “known” with certainty. Much of our 
knowledge must always remain uncertain. The most we can know is in terms of 
probabilities. 
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The Theory of Gravitation 


7-1 Planetary motions 

In this chapter we shall discuss one of the most far-reaching generalizations 
of the human mind. While we are admiring the human mind, we should take some 
time off to stand in awe of a nature that could follow with such completeness and 
generality such an elegantly simple principle as the law of gravitation. What is 
this law of gravitation? It is that every object in the universe attracts every 
other object with a force which for any two bodies is proportional to the mass of 
each and varies inversely as the square of the distance between them. This state- 
ment can be expressed mathematically by the equation 

mm ' 

F — G — — • 
r 2 

If to this we add the fact that an object responds to a force by accelerating in the 
direction of the force by an amount that is inversely proportional to the mass of 
the object, we shall have said everything required, for a sufficiently talented 
mathematician could then deduce all the consequences of these two principles. 
However, since you are not assumed to be sufficiently talented yet, we shall dis- 
cuss the consequences in more detail, and not just leave you with only these two 
bare principles. We shall briefly relate the story of the discovery of the law of 
gravitation and discuss some of its consequences, its effects on history, the mys- 
teries that such a law entails, and some refinements of the law made by Einstein; 
we shall also discuss the relationships of the law to the other laws of physics. 
All this cannot be done in one chapter, but these subjects will be treated in due 
time in subsequent chapters. 

The story begins with the ancients observing the motions of planets among the 
stars, and finally deducing that they went around the sun, a fact that was redis- 
covered later by Copernicus. Exactly how the planets went around the sun, 
with exactly what motion , took a little more work to discover. In the beginning of 
the fifteenth century there were great debates as to whether they really went around 
the sun or not. Tycho Brahe had an idea that was different from anything pro- 
posed by the ancients: his idea was that these debates about the nature of the 
motions of the planets would best be resolved if the actual positions of the planets 
in the sky were measured sufficiently accurately. If measurement showed exactly 
how the planets moved, then perhaps it would be possible to establish one or 
another viewpoint. This was a tremendous idea — that to find something out, it 
is better to perform some careful experiments than to carry on deep philosophical 
arguments. Pursuing this idea, Tycho Brahe studied the positions of the planets 
for many years in his observatory on the island of Hven, near Copenhagen. He 
made voluminous tables, which were then studied by the mathematician Kepler, 
after Tycho’s death. Kepler discovered from the data some very beautiful and 
remarkable, but simple, laws regarding planetary motion. 
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7-2 Kepler’s laws 

First of all, Kepler found that each planet goes around the sun in a curve 
called an ellipse , with the sun at a focus of the ellipse. An ellipse is not just an 
oval, but is a very specific and precise curve that can be obtained by using two 
tacks, one at each focus, a loop of string, and a pencil ; more mathematically, it 
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Fig. 7-1. An ellipse. 



Fig. 7-2. Kepler's law of areas. 


is the locus of all points the sum of whose distances from two fixed points (the foci) 
is a constant. Or, if you will, it is a foreshortened circle (Fig. 7-1). 

Kepler’s second observation was that the planets do not go around the sun 
at a uniform speed, but move faster when they are nearer the sun and more 
slowly when they are farther from the sun, in precisely this way: Suppose a planet 
is observed at any two successive times, let us say a week apart, and that the radius 
vector* is drawn to the planet for each observed position. The orbital arc traversed 
by the planet during the week, and the two radius vectors, bound a certain plane 
area, the shaded area shown in Fig. 7-2. If two similar observations are made a 
week apart, at a part of the orbit farther from the sun (where the planet moves 
more slowly), the similarly bounded area is exactly the same as in the first case. 
So, in accordance with the second law, the orbital speed of each planet is such that 
the radius “sweeps out” equal areas in equal times. 

Finally, a third law was discovered by Kepler much later; this law is of a 
different category from the other two, because it deals not with only a single planet, 
but relates one planet to another. This law says that when the orbital period and 
orbit size of any two planets are compared, the periods are proportional to the 
3/2 power of the orbit size. In this statement the period is the time interval it 
takes a planet to go completely around its orbit, and the size is measured by the 
length of the greatest diameter of the elliptical orbit, technically known as the 
major axis. More simply, if the planets went in circles, as they nearly do, the 
time required to go around the circle would be proportional to the 3/2 power of 
the diameter (or radius). Thus Kepler’s three laws are: 

I. Each planet moves around the sun in an ellipse, with the sun at one focus. 

II. The radius vector from the sun to the planet sweeps out equal areas in 
equal intervals of time. 

III. The squares of the periods of any two planets are proportional to the 
cubes of the semimajor axes of their respective orbits: T ~ a 3/2 . 


7-3 Development of dynamics 

While Kepler was discovering these laws, Galileo was studying the laws of 
motion. The problem was, what makes the planets go around? (In those days, 
one of the theories proposed was that the planets went around because behind 
them were invisible angels, beating their wings and driving the planets forward. 
You will see that this theory is now modified! It turns out that in order to keep 
the planets going around, the invisible angels must fly in a different direction and 
they have no wings. Otherwise, it is a somewhat similar theory!) Galileo dis- 
covered a very remarkable fact about motion, which was essential for under- 
standing these laws. That is the principle of inertia — if something is moving, with 
nothing touching it and completely undisturbed, it will go on forever, coasting at 
a uniform speed in a straight line. ( Why does it keep on coasting? We do not 
know, but that is the way it is.) 

Newton modified this idea, saying that the only way to change the motion 
of a body is to use force . If the body speeds up, a force has been applied in the 
direction of motion . On the other hand, if its motion is changed to a new direc- 
tion , a force has been applied sideways . Newton thus added the idea that a force 
is needed to change the speed or the direction of motion of a body. For example, 
if a stone is attached to a string and is whirling around in a circle, it takes a force 
to keep it in the circle. We have to pull on the string. In fact, the law is that the 
acceleration produced by the force is inversely proportional to the mass, or the 
force is proportional to the mass times the acceleration. The more massive a 
thing is, the stronger the force required to produce a given acceleration. (The 
mass can be measured by putting other stones on the end of the same string and 
making them go around the same circle at the same speed. In this way it is found 
that more or less force is required, the more massive object requiring more force.) 


* A radius vector is a line drawn from the sun to any point in a planet’s orbit. 
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The brilliant idea resulting from these considerations is that no tangential force 
is needed to keep a planet in its orbit (the angels do not have to fly tangentially) 
because the planet would coast in that direction anyway. If there were nothing 
at all to disturb it, the planet would go off in a straight line . But the actual motion 
deviates from the line on which the body would have gone if there were no force, 
the deviation being essentially at right angles to the motion, not in the direction 
of the motion. In other words, because of the principle of inertia, the force needed 
to control the motion of a planet around the sun is not a force around the sun 
but toward the sun. (If there is a force toward the sun, the sun might be the angel, 
of course!) 


7-4 Newton’s law of gravitation 

From his better understanding of the theory of motion, Newton appreciated 
that the sun could be the seat or organization of forces that govern the motion of 
the planets. Newton proved to himself (and perhaps we shall be able to prove it 
soon) that the very fact that equal areas are swept out in equal times is a precise 
sign post of the proposition that all deviations are precisely radial — that the law of 
areas is a direct consequence of the idea that all of the forces are directed exactly 
toward the sun . 

Next, by analyzing Kepler’s third law it is possible to show that the farther 
away the planet, the weaker the forces. If two planets at different distances from 
the sun are compared, the analysis shows that the forces are inversely propor- 
tional to the squares of the respective distances. With the combination of the 
two laws, Newton concluded that there must be a force, inversely as the square 
of the distance, directed in a line between the two objects. 

Being a man of considerable feeling for generalities, Newton supposed, of 
course, that this relationship applied more generally than just to the sun holding 
the planets. It was already known, for example, that the planet Jupiter had moons 
going around it as the moon of the earth goes around the earth, and Newton 
felt certain that each planet held its moons with a force. He already knew of the 
force holding us on the earth, so he proposed that this was a universal force — 
that everything pulls everything else. 

The next problem was whether the pull of the earth on its people was the 
“same” as its pull on the moon, i.e., inversely as the square of the distance. If an 
object on the surface of the earth falls 16 feet in the first second after it is released 
from rest, how far does the moon fall in the same time? We might say that the 
moon does not fall at all. But if there were no force on the moon, it would go off 
in a straight line, whereas it goes in a circle instead, so it really falls in from where 
it would have been if there were no force at all. We can calculate from the radius 
of the moon’s orbit (which is about 240,000 miles) and how long it takes to go 
around the earth (approximately 29 days), how far the moon moves in its orbit 
in 1 second, and can then calculate how far it falls in one second.* This distance 
turns out to be roughly 1/20 of an inch in a second. That fits very well with the 
inverse square law, because the earth’s radius is 4000 miles, and if something which 
is 4000 miles from the center of the earth falls 16 feet in a second, something 
240,000 miles, or 60 times as far away, should fall only 1/3600 of 16 feet, which also 
is roughly 1/20 of an inch. Wishing to put this theory of gravitation to a test by 
similar calculations, Newton made his calculations very carefully and found a 
discrepancy so large that he regarded the theory as contradicted by facts, and did 
not publish his results. Six years later a new measurement of the size of the earth 
showed that the astronomers had been using an incorrect distance to the moon. 
When Newton heard of this, he made the calculation again, with the corrected 
figures, and obtained beautiful agreement. 

This idea that the moon “falls” is somewhat confusing, because, as you see, 
it does not come any closer. The idea is sufficiently interesting to merit further 


* That is, how far the circle of the moon’s orbit falls below the straight line tangent 
to it at the point where the moon was one second before. 
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Fig. 7-3. Apparatus for showing the 
independence of vertical and horizontal 
motions. 
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explanation; the moon falls in the sense that it falls away from the straight line 
that it would pursue if there were no forces . Let us take an example on the surface 
of the earth. An object released near the earth’s surface will fall 16 feet m the first 
second. An object shot out horizontally will also fall 16 feet; even though it is 
moving horizontally, it still falls the same 16 feet in the same time. Figure 7-3 
shows an apparatus which demonstrates this. On the horizontal track is a ball 
which is going to be driven forward a little distance away. At the same height 
is a ball which is going to fall vertically, and there is an electrical switch arranged 
so that at the moment the first ball leaves the track, the second ball is released. 
That they come to the same depth at the same time is witnessed by the fact that 
they collide in midair. An object like a bullet, shot horizontally, might go a long 
way in one second — perhaps 2000 feet — but it will still fall 16 feet if it is aimed 
horizontally. What happens if we shoot a bullet faster and faster? Do not forget 
that the earth’s surface is curved. If we shoot it fast enough, then when it falls 
16 feet it may be at just the same height above the ground as it was before. How 
can that be? It still falls, but the earth curves away, so it falls “around” the earth. 
The question is, how far does it have to go in one second so that the earth is 
16 feet below the horizon? In Fig. 7-4 we see the earth with its 4000-mile radius, 
and the tangential, straightline path that the bullet would take if there were no 
force. Now, if we use one of those wonderful theorems in geometry, which says 
that our tangent is the mean proportional between the two parts of the diameter 
cut by an equal chord, we see that the horizontal distance travelled is the mean 
proportional between the 16 feet fallen and the 8000-mile diameter of the earth. 
The square root of (16/5280) X 8000 comes out very close to 5 miles. Thus 
we see that if the bullet moves at 5 miles a second, it then will continue to fall 
toward the earth at the same rate of 16 feet each second, but will never get any 
closer because the earth keeps curving away from it. Thus it was that Mr. Gagarin 
maintained himself in space while going 25,000 miles around the earth at approxi- 
mately 5 miles per second. (He took a little longer because he was a little higher.) 

Any great discovery of a new law is useful only if we can take more out than 
we put in. Now, Newton used the second and third of Kepler’s laws to deduce 
his law of gravitation. What did he predict ? First, his analysis of the moon’s 
motion was a prediction because it connected the falling of objects on the earth’s 
surface with that of the moon. Second, the question is, is the orbit an ellipse? 
We shall see in a later chapter how it is possible to calculate the motion exactly, 
and indeed one can prove that it should be an ellipse,* so no extra fact is needed 
to explain Kepler’s first law. Thus Newton made his first powerful prediction. 

The law of gravitation explains many phenomena not previously understood. 
For example, the pull of the moon on the earth causes the tides, hitherto mysterious. 
The moon pulls the water up under it and makes the tides — people had thought 
of that before, but they were not as clever as Newton, and so they thought there 
ought to be only one tide during the day. The reasoning was that the moon pulls 
the water up under it, making a high tide and a low tide, and since the earth spins 
underneath, that makes the tide at one station go up and down every 24 hours. 
Actually the tide goes up and down in 12 hours. Another school of thought 
claimed that the high tide should be on the other side of the earth because, so they 
argued, the moon pulls the earth away from the water! Both of these theories 
are wrong. It actually works like this: the pull of the moon for the earth and for 
the water is “balanced” at the center. But the water which is closer to the moon is 
pulled more than the average and the water which is farther away from it is pulled 
less than the average. Furthermore, the water can flow while the more rigid earth 
cannot. The true picture is a combination of these two things. 

What do we mean by “balanced”? What balances? If the moon pulls the 
whole earth toward it, why doesn’t the earth fall right “up” to the moon? Because 
the earth does the same trick as the moon, it goes in a circle around a point which 
is inside the earth but not at its center. The moon does not just go around the 


* The proof is not given in this course. 
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earth, the earth and the moon both go around a central position, each falling 
toward this common position, as shown in Fig. 7-5. This motion around the 
common center is what balances the fall of each. So the earth is not going in a 
straight line either; it travels in a circle. The water on the far side is “unbalanced’ 1 
because the moon's attraction there is weaker than it is at the cenier of ihe earth, 
where it just balances the “centrifugal force. 11 The result of this imbalance is that 
the water rises up, away from the center of the earth. On the near side, the attrac- 
tion from the moon is stronger, and the imbalance is in the opposite direction in 
space, but again away from the center of the earth. The net result is that we get 
two tidal bulges. 


7-5 Universal gravitation 
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What else can we understand when we understand gravity? Everyone knows 
the earth is round. Why is the earth round? That is easy; it is due to gravitation. 
The earth can be understood to be round merely because everything attracts 
everything else and so it has attracted itself together as far as it can! If we go even 
further, the earth is not exactly a sphere because it is rotating** and this brings in 
centrifugal effects which tend to oppose gravity near the equator. It turns out that 
the earth should be elliptical, and we even get the right shape for the ellipse. 
We can thus deduce that the sun, the moon, and the earth should be (nearly) 
spheres, just from the law of gravitation. 

What else can you do with the law of gravitation? If we look at the moons 
of Jupiter we can understand everything about the way they move around that 
planet. Incidentally, there was once a certain difficulty with the moons of Jupiter 
that is worth remarking on. These satellites were studied very carefully by Roemer, 
who noticed that the moons sometimes seemed to be ahead of schedule, and some- 
times behind. (One can find their schedules by waiting a very long time and finding 
out how long it takes on the average for the moons to go around.) Now they were 
ahead when Jupiter was particularly close to the earth and they were behind when 
Jupiter was farther from the earth. This would have been a very difficult thing to 
explain according to the law of gravitation — it would have been, in fact, the death 
of this wonderful theory if there were no other explanation. If a law does not work 
even in one place where it ought to, it is just wrong. But the reason for this dis- 
crepancy was very simple and beautiful: it takes a little while to see the moons of 
Jupiter because of the time it takes light to travel from Jupiter to the earth. When 
Jupiter is closer to the earth the time is a little less, and when it is farther from the 
earth, the time is more. This is why moons appear to be, on the average, a little 
ahead or a little behind, depending on whether they are closer to or farther from 
the earth. This phenomenon showed that light does not travel instantaneously, 
and furnished the first estimate of the speed of light. This was done in 1656. 

If all of the planets push and pull on each other, the force which controls, 
let us say, Jupiter in going around the sun is not just the force from the sun; 
there is also a pull from, say, Saturn. This force is not really strong, since the sun 
is much more massive than Saturn, but there is some pull, so the orbit of Jupiter 
should not be a perfect ellipse, and it is not; it is slightly off, and “wobbles” around 
the correct elliptical orbit. Such a motion is a little more complicated. Attempts 
were made to analyze the motions of Jupiter, Saturn, and Uranus on the basis 
of the law of gravitation. The effects of each of these planets on each other were 
calculated to see whether or not the tiny deviations and irregularities in these 
motions could be completely understood from this one law. Lo and behold, for 
Jupiter and Saturn, all was well, but Uranus was “weird.” It behaved in a very 
peculiar manner. It was not travelling in an exact ellipse, but that was under- 
standable, because of the attractions of Jupiter and Saturn. But even if allowance 
were made for these attractions, Uranus still was not going right, so the laws of 
gravitation were in danger of being overturned, a possibility that could not be 
ruled out. Two men, Adams and Leverrier, in England and France, independently, 
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Fig. 7-6. A double-star system. 


arrived at another possibility: perhaps there is another planet, dark and invisible, 
which men had not seen. This planet, N, could pull on Uranus. They calculated 
where such a planet would have to be in order to cause the observed perturba- 
tions. They sent messages to the respective observatories, saying, “Gentlemen, 
point your telescope to such and such a place, and you will see a new planet.” 
It often depends on with whom you are working as to whether they pay any atten- 
tion to you or not. They did pay attention to Leverrier; they looked, and there 
planet N was! The other observatory then also looked very quickly in the next 
few days and saw it too. 

This discovery shows that Newton’s laws are absolutely right in the solar 
system; but do they extend beyond the relatively small distances of the nearest 
planets? The first test lies in the question, do stars attract each other as well as 
planets? We have definite evidence that they do in the double stars . Figure 7-6 
shows a double star — two stars very close together (there is also a third star in 
the picture so that we will know that the photograph was not turned). The stars 
are also shown as they appeared several years later. We see that, relative to the 
“fixed” star, the axis of the pair has rotated, i.e., the two stars are going around 
each other. Do they rotate according to Newton’s laws? Careful measurements 
of the relative positions of one such double star system are shown in Fig. 7-7. 
There we see a beautiful ellipse, the measures starting in 1862 and going all the 
way around to 1904 (by now it must have gone around once more). Everything 
coincides with Newton’s laws, except that the star Sirius A is not at the focus. 
Why should that be? Because the plane of the ellipse is not in the “plane of the 
sky.” We are not looking at right angles to the orbit plane, and when an ellipse 
is viewed at a tilt, it remains an ellipse but the focus is no longer at the same place. 
Thus we can analyze double stars, moving about each other, according to the 
requirements of the gravitational law. 
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Fig. 7-8. A globular star cluster. 

That the law of gravitation is true at even bigger distances is indicated in 
Fig. 7-8. If one cannot see gravitation acting here, he has no soul. This figure 
shows one of the most beautiful things in the sky — a globular star cluster. All of 
the dots are stars. Although they look as if they are packed solid toward the center, 
that is due to the fallibility of our instruments. Actually, the distances between 
even the centermost stars are very great and they very rarely collide. There are 
more stars in the interior than farther out, and as we move outward there are 
fewer and fewer. It is obvious that there is an attraction among these stars. 
It is clear that gravitation exists at these enormous dimensions, perhaps 100,000 
times the size of the solar system. Let us now go further, and look at an entire 
galaxy, shown in Fig. 7-9. The shape of this galaxy indicates an obvious tendency 
for its matter to agglomerate. Of course we cannot prove that the law here is 
precisely inverse square, only that there is still an attraction, at this enormous 
dimension, that holds the whole thing together. One may say, “Well, that is all 
very clever but why is it not just a ball?” Because it is spinning and has angular 
momentum which it cannot give up as it contracts; it must contract mostly in a 
plane. (Incidentally, if you are looking for a good problem, the exact details of 
how the arms are formed and what determines the shapes of these galaxies has 
not been worked out.) It is, however, clear that the shape of the galaxy is due to 
gravitation even though the complexities of its structure have not yet allowed 



Fig. 7-9. A galaxy. 
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us to analyze it completely. In a galaxy we have a scale of perhaps 50,000 to 
100,000 light years. The earth’s distance from the sun is light minutes , so you 
can see how large these dimensions are. 

Gravity appears to exist at even bigger dimensions, as indicated by Fig. 7-10, 
which shows many “little” things clustered together. This is a cluster of galaxies , 
just like a star cluster. Thus galaxies attract each other at such distances that they 
too are agglomerated into clusters. Perhaps gravitation exists even over distances 
of tens of millions of light years; so far as we now know, gravity seems to go out 
forever inversely as the square of the distance. 

Not only can we understand the nebulae, but from the law of gravitation we 
can even get some ideas about the origin of the stars. If we have a big cloud of dust 
and gas, as indicated in Fig. 7-11, the gravitational attractions of the pieces of 
dust for one another might make them form little lumps. Barely visible in the figure 
are “little” black spots which may be the beginning of the accumulations of dust 
and gases which, due to their gravitation, begin to form stars. Whether we have 
ever seen a star form or not is still debatable. Figure 7-12 shows the one piece of 
evidence which suggests that we have. At the left is a picture of a region of gas 
with some stars in it taken in 1947, and at the right is another picture, taken only 
7 years later, which shows two new bright spots. Has gas accumulated, has gravity 
acted hard enough and collected it into a ball big enough that the stellar nuclear 
reaction starts in the interior and turns it into a star? Perhaps, and perhaps not. 
It is unreasonable that in only seven years we should be so lucky as to see a star 
change itself into visible form; it is much less probable that we should see two ! 



Fig. 7-1 1. An interstellar dust cloud. Fig. 7-12. The formation of new stars? 
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7-6 Cavendish’s experiment 

Gravitation, therefore, extends over enormous distances. But if there is a 
force between any pair of objects, we ought to be able to measure the force between 
our own objects. Instead of having to watch the stars go around each other, 
why can we not take a ball of lead and a marble and watch the marble go toward 
the ball of lead? The difficulty of this experiment when done in such a simple 
manner is the very weakness or delicacy of the force. It must be done with extreme 
care, which means covering the apparatus to keep the air out, making sure it is 
not electrically charged, and so on; then the force can be measured. It was first 
measured by Cavendish with an apparatus which is schematically indicated in 
Fig. 7-13. This first demonstrated the direct force between two large, fixed balls 
of lead and two smaller balls of lead on the ends of an arm supported by a very 
fine fiber, called a torsion fiber. By measuring how much the fiber gets twisted, 
one can measure the strength of the force, verify that it is inversely proportional 
to the square of the distance, and determine how strong it is. Thus, one may 
accurately determine the coefficient G in the formula 

F=G ~*~ 

All the masses and distances are known. You say, 44 We knew it already for the 
earth.” Yes, but we did not know the mass of the earth. By knowing G from this 
experiment and by knowing how strongly the earth attracts, we can indirectly 
learn how great is the mass of the earth! This experiment has been called “weighing 
the earth.” Cavendish claimed he was weighing the earth, but what he was meas- 
uring was the coefficient G of the gravity law. This is the only way in which the 
mass of the earth can be determined. G turns out to be 

6.670 X 10~ u newton • m 2 /kg 2 * 

It is hard to exaggerate the importance of the effect on the history of science 
produced by this great success of the theory of gravitation. Compare the confu- 
sion, the lack of confidence, the incomplete knowledge that prevailed in the earlier 
ages, when there were endless debates and paradoxes, with the clarity and simplic- 
ity of this law — this fact that all the moons and planets and stars have such a 
simple rule to govern them, and further that man could understand it and deduce 
how the planets should move! This is the reason for the success of the sciences in 
following years, for it gave hope that the other phenomena of the world might also 
have such beautifully simple laws. 


7-7 What is gravity? 

But is this such a simple law? What about the machinery of it? All we have 
done is to describe how the earth moves around the sun, but we have not said 
what makes it go. Newton made no hypotheses about this; he was satisfied to 
find what it did without getting into the machinery of it. No one has since given 
any machinery. It is characteristic of the physical laws that they have this abstract 
character. The law of conservation of energy is a theorem concerning quantities 
that have to be calculated and added together, with no mention of the machinery, 
and likewise the great laws of mechanics are quantitative mathematical laws for 
which no machinery is available. Why can we use mathematics to describe nature 
without a mechanism behind it? No one knows. We have to keep going because 
we find out more that way. 

Many mechanisms for gravitation have been suggested. It is interesting to con- 
sider one of these, which many people have thought of from time to time. At 
first, one is quite excited and happy when he “discovers” it, but he soon finds that 
it is not correct. It was first discovered about 1750. Suppose there were many 
particles moving in space at a very high speed in all directions and being only slightly 
absorbed in going through matter. When they are absorbed, they give an impulse 
to the earth. However, since there are as many going one way as another, the 
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Fig. 7-1 3. A simplified diagram of 
the apparatus used by Cavendish to 
verify the law of universal gravitation for 
small objects and to measure the gravita- 
tional constant G. 
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Fig. 7-14. The relative strengths of 
electrical and gravitational interactions 
between two electrons. 


impulses all balance. But when the sun is nearby, the particles coming toward the 
earth through the sun are partially absorbed, so fewer of them are coming from 
the sun than are coming from the other side. Therefore, the earth feels a net im- 
pulse toward the sun and it does not take one long to see that it is inversely as the 
square of the distance — because of the variation of the solid angle that the sun 
subtends as we vary the distance. What is wrong with that machinery? It in- 
volves some new consequences which are not true . This particular idea has the 
following trouble: the earth, in moving around the sun, would impinge on more 
particles which are coming from its forward side than from its hind side (when 
you run in the rain, the rain in your face is stronger than that on the back of your 
head!). Therefore there would be more impulse given the earth from the front, 
and the earth would feel a resistance to motion and would be slowing up in its orbit. 
One can calculate how long it would take for the earth to stop as a result of this 
resistance, and it would not take long enough for the earth to still be in its orbit, so 
this mechanism does not work. No machinery has ever been invented that “explains” 
gravity without also predicting some other phenomenon that does not exist. 

Next we shall discuss the possible relation of gravitation to other forces. 
There is no explanation of gravitation in terms of other forces at the present time. 
It is not an aspect of electricity or anything like that, so we have no explanation. 
However, gravitation and other forces are very similar, and it is interesting to 
note analogies. For example, the force of electricity between two charged objects 
looks just like the law of gravitation : the force of electricity is a constant, with a minus 
sign, times the product of the charges, and varies inversely as the square of the 
distance. It is in the opposite direction — likes repel. But is it still not very remark- 
able that the two laws involve the same function of distance? Perhaps gravitation 
and electricity are much more closely related than we think. Many attempts have 
been made to unify them; the so-called unified field theory is only a very elegant 
attempt to combine electricity and gravitation; but, in comparing gravitation and 
electricity, the most interesting thing is the relative strengths of the forces. Any 
theory that contains them both must also deduce how strong the gravity is. 

If we take, in some natural units, the repulsion of two electrons (nature’s 
universal charge) due to electricity, and the attraction of two electrons due to their 
masses, we can measure the ratio of electrical repulsion to the gravitational 
attraction. The ratio is independent of the distance and is a fundamental constant 
of nature. The ratio is shown in Fig. 7-14. The gravitational attraction relative 
to the electrical repulsion between two electrons is 1 divided by 4.17 x 10 42 ! 
The question is, where does such a large number come from? It is not accidental, 
like the ratio of the volume of the earth to the volume of a flea. We have considered 
two natural aspects of the same thing, an electron. This fantastic number is a 
natural constant, so it involves something deep in nature. Where could such a 
tremendous number come from? Some say that we shall one day find the “universal 
equation,” and in it, one of the roots will be this number. It is very difficult to 
find an equation for which such a fantastic number is a natural root. Other pos- 
sibilities have been thought of; one is to relate it to the age of the universe. Clearly, 
we have to find another large number somewhere. But do we mean the age of the 
universe in years? No, because years are not “natural”; they were devised by men. 
As an example of something natural, let us consider the time it takes light to go 
across a proton, 10 -24 second. If we compare this time with the age of the universe , 
2 X 10 10 years, the answer is 10“ 42 . It has about the same number of zeros going 
off it, so it has been proposed that the gravitational constant is related to the age 
of the universe. If that were the case, the gravitational constant would change with 
time, because as the universe got older the ratio of the age of the universe to the 
time which it takes for light to go across a proton would be gradually increasing. 
Is it possible that the gravitational constant is changing with time? Of course 
the changes would be so small that it is quite difficult to be sure. 

One test which we can think of is to determine what would have been the effect 
of the change during the past 10 9 years, which is approximately the age from 
the earliest life on the earth to now, and one-tenth of the age of the universe. 
In this time, the gravity constant would have increased by about 10 percent. It 
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turns out that if we consider the structure of the sun — the balance between the 
weight of its material and the rate at which radiant energy is generated inside it — 
we can deduce that if the gravity were 10 percent stronger, the sun would be much 
more than 10 percent brighter — by the sixth power of the gravity constant! If we 
calculate what happens to the orbit of the earth when the gravity is changing, we 
find that the earth was then closer in . Altogether, the earth would be about 100 
degrees centigrade hotter, and all of the water would not have been in the sea, but 
vapor in the air, so life would not have started in the sea. So we do not now believe 
that the gravity constant is changing with the age of the universe. But such argu- 
ments as the one we have just given are not very convincing, and the subject is 
not completely closed. 

It is a fact that the force of gravitation is proportional to the mass , the quantity 
which is fundamentally a measure of inertia — of how hard it is to hold something 
which is going around in a circle. Therefore two objects, one heavy and one light, 
going around a larger object in the same circle at the same speed because of gravity, 
will stay together because to go in a circle requires a force which is stronger for 
a bigger mass. That is, the gravity is stronger for a given mass in just the right 
proportion so that the two objects will go around together. If one object were inside 
the other it would stay inside; it is a perfect balance. Therefore, Gagarin or Titov 
would find things “weightless” inside a space ship; if they happened to let go 
of a piece of chalk, for example, it would go around the earth in exactly the same 
way as the whole space ship, and so it would appear to remain suspended before 
them in space. It is very interesting that this force is exactly proportional to the 
mass with great precision, because if it were not exactly proportional there would 
be some effect by which inertia and weight would differ. The absence of such an 
effect has been checked with great accuracy by an experiment done first by 
Eotvos in 1909 and more recently by Dicke. For all substances tried, the masses 
and weights are exactly proportional within 1 part m 1,000,000,000, or less. This 
is a remarkable experiment. 


7-8 Gravity and relativity 

Another topic deserving discussion is Einstein’s modification of Newton's 
law of gravitation. In spite of all the excitement it created, Newton’s law of gravi- 
tation is not correct! It was modified by Einstein to take into account the theory 
of relativity. According to Newton, the gravitational effect is instantaneous, that 
is, if we were to move a mass, we would at once feel a new force because of the 
new position of that mass; by such means we could send signals at infinite speed. 
Einstein advanced arguments which suggest that we cannot send signals faster 
than the speed of light , so the law of gravitation must be wrong. By correcting it 
to take the delays into account, we have a new law, called Einstein’s law of gravi- 
tation. One feature of this new law which is quite easy to understand is this: 
In the Einstein relativity theory, anything which has energy has mass — mass in 
the sense that it is attracted gravitationally. Even light, which has an energy, 
has a “mass.” When a light beam, which has energy in it, comes past the sun there 
is an attraction on it by the sun. Thus the light does not go straight, but is de- 
flected. During the eclipse of the sun, for example, the stars which are around the 
sun should appear displaced from where they would be if the sun were not there, 
and this has been observed. 

Finally, let us compare gravitation with other theories. In recent years we 
have discovered that all mass is made of tiny particles and that there are several 
kinds of interactions, such as nuclear forces, etc. None of these nuclear or electrical 
forces has yet been found to explain gravitation. The quantum-mechanical aspects 
of nature have not yet been carried over to gravitation. When the scale is so small 
that we need the quantum effects, the gravitational effects are so weak that the 
need for a quantum theory of gravitation has not yet developed. On the other hand, 
for consistency in our physical theories it would be important to see whether 
Newton’s law modified to Einstein’s law can be further modified to be consistent 
with the uncertainty principle. This last modification has not yet been completed. 
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8 


Motion 


8-1 Description of motion 

In order to find the laws governing the various changes that take place in 
bodies as time goes on, we must be able to describe the changes and have some way 
to record them. The simplest change to observe in a body is the apparent change 
in its position with time, «which we call motion. Let us consider some solid object 
with a permanent mark, which we shall call a point, that we can observe. We 
shall discuss the motion of the little marker, which might be the radiator cap of an 
automobile or the center of a falling ball, and shall try to describe the fact that it 
moves and how it moves. 

These examples may sound trivial, but many subtleties enter into the descrip- 
tion of change. Some changes are more difficult to describe than the motion of 
a point on a solid object, for example the speed of drift of a cloud that is drifting 
very slowly, but rapidly forming or evaporating, or the change of a woman’s 
mind. We do not know a simple way to analyze a change of mind, but since the 
cloud can be represented or described by many molecules, perhaps we can describe 
the motion of the cloud in principle by describing the motion of all its individual 
molecules. Likewise, perhaps even the changes in the mind may have a parallel 
in changes of the atoms inside the brain, but we have no such knowledge yet. 

At any rate, that is why we begin with the motion of points; perhaps we should 
think of them as atoms, but it is probably better to be more rough in the begin- 
ning and simply to think of some kind of small objects — small, that is, compared 
with the distance moved. For instance, in describing the motion of a car that is 
going a hundred miles, we do not have to distinguish between the front and the 
back of the car. To be sure, there are slight differences, but for rough purposes we 
say “the car,” and likewise it does not matter that our points are not absolute 
points; for our present purposes it is not necessary to be extremely precise. Also, 
while we take a first look at this subject we are going to forget about the three 
dimensions of the world. We shall just concentrate on moving in one direction, 
as in a car on one road. We shall return to three dimensions after we see how to 
describe motion in one dimension. Now, you may say, “This is all some kind of 
trivia,” and indeed it is. How can we describe such a one-dimensional motion — 
let us say, of a car? Nothing could be simpler. Among many possible ways, one 
would be the following. To determine the position of the car at different times, 
we measure its distance from the starting point and record all the observations. 
In Table 8-1, s represents the distance of the car, in feet, from the starting point, 
and t represents the time in minutes. The first line in the table represents zero 
distance and zero time — the car has not started yet. After one minute it has started 
and has gone 1200 feet. Then in two minutes, it goes farther— notice that it picked 
up more distance in the second minute — it has accelerated; but something hap- 
pened between 3 and 4 and even more so at 5 — it stopped at a light perhaps? Then 
it speeds up again and goes 13,000 feet by the end of 6 minutes, 18,000 feet at the 
end of 7 minutes, and 23,500 feet in 8 minutes; at 9 minutes it has advanced to 
only 24,000 feet, because in the last minute it was stopped by a cop. 

That is one way to describe the motion. Another way is by means of a graph. 
If we plot the time horizontally and the distance vertically, we obtain a curve some- 
thing like that shown in Fig. 8-1. As the time increases, the distance increases, 
at first very slowly and then more rapidly, and very slowly again for a little while 
at 4 minutes; then it increases again for a few minutes and finally, at 9 minutes, 
appears to have stopped increasing. These observations can be made from the 
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Table 8-1 


t (min) 

^ (ft) 

0 

i 

0 

1 

1200 

2 

4000 

3 

9000 

4 

9500 

5 

9600 

6 

13000 

7 

18000 

8 

23500 

9 

24000 



Fig. 8-1. Graph of distance versus 
time for the car. 



Table 8-2 


t (sec) 

-S (ft) 

0 

0 

1 

16 

2 

64 

3 

144 

4 

256 

5 

400 

6 

576 



Fig. 8-2. Graph of distance versus 
rime for a failing body. 


graph, without a table. Obviously, for a complete description one would have to 
know where the car is at the half-minute marks, too, but we suppose that the graph 
means something, that the car has some position at all the intermediate times. 

The motion of a car is complicated. For another example we take something 
that moves in a simpler manner, following more simple laws: a falling ball. 
Table 8-2 gives the time in seconds and the distance in feet for a falling body. 
At zero seconds the ball starts out at zero feet, and at the end of 1 second it has 
fallen 16 feet. At the end of 2 seconds, it has fallen 64 feet, at the end of 3 
seconds, 144 feet, and so on; if the tabulated numbers are plotted, we get the 
nice parabolic curve shown in Fig. 8-2. The formula for this curve can be written 
as 

j - 16/ 2 . (8.1) 

This formula enables us to calculate the distances at any time. You might say 
there ought to be a formula for the first graph too. Actually, one may write such 
a formula abstractly, as 

S = m (8.2) 

meaning that s is some quantity depending on t or, in mathematical phraseology, 
s is a function of t. Since we do not know what the function is, there is no way we 
can write it in definite algebraic form. 

We have now seen two examples of motion, adequately described with very 
simple ideas, no subtleties. However, there are subtleties — several of them. In 
the first place, what do we mean by time and space ? It turns out that these deep 
philosophical questions have to be analyzed very carefully in physics, and this 
is not so easy to do. The theory of relativity shows that our ideas of space and 
time are not as simple as one might think at first sight. However, for our present 
purposes, for the accuracy that we need at first, we need not be very careful about 
defining things precisely. Perhaps you say, “That’s a terrible thing — I learned that 
in science we have to define everything precisely.” We cannot define anything 
precisely! If we attempt to, we get into that paralysis of thought that comes to 
philosophers, who sit opposite each other, one saying to the other, “You don’t 
know what you are talking about!” The second one says, “What do you mean 
by know ? What do you mean by talking ? What do you mean by you ?,” and so on. 
In order to be able to talk constructively, we just have to agree that we are talking 
about roughly the same thing. You know as much about time as we need for the 
present, but remember that there are some subtleties that have to be discussed; 
we shall discuss them later. 

Another subtlety involved, and already mentioned, is that it should be possible 
to imagine that the moving point we are observing is always located somewhere. 
(Of course when we are looking at it, there it is, but maybe when we look away 
it isn’t there.) It turns out that in the motion of atoms, that idea alsd is false — 
we cannot find a marker on an atom and watch it move. That subtlety we shall 
have to get around in quantum mechanics. But we are first going to learn what the 
problems are before introducing the complications, and then we shall be in a better 
position to make corrections, in the light of the more recent knowledge of the 
subject. We shall, therefore, take a simple point of view about time and space. 
We know what these concepts are in a rough way, and those who have driven a 
car know what speed means. 


8-2 Speed 


Even though we know roughly what “speed” means, there are still some 
rather deep subtleties; consider that the learned Greeks were never able to adequately 
describe problems involving velocity. The subtlety comes when we try to compre- 
hend exactly what is meant by “speed.” The Greeks got very confused about this, 
and a new branch of mathematics had to be discovered beyond the geometry and 
algebra of the Greeks, Arabs, and Babylonians. As an illustration of the diffi- 
culty, try to solve this problem by sheer algebra: A balloon is being inflated so 
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that the volume of the balloon is increasing at the rate of 100 cm 3 per second; 
at what speed is the radius increasing when the volume is 1000 cm 3 ? The Greeks 
were somewhat confused by such problems, being helped, of course, by some very 
confusing Greeks. To show that there were difficulties in reasoning about speed 
at the time, Zeno produced a large number of paradoxes, of which we shall men- 
tion one to illustrate his point that there are obvious difficulties in thinking about 
motion. “Listen/’ he says, “to the following argument: Achilles runs 10 times as 
fast as a tortoise, nevertheless he can never catch the tortoise. For, suppose that 
they start in a race where the tortoise is 100 meters ahead of Achilles; then when 
Achilles has run the 100 meters to the place where the tortoise was, the tortoise has 
proceeded 10 meters, having run one-tenth as fast. Now, Achilles has to run 
another 10 meters to catch up with the tortoise, but on arriving at the end of that 
run, he finds that the tortoise is still 1 meter ahead of him ; running another meter, 
he finds the tortoise 10 centimeters ahead, and so on, ad infinitum . Therefore, at 
any moment the tortoise is always ahead of Achilles and Achilles can never catch , 
up with the tortoise.” What is wrong with that? It is that a finite amount of time 
can be divided into an infinite number of pieces, just as a length of line can be 
divided into an infinite number of pieces by dividing repeatedly by two. And so, 
although there are an infinite number of steps (in the argument) to the point at 
which Achilles reaches the tortoise, it doesn't mean that there is an infinite amount 
of time . We can see from this example that there are indeed some subtleties in 
reasoning about speed. 

In order to get to the subtleties in a clearer fashion, we remind you of a joke 
which you surely must have heard. At the point where the lady in the car is caught 
by a cop, the cop comes up to her and says, “Lady, you were going 60 miles an 
hour!” She says, “That’s impossible, sir, I was travelling for only seven minutes. 
It is ridiculous — how can I go 60 miles an hour when I wasn’t going an hour?” 
How would you answer her if you were the cop? Of course, if you were really the 
cop, then no subtleties are involved; it is very simple: you say, “Tell that to the 
judge!” But let us suppose that we do not have that escape and we make a more 
honest, intellectual attack on the problem, and try to explain to this lady what 
we mean by the idea that she was going 60 miles an hour. Just what do we mean? 
We say, “What we mean, lady, is this: if you kept on going the same way as you 
are going now, in the next hour you would go 60 miles.” She could say, “Well, 
my foot was off the accelerator and the car was slowing down, so if I kept on going 
that way it would not go 60 miles.” Or consider the falling ball and suppose we 
want to know its speed at the time three seconds if the ball kept on going the way 
it is going. What does that mean — kept on accelerating , going faster? No — kept 
on going with the same velocity . But that is what we are trying to define! For if 
the ball keeps on going the way it is going, it will just keep on going the way it is 
going. Thus we need to define the velocity better. What has to be kept the same? 
The lady can also argue this way: “If I kept on going the way I’m going for one 
more hour, I would run into that wall at the end of the street!” It is not so easy to 
say what we mean. 

Many physicists think that measurement is the only definition of anything. 
Obviously, then, we should use the instrument that measures the speed — the 
speedometer — and say, “Look, lady, your speedometer reads 60.” So she says, 
“My speedometer is broken and didn’t read at all.” Does that mean the car is 
standing still? We believe that there is something to measure before we build 
the speedometer. Only then can we say, for example, “The speedometer isn’t 
working right,” or “the speedometer is broken.” That would be a meaningless 
sentence if the velocity had no meaning independent of the speedometer. So we 
have in our minds, obviously, an idea that is independent of the speedometer, 
and the speedometer is meant only to measure this idea. So let us see if we can get 
a better definition of the idea. We say, “Yes, of course, before you went an hour, 
you would hit that wall, but if you went one second, you would go 88 feet; lady, 
you were going 88 feet per second, and if you kept on going, the next second it 
would be 88 feet, and the wall down there is farther away than that.” She says, 
“Yes, but there’s no law against going 88 feet per secohd! There is only a law 
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against going 60 miles an hour.” “But,” we reply, “it’s the same thing.” If it is 
the same thing, it should not be necessary to go into this circumlocution about 
88 feet per second. In fact, the falling ball could not keep going the same way 
even one second because it would be changing speed, and we shall have to define 
speed somehow. 

Now we seem to be getting on the right track; it goes something like this: 
If the lady kept on going for another 1/1000 of an hour, she would go 1/1000 of 
60 miles. In other words, she does not have to keep on going for the whole hour; 
the point is that for a moment she is going at that speed. Now what that means 
is that if she went just a little bit more in time, the extra distance she goes would 
be the same as that of a car that goes at a steady speed of 60 miles an hour. Per- 
haps the idea of the 88 feet per second is right; we see how far she went in the last 
second, divide by 88 feet, and if it comes out 1 the speed was 60 miles an hour. 
In other words, we can find the speed in this way: We ask, how far do we go in a 
very short time? We divide that distance by the time, and that gives the speed. 
But the time should be made as short as possible, the shorter the better, because 
some change could take place during that time. If we take the time of a falling 
body as an hour, the idea is ridiculous. If we take it as a second, the result is 
pretty good for a car, because there is not much change in speed, but not for a 
falling body; so in order to get the speed more and more accurately, we should 
take a smaller and smaller time interval. What we should do is take a millionth 
of a second, and divide that distance by a millionth of a second. The result gives 
the distance per second, which is what we mean by the velocity, so we can define 
it that way. That is a successful answer for the lady, or rather, that is the definition 
that we are going to use. 

The foregoing definition involves a new idea, an idea that was not available 
to the Greeks in a general form. That idea was to take an infinitesimal distance 
and the corresponding infinitesimal time , form the ratio, and watch what happens 
to that ratio as the time that we use gets smaller and smaller and smaller. In other 
words, take a limit of the distance travelled divided by the time required, as the 
time taken gets smaller and smaller, ad infinitum. This idea was invented by 
Newton and by Leibnitz, independently, and is the beginning of a new branch 
of mathematics, called the differential calculus . Calculus was invented in order to 
describe motion, and its first application was to the problem of defining what is 
meant by going “60 miles an hour.” 

Let us try to define velocity a little better. Suppose that in a short time, 
€, the car or other body goes a short distance x\ then the velocity, v 9 is defined as 

v = x/e, 

an approximation that becomes better and better as the e is taken smaller and 
smaller. If a mathematical expression is desired, we can say that the velocity 
equals the limit as the e is made to go smaller and smaller in the expression x/e, or 

v = lim - * (8.3) 

We cannot do the same thing with the lady in the car, because the table is in- 
complete. We know only where she was at intervals of one minute; we can get 
a rough idea that she was going 5000 ft/min during the 7th minute, but we do not 
know, at exactly the moment 7 minutes, whether she had been speeding up and the 
speed was 4900 ft/min at the beginning of the 6th minute, and is now 5100 ft/min, 
or something else, because we do not have the exact details in between. So only 
if the table were completed with an infinite number of entries could we really 
calculate the velocity from such a table. On the other hand, when we have a com- 
plete mathematical formula, as in the case of a falling body (Eq. 8.1), then it is 
possible to calculate the velocity, because we can calculate the position at any time 
whatsoever. 

Let us take as an example the problem of determining the velocity of the 
falling ball at the particular time 5 seconds. One way to do this is to see from 
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Table 8-2 what it did in the 5th second; it went 400 — 256 = 144 ft, so it is going 
144 ft/sec; however, that is wrong, because the speed is changing; on the average 
it is 144 ft/sec during this interval, but the ball is speeding up and is really going 
faster than 144 ft/sec. We want to find out exactly how fast . The technique in- 
volved in this process is the following: We know where the ball was at 5 sec. 
At 5.1 sec, the distance that it has gone all together is 16(5. 1) 2 = 416.16 ft (see 
Eq. 8.1). At 5 sec it had already fallen 400 ft; in the last tenth of a second it fell 
416.16 — 400 = 16.16 ft. Since 16.16 ft in 0.1 sec is the same as 161.6 ft/sec, 
that is the speed more or less, but it is not exactly correct. Is that the speed at 
5, or at 5.1, or halfway between at 5.05 sec, or when is that the speed? Never mind 
— the problem was to find the speed at 5 seconds , and we do not have exactly 
that; we have to do a better job. So, we take one-thousandth of a second more than 
5 sec, or 5.001 sec, and calculate the total fall as 

5 = 16(5.001) 2 - 16(25.010001) = 400.160016 ft. 

In the last 0.001 sec the ball fell 0.160016 ft, and if we divide this number by 0.001 
sec we obtain the speed as 160.016 ft/sec. That is closer, very close, but it is 
still not exact . It should now be evident what we must do to find the speed exactly. 
To perform the mathematics we state the problem a little more abstractly: to 
find the velocity at a special time, t 0 , which in the original problem was 5 sec. 
Now the distance at t 0i which we call s 0i is 16ro, or 400 ft in this case. In order 
to find the velocity, we ask, “At the time t 0 + (a little bit), or t 0 + c, where is 
the body?” The new position is 16(/ 0 + «) 2 = 16*o + 32 t 0 e + 16e 2 . So it is 
farther along than it was before, because before it was only 16/ 2 . This distance 
we shall call s 0 + (a little bit more), or s 0 + .x: (if a: is the extra bit). Now if we 
subtract the distance at t 0 from the distance at t 0 + e, we get x , the extra distance 
gone, as x = 32 t G ■ e + 16e 2 . Our first approximation to the velocity is 

V = ~ = 32*0 + 16e. (8.4) 

The true velocity is the value of this ratio, x/e, when e becomes vanishingly small. 
In other words, after forming the ratio, we take the limit as e gets smaller and 
smaller, that is, approaches 0. The equation reduces to, 

v (at time / 0 ) = 32* 0 - 

In our problem, t 0 = 5 sec, so the solution is v — 32 X 5 = 160 ft/sec. A few 
lines above, where we took e as 0.1 and 0.01 sec successively, the value we got for 
v was a little more than this, but now we see that the actual velocity is precisely 
160 ft/sec. 


8-3 Speed as a derivative 


The procedure we have just carried out is performed so often in mathematics 
that for convenience special notations have been assigned to our quantities e and x . 
In this notation, the e used above becomes At and x becomes As. This At means 
“an extra bit of tf and carries an implication that it can be made smaller. The 
prefix A is not a multiplier, any more than sin 6 means s * i * n • 8 — it simply 
defines a time increment, and reminds us of its special character. As has an 
analogous meaning for the distance s. Since A is not a factor, it cannot be can- 
celled in the ratio As/ At to give s/t, any more than the ratio sin 6 / sin 26 can be 
reduced to 1/2 by cancellation. In this notation, velocity is equal to the limit of 
As/ At when At gets smaller, or 


v 


lim ^ • 
0 A/ 


(8.5) 


This is really the same as our previous expression (8.3) with e and x , but it has the 
advantage of showing that something is changing, and it keeps track of what is 
changing. 
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Incidentally, to a good approximation we have another law, which says that 
the change in distance of a moving point is the velocity times the time interval, 
or As = v At. This statement is true only if the velocity is not changing during 
that time interval, and this condition is true only in the limit as At goes to 0. 
Physicists like to write it ds = v dt , because by dt they mean At in circumstances 
in which it is very small; with this understanding, the expression is valid to a close 
approximation. If At is too long, the velocity might change during the interval, 
and the approximation would become less accurate. For a time dt , approaching 
zero, ds = v dt precisely. In this notation we can write (8.5) as 

„ = lim = *. 

At dt 

The quantity ds/dt which we found above is called the “derivative of s with 
respect to f* (this language helps to keep track of what was changed), and the com- 
plicated process of finding it is called finding a derivative, or differentiating. 
The ds’s and df s which appear separately are called differentials. To familiarize 
you with the words, we say we found the derivative of the function 16/ 2 , or the 
derivative (with respect to t) of 16/ 2 is 32 1. When we get used to the words, the 
ideas are more easily understood. For practice, let us find the derivative of a more 
complicated function. We shall consider the formula s = At 3 + Bt + C, which 
might describe the motion of a point. The letters A, B, and C represent constant 
numbers, as in the familiar general form of a quadratic equation. Starting from 
the formula for the motion, we wish to find the velocity at any time. To find 
the velocity in the more elegant manner, we change t to t + At and note that 
s is then changed to s + some As; then we find the As in terms of At. That is to 
say, 


s -J- As = A (t - b A/) 3 + 3(t + At) -f- C 

= At 3 ~b Bt + C + 3 At 2 At + B At + 3^/(A/) 2 + A(At) 3 y 

but since 

s = At 3 T~ Bt -f- C, 

we find that 

As = 3At 2 At + B At + 3At(At) 2 + A (At) 3 . 

But we do not want As — we want As divided by At. We divide the preceding equa- 
tion by A/, getting 


=- = 3 At 2 + B + 3At(At) + A(At) 2 . 


Table 8-3. A Short Table of Derivatives 

s, u y v y w are arbitrary functions of r; a, b, c, and n are arbitrary constants 


Function 


s 


= t 


ds 

~dt 


Derivative 


nt 


n — 1 


s — cu 


S = C 


s 


a b c 
U V W 


ds 

di 

ds 

di 

ds 

di 

ds 

di 


c 


du 

~di 


du dv 

~di^di^ 



0 


( a du . b dv . c dw 
\w dt v dt w dt 
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As At goes toward 0 the limit of As/ At is ds/dt and is equal to 


a - 3AI> + * 

This is the fundamental process of calculus, differentiating functions. The process 
is even more simple than it appears. Observe that when these expansions con- 
tain any term with a square or a cube or any higher power of At, such terms may be 
dropped at once, since they will go to 0 when the limit is taken. After a little prac- 
tice the process gets easier because one knows what to leave out. There are many 
rules or formulas for differentiating various' types of functions. These can be 
memorized, or can be found in tables. A short list is found in Table 8-3. 

8-4 Distance as an integral 

Now we have to discuss the inverse problem. Suppose that instead of a table of 
distances, we have a table of speeds at different times, starting from zero. For the 
falling ball, such speeds and times are shown in Table 8-4. A similar table could 
be constructed for the velocity of the car, by recording the speedometer reading 
every minute or half-minute. If we know how fast the car is going at any time, can 
we determine how far it goes? This problem is just the inverse of the one solved 
above; we are given the velocity and asked to find the distance. How can we find 
the distance if we know the speed ? If the speed of the car is not constant, and the 
lady goes sixty miles an hour for a moment, then slows down, speeds up, and so 
on, how can we determine how far she has gone? That is easy. We use the same 
idea, and express the distance in terms of infinitesimals. Let us say, “In the first 
second her speed was such and such, and from the formula As = v At we can 
calculate how far the car went the first second at that speed.” Now in the next 
second her speed is nearly the same, but slightly different; we can calculate how 
far she went in the next second by taking the new speed times the time. We pro- 
ceed similarly for each second, to the end of the run. We now have a number 
of little distances, and the total distance will be the sum of all these little pieces. 
That is, the distance will be the sum of the velocities times the times, or s = 

At , where the Greek letter £ (sigma) is used to denote addition. To be more 
precise, it is the sum of the velocity at a certain time, let us say the i-th time, 
multiplied by At. 

s — ^v(t t )At. (8.6) 

% 

The rule for the times is that t 1+x — t % + At. However, the distance we obtain 
by this method will not be correct, because the velocity changes during the time 
interval At. If we take the times short enough, the sum is precise, so we take them 
smaller and smaller until we obtain the desired accuracy. The true s is 

5 = lim £ v(ti) At. (8.7) 

A t ~ >0 i 

The mathematicians have invented a symbol for this limit, analogous to the symbol 
for the differential. The A turns into a d to remind us that the time is as small as 
it can be; the velocity is then called v at the time /, and the addition is written 
as a sum with a great “s,” J (from the Latin summa ), which has become distorted 
and is now unfortunately just called an integral sign. Thus we write 

s= (8.8) 

This process of adding all these terms together is called integration, and it is the 
opposite process to differentiation. The derivative of this integral is v 9 so one 
operator (d) undoes the other (J). One can get formulas for integrals by taking 
the formulas for derivatives and running them backwards, because they are re- 
lated to each other inversely. Thus one can work out his own table of integrals 
by differentiating all sorts of functions. For every formula with a differential, 
we get an integral formula if we turn it around. 


Table 8-4 


Velocity of a Falling Ball 


t (sec) 

v (ft/ sec) 

0 

0 

1 

32 

2 

64 

3 

96 

4 

128 
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Every function can be differentiated analytically, i.e., the process can be carried 
out algebraically, and leads to a definite function. But it is not possible in a simple 
manner to write an analytical value for any integral at will. You can calculate it, 
for instance, by doing the above sum, and then doing it again with a finer interval 
At and again with a finer interval until you have it nearly right. In general, given 
some particular function, it is not possible to find, analytically, what the integral 
is. One may always try to find a function which, when differentiated, gives some 
desired function ; but one may not find it, and it may not exist, in the sense of being 
expressible in terms of functions that have already been given names. 


8-5 Acceleration 

The next step in developing the equations of motion is to introduce another 
idea which goes beyond the concept of velocity to that of change of velocity, 
and we now ask, “How does the velocity change ?” In previous chapters we have 
discussed cases in which forces produce changes in velocity. You may have heard 
with great excitement about some car that can get from rest to 60 miles an hour 
in ten seconds flat. From such a performance we can see how fast the speed 
changes, but only on the average. What we shall now discuss is the next level of 
complexity, which is how fast the velocity is changing. In other words, by how 
many feet per second does the velocity change in a second, that is, how many feet 
per second, per second? We previously derived the formula for the velocity of 
a falling body as v = 32/, which is charted in Table 8-4, and now we want to 
find out how much the velocity changes per second; this quantity is called the 
acceleration. 

Acceleration is defined as the time rate of change of velocity. From the 
preceding discussion we know enough already to write the acceleration as the 
derivative dv/dt, in the same way that the velocity is the derivative of the distance. 
If we now differentiate the formula v = 32 1 we obtain, for a falling body, 

« = % = 32. (8.9) 

[To differentiate the term 32/ we can utilize the result obtained in a previous 
problem, where we found that the derivative of Bt is simply B (a constant). So 
by letting B = 32, we have at once that the derivative of 32/ is 32.] This means 
that the velocity of a falling body is changing by 32 feet per second, per second 
always. We also see from Table 8-4 that the velocity increases by 32 ft/sec in 
each second. This is a very simple case, for accelerations are usually not constant. 
The reason the acceleration is constant here is that the force on the falling body 
is constant, and Newton’s law says that the acceleration is proportional to the force. 

As a further example, let us find the acceleration in the problem we have 
already solved for the velocity. Starting with 

s = At 3 + Bt + C 


we obtained, for v — ds/dt , 

v = 3 At 2 + B . 

Since acceleration is the derivative of the velocity with respect to the time, we need 
to differentiate the last expression above. Recall the rule that the derivative of the 
two terms on the right equals the sum of the derivatives of the individual terms. 
To differentiate the first of these terms, instead of going through the fundamental 
process again we note that we have already differentiated a quadratic term when 
we differentiated 16/ 2 , and the effect was to double the numerical coefficient and 
change the t 2 to /; let us assume that the same thing will happen this time, and you 
can check the result yourself. The derivative of 3^4/ 2 will then be 6 At. Next we 
differentiate B y a constant term; but by a rule stated previously, the derivative of 
B is zero; hence this term contributes nothing to the acceleration. The final 
result, therefore, is a = dv/dt = 6 At. 
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For reference, we state two very useful formulas, which can be obtained by 
integration. If a body starts from rest and moves with a constant acceleration, 
g , its velocity v at any time t is given by 


v = gt. 


The distance it covers in the same time is 

5 = igt 2 . 

Various mathematical notations are used in writing derivatives. Since velocity 
is ds/dt and acceleration is the time derivative of the velocity, we can also write 



which are common ways of writing a second derivative. 

We have another law that the velocity is equal to the integral of the accelera- 
tion. This is just the opposite of a — dvjdt\ we have already seen that distance is 
the integral of the velocity, so distance can be found by twice integrating the ac- 
celeration. 

In the foregoing discussion the motion was in only one dimension, and space 
permits only a brief discussion of motion in three dimensions. Consider a particle 
P which moves in three dimensions in any manner whatsoever. At the beginning 
of this chapter, we opened our discussion of the one-dimensional case of a moving 
car by observing the distance of the car from its starting point at various times. 
We then discussed velocity in terms of changes of these distances with time, and 
acceleration in terms of changes in velocity. We can treat three-dimensional motion 
analogously. It will be simpler to illustrate the motion on a two-dimensional 
diagram, and then extend the ideas to three dimensions. We establish a pair of 
axes at right angles to each other, and determine the position of the particle at any 
moment by measuring how far it is from each of the two axes. Thus each position 
is given in terms of an x-distance and a y-distance, and the motion can be described 
by constructing a table in which both these distances are given as functions of time. 
(Extension of this process to three dimensions requires only another axis, at right 
angles to the first two, and measuring a third distance, the z-distance. The dis- 
tances are now measured from coordinate planes instead of lines.) Having con- 
structed a table with x- and y-distances, how can we determine the velocity? 
We first find the components of velocity in each direction. The horizontal part of 
the velocity, or x-component, is the derivative of the x-distance with respect to 
the time, or 

v x = dx/dt. (8.11) 


Similarly, the vertical part of the velocity, or y-component, is 


v y = dy/dt. (8.12) 

In the third dimension, 

v 3 = dz/dt . (8.13) 

Now, given the components of velocity, how can we find the velocity along the 

actual path of motion? In the two-dimensional case, consider two successive 
positions of the particle, separated by a short distance As and a short time in- 
terval / 2 — ti = A/. In the time At the particle moves horizontally a distance 
Ax ~ v x At, and vertically a distance Ay ~ v y A t. (The symbol is read 
“is approximately.”) The actual distance moved is approximately 

As ~ x/(Ax) 2 + (Ay) 2 , (8.14) 

as shown in Fig. 8-3. The approximate velocity during this interval can be obtained 
by dividing by At and by letting At go to 0, as at the beginning of the chapter. 
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Fig. 8-3. Description of the motion 
of a body in two dimensions and the 
computation of its velocity. 



We then get the velocity as 


V = ~ = V(dx/dt)* + (dy/dt*) = s/vl + vl (8.15) 

For three dimensions the result is 

v = y/v\ + v$ + v\. (8.16) 


In the same way as we defined velocities, we can define accelerations: we have 
an ^-component of acceleration a x , which is the derivative of v x , the ^-component 
of the velocity (that is, a x = d 2 x/dt 2 , the second derivative of x with respect to 
t ), and so on. 

Let us consider one nice example of compound motion in a plane. We shall 
take a motion in which a ball moves horizontally with a constant velocity u , and 
at the same time goes vertically downward with a constant acceleration — g; 
what is the motion? We can say dx/dt = v x = w. Since the velocity v x is constant, 

x — wr, (8.17) 



Fig. 8-4. The parabola described by 
a falling body with an initial horizontal 
velocity. 


and since the downward acceleration —g is constant, the distance y the object 
falls can be written as 


y = -\gt 2 . 


(8.18) 


What is the curve of its path, i.e., what is the relation between y and xl We can 
eliminate t from Eq. (8.18), since t = x/u. When we make this substitution we 
find that 


y = 


- S - * 2 

2 a* x * 


(8.19) 


This relation between y and x may be considered as the equation of the path of 
the moving ball. When this equation is plotted we obtain a curve that is called a 
parabola; any freely falling body that is shot out in any direction will travel in 
a parabola, as shown in Fig. 8-4. 
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Newton 9 s laws of Mfynamies 


9-1 Momentum and force 

The discovery of the laws of dynamics, or the laws of motion, was a dramatic 
moment in the history of science. Before Newton’s time, the motions of things 
like the planets were a mystery, but after Newton there was complete under- 
standing. Even the slight deviations from Kepler’s laws, due to the perturbations 
of the planets, were computable. The motions of pendulums, oscillators with 
springs and weights in them, and So on, could all be analyzed completely after 
Newton’s laws were enunciated. So it is with this chapter: before this chapter we 
could not calculate how a mass on a spring would move; much less could we 
calculate the perturbations on the planet Uranus due to Jupiter and Saturn. After 
this chapter we will be able to compute not only the motion of the oscillating mass, 
but also the perturbations on the planet Uranus produced by Jupiter and Saturn! 

Galileo made a great advance in the understanding of motion when he 
discovered the principle of inertia: if an object is left alone, is not disturbed, it 
continues to move with a constant velocity in a straight line if it was originally 
moving, or it continues to stand still if it was just standing still. Of course this 
never appears to be the case in nature, for if we slide a block across a table it stops, 
but that is because it is not left to itself — it is rubbing against the table. It required 
a certain imagination to find the right rule, and that imagination was supplied 
by Galileo. 

Of course, the next thing which is needed is a rule for finding how an object 
changes its speed if something is affecting it. That is the contribution of Newton. 
Newton wrote down three laws: The First Law was a mere restatement of the 
Galilean principle of inertia just described. The Second Law gave a specific way 
of determining how the velocity changes under different influences called forces. 
The Third Law describes the forces to some extent, and we shall discuss that at 
another time. Here we shall discuss only the Second Law, which asserts that the 
motion of an object is changed by forces in this way: the time-r at e-of -change of a 
quantity called momentum is proportional to the force . We shall state this mathe- 
matically shortly, but let us first explain the idea. 

Momentum is not the same as velocity. A lot of words are used in physics, 
and they all have precise meanings in physics, although they may not have such 
precise meanings in everyday language. Momentum is an example, and we must 
define it precisely. If we exert a certain push with our arms on an object that is 
light, it moves easily; if we push just as hard on another object that is much heavier 
in the usual sense, then it moves much less rapidly. Actually, we must change the 
words from “light*’ and “heavy” to less massive and more massive , because there 
is a difference to be understood between the weight of an object and its inertia. 
(How hard it is to get it going is one thing, and how much it weighs is something 
else.) Weight and inertia are proportional , and on the earth’s surface are often 
taken to be numerically equal, which causes a certain confusion to the student. 
On Mars, weights would be different but the amount of force needed to overcome 
inertia would be the same. 

We use the term mass as a quantitative measure of inertia, and we may 
measure mass, for example, by swinging an object in a circle at a certain speed and 
measuring how much force we need to keep it in the circle. In this way we find a 
certain quantity of mass for every object. Now the momentum of an object is a 
product of two parts: its mass and its velocity. Thus Newton’s Second Law may 
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be written mathematically this way: 


F = j t (mv). (9.1) 

Now there are several points to be considered. In writing down any law such as 
this, we use many intuitive ideas, implications, and assumptions which are at 
first combined approximately into our “law.” Later we may have to come back 
and study in greater detail exactly what each term means, but if we try to do this 
too soon we shall get confused. Thus at the beginning we take several things for 
granted. First, that the mass of an object is constant ; it isn’t really, but we shall 
start out with the Newtonian approximation that mass is constant, the same all 
the time, and that, further, when we put two objects together, their masses add. 
These ideas were of course implied by Newton when he wrote his equation, for 
otherwise it is meaningless. For example, suppose the mass varied inversely as the 
velocity ; then the momentum would never change in any circumstance, so the law 
means nothing unless you know how the mass changes with velocity. At first 
we say, it does not change. 

Then there are some implications concerning force. As a rough approximation 
we think of force as a kind of push or pull that we make with our muscles, but 
we can define it more accurately now that we have this law of motion. The most 
important thing to realize is that this relationship involves not only changes in 
the magnitude of the momentum or of the velocity but also in their direction. 
If the mass is constant, then Eq. (9.1) can also be written as 

F = m = ma. (9.2) 



object 


The acceleration a is the rate of change of the velocity, and Newton’s Second 
Law says more than that the effect of a given force varies inversely as the mass; 
it says also that the direction of the change in the velocity and the direction of the 
force are the same. Thus we must understand that a change in a velocity, or an 
acceleration, has a wider meaning than in common language: The velocity of a 
moving object can change by its speeding up, slowing down (when it slows down, 
we say it accelerates with a negative acceleration), or changing its direction of 
motion. An acceleration at right angles to the velocity was discussed in Chapter 7. 
There we saw that an object moving in a circle of radius R with a certain speed v 
along the circle falls away from a straightline path by a distance equal to %(v 2 /R)t 2 
if t is very small. Thus the formula for acceleration at right angles to the motion is 

a = i > 2 /R, (9.3) 

and a force at right angles to the velocity will cause an object to move in a curved 
path whose radius of curvature can be found by dividing the force by the mass to 
get the acceleration, and then using (9.3). 


9-2 Speed and velocity 

In order to make our language more precise, we shall make one further 
definition in our use of the words speed and velocity. Ordinarily we think of speed 
and velocity as being the same, and in ordinary language they are the same. But in 
physics we have taken advantage of the fact that there are two words and have 
chosen to use them to distinguish two ideas. We carefully distinguish velocity, 
which has both magnitude and direction, from speed, which we choose to mean 
the magnitude of the velocity, but which does not include the direction. We can 
formulate this more precisely by describing how the x-, y-, and z-coordinates of 
an object change with time. Suppose, for example, that at a certain instant an 
object is moving as shown in Fig. 9-1. In a given small interval of time At it 
will move a certain distance Ax in the x-direction, Ay in the y-direction, and Az in 
the z-direction. The total effect of these three coordinate changes is a displacement 
As along the diagonal of a parallelepiped whose sides are Ax, Ay, and Az. In terms 
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of the velocity, the displacement Ax is the ^-component of the velocity times At, 
and similarly for Ay and Az: 

Ax = v x At, Ay = v y At, Az = v z At. (9.4) 

9-3 Components of velocity, acceleration, and force 

In Eq. (9.4) we have resolved the velocity into components by telling how fast the 
object is moving in the ^-direction, the y-direction, and the z-direction, The 
velocity is completely specified, both as to magnitude and direction, if we give the 
numerical values of its three rectangular components: 

v x ~ dx/dt , v y — dy/dt, v z — dz/dt . (9.5) 

On the other hand, the speed of the object is 

ds/dt = \v\ = (9.6) 

Next, suppose that, because of the action of a force, the velocity changes to 
some other direction and a different magnitude, as shown in Fig. 9-2. We can 
analyze this apparently complex situation rather simply if we evaluate the changes 
in the x-, y-, and z-components of velocity. The change in the component of the 
velocity in the ^-direction in a time At is Av x = a x At, where a x is what we call the 
x-component of the acceleration. Similarly, we see that Av y = a v At and Av z = 
a z A?. In these terms, we see that Newton’s Second Law, in saying that the force 
is in the same direction as the acceleration, is really three laws, in the sense that 
the component of the force in the x-, y- 9 or z-direction is equal to the mass times 
the rate of change of the corresponding component of velocity: 

F x — m(dv x /dt) — m(d 2 x/dt 2 ) = ma x , 

F v = m(dv y /dt ) = m(d 2 y/dt 2 ) = ma y , (9.7) 

F z = m{dv z /dt) = m{d 2 z/dt 2 ) = ma z . 

Just as the velocity and acceleration have been resolved into components by 

projecting a line segment representing the quantity and its direction onto three 

coordinate axes, so, in the same way, a force in a given direction is represented 
by certain components in the x-, y-, and z-directions: 

F x = Fcos (x, F), 

Fy = F cos (y,F), (9.8) 

F z = F cos (z, F), 

where F is the magnitude of the force and (x, F) represents the angle between the 
x-axis and the direction of F, etc. 

Newton’s Second Law is given in complete form in Eq. (9.7). If we know the 
forces on an object and resolve them into x-, y-, and z-components, then we can 
find the motion of the object from these equations. Let us consider a simple 
example. Suppose there are no forces in the y- and z-directions, the only force 
being in the x-direction, say vertically. Equation (9.7) tells us that there would be 
changes in the velocity in the vertical direction, but no changes in the horizontal 
direction. This was demonstrated with a special apparatus in Chapter 7 (see 
Fig. 7-3). A falling body moves horizontally without any change in horizontal 
motion, while it moves vertically the same way as it would move if the horizontal 
motion were zero. In other words, motions in the x-, y-, and z-directions are 
independent if the forces are not connected. 

9-4 What is the force? 

In order to use Newton’s laws, we have to have some formula for the force; 
these laws say pay attention to the forces. If an object is accelerating, some agency 
is at work; find it. Our program for the future of dynamics must be t o find the 
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Fig. 9-2. A change in velocity in 
which both the magnitude and direction 
change. 



Fig. 9- 


laws for the force. Newton himself went on to give some examples. In the case 
of gravity he gave a specific formula for the force. In the case of other forces he 
gave some part of the information in his Third Law, which we will study in the 
next chapter, having to do with the equality of action and reaction. 

Extending our previous example, what are the forces on objects near the 
earth’s surface? Near the earth’s surface, the force in the vertical direction due 
to gravity is proportional to the mass of the object and is nearly independent of 
height for heights small compared with the earth’s radius R : F = GmM/R 2 = mg , 
where g — GM/R 2 is called the acceleration of gravity. Thus the law of gravity 
tells us that weight is proportional to mass; the force is in the vertical direction 
and is the mass times g. Again we find that the motion in the horizontal direction 
is at constant velocity. The interesting motion is in the vertical direction, and 
Newton’s Second Law tells us 

mg = m(d 2 x/dt 2 ). (9.9) 

Cancelling the m’s, we find that the acceleration in the x-direction is constant and 
equal to g. This is of course the well known law of free fall under gravity, which 
leads to the equations 

Vx = Vo + gt , 

X = x 0 + V 0 t + \gt 2 . (9.10) 

As another example, let us suppose that we have been able to build a gadget 
(Fig. 9-3) which applies a force proportional to the distance and directed oppositely 
— a spring. If we forget about gravity, which is of course balanced out by the 
initial stretch of the spring, and talk only about excess forces, we see that if we 
pull the mass down, the spring pulls up, while if we push it up the spring pulls 
down. This machine has been designed carefully so that the force is greater, the 
more we pull it up, in exact proportion to the displacement from the balanced 
condition, and the force upward is similarly proportional to how far we pull down. 
If we watch the dynamics of this machine, we see a rather beautiful motion — up, 
down, up, down, . . . The question is, will Newton’s equations correctly describe 
this motion ? Let us see whether we can exactly calculate how it moves with this 
periodic oscillation, by applying Newton’s law (9.7). In the present instance, 
the equation is 

— kx = m(dv x /dt). (9.11) 

Here we have a situation where the velocity in the x-direction changes at a rate 
proportional to x. Nothing will be gained by retaining numerous constants, so 
we shall imagine either that the scale of time has changed or that there is an 
accident in the units, so that we happen to have k/m = 1. Thus we shall try to 
solve the equation 

dv x /dt = —x. (9.12) 

To proceed, we must know what v x is, but of course we know that the velocity is 
the rate of change of the position. 

9-5 Meaning of the dynamical equations 

Now let us try to analyze just what Eq. (9.12) means. Suppose that at a 
given time t the object has a certain velocity v x and position x. What is the velocity 
and what is the position at a slightly later time / + e? If we can answer this 
question our problem is solved, for then we can start with the given condition and 
compute how it changes for the first instant, the next instant, the next instant, and 
so on, and in this way we gradually evolve the motion. To be specific, let us suppose 
that at the time / = 0 we are given that x = 1 and v x = 0. Why does the object 
move at all? Because there is a force on it when it is at any position except x — 0. 
If x > 0, that force is upward. Therefore the velocity which is zero starts to 
change, because of the law of motion. Once it starts to build up some velocity 
the object starts to move up, and so on. Now at any time f, if e is very small, 
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we may express the position at time t + e in terms of the position at time t and 
the velocity at time t to a very good approximation as 

*(/ + e) = x(t) 4- ev x (t). (9.13) 

The smaller the e, the more accurate this expression is, but it is still usefully accurate 
even if e is not vanishingly small. Now what about the velocity? In order to get 
the velocity later, the velocity at the time / + e, we need to know how the velocity 
changes, the acceleration. And how are we going to find the acceleration? That 
is where the law of dynamics comes in. The law of dynamics tells us what the 
acceleration is. It says the acceleration is — *. 

v x (t + e) = v x (t) + ea x (t) (9.14) 

= v x (t) — €x(t). (9.15) 

Equation (9.14) is merely kinematics; it says that a velocity changes because of 
the presence of acceleration. But Eq. (9.15) is dynamics , because it relates the 
acceleration to the force; it says that at this particular time for this particular 
problem, you can replace the acceleration by — *(/). Therefore, if we know both 
the x and v at a given time, we know the acceleration, which tells us the new 
velocity, and we know the new position — this is how the machinery works. The 
velocity changes a little bit because of the force, and the position changes a little 
bit because of the velocity. 


9-6 Numerical solution of the equations 

Now let us really solve the problem. Suppose that we take e = 0.100 sec. 
After we do all the work if we find that this is not small enough we may have to 
go back and do it again with € = 0.010 sec. Starting with our initial value *(0) = 
1.00, what is *(0.1)? It is the old position *(0) plus the velocity (which is zero) 
times 0.10 sec. Thus *(0.1) is still 1.00 because it has not yet started to move. 
But the new velocity at 0.10 sec will be the old velocity v(0) = 0 plus € times the 
acceleration. The acceleration is — *(0) = —1.00. Thus 


Now at 0.20 sec 


and 


K0.1) - 0.00 - 0.10 X 1.00 = -0.10. 

*(0.2) = *(0.1) + ei>(0.1) 

= 1.00 - 0.10 X 0.10 = 0.99 

r(0.2) = ^(0.1) + ea(0.1) 

= -0.10 - 0.10 X 1.00 = -0.20. 


And so, on and on and on, we can calculate the rest of the motion, and that is 
just what we shall do. However, for practical purposes there are some little tricks 
by which we can increase the accuracy. If we continued this calculation as we have 
started it, we would find the motion only rather crudely because e — 0.100 sec 
is rather crude, and we would have to go to a very small interval, say e = 0.01. 
Then to go through a reasonable total time interval would take a lot of cycles of 
computation. So we shall organize the work in a way that will increase the pre- 
cision of our calculations, using the same coarse interval e — 0.10 sec. This can 
be done if we make a subtle improvement in the technique of the analysis. 

Notice that the new position is the old position plus the time interval e times 
the velocity. But the velocity when? The velocity at the beginning of the time 
interval is one velocity and the velocity at the end of the time interval is another 
velocity. Our improvement is to use the velocity halfway between. If we know 
the speed now, but the speed is changing, then we are not going to get the right 
answer by going at the same speed as now. We should use some speed between 
the “now” speed and the “then” speed at the end of the interval. The same 
considerations also apply to the velocity: to compute the velocity changes, we 
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Table 9-1 

Solution of dv x ldt = — x 
Interval: e = 0.10 sec 



Fig. 9-4. Graph of the motion of a 
mass on a spring. 


should use the acceleration midway between the two times at which the velocity 
is to be found. Thus the equations that we shall actually use will be something 
like this: the position later is equal to the position before plus e times the velocity 
at the time in the middle of the interval Similarly, the velocity at this halfway point 
is the velocity at a time € before (which is in the middle of the previous interval) 
plus e times the acceleration at the time t. That is, we use the equations 

x(t + e) = x(0 + ev(t + e/2), 
v(t + e/2) = v(t - e/2) + ea(t), (9.16) 

a(t) = - x{t ) . 

There remains only one slight problem: what is v(e/2)? At the start, we are given 
f(0), not v(— e/2). To get our calculation" started, we shall use a special equation, 
namely, v(e/2) = ^(0) + (e/2)a(0). 

Now we are ready to carry through our calculation. For convenience, we 
may arrange the work in the form of a table, with columns for the time, the position, 
the velocity, and the acceleration, and the in-between lines for the velocity, as 
shown in Table 9-1 . Such a table is, of course, just a convenient way of representing 
the numerical values obtained from the set of equations (9.16), and in fact the 
equations themselves need never be written. We just fill in the various spaces in 
the table one by one. This table now gives us a very good idea of the motion: 
it starts from rest, first picks up a little upward (negative) velocity and it loses 
some of its distance. The acceleration is then a little bit less but it is still gaining 
speed. But as it goes on it gains speed more and more slowly, until as it passes 
x = 0 at about t = 1.50 sec we can confidently predict that it will keep going, 
but now it will be on the other side; the position x will become negative, the ac- 
celeration therefore positive. Thus the speed decreases. It is interesting to compare 
these numbers with the function x = cos t , which is done in Fig. 9-4. The agree- 
ment is within the three significant figure accuracy of our calculation! We shall 
see later that x — cos t is the exact mathematical solution of our equation of 
motion, but it is an impressive illustration of the power of numerical analysis that 
such an easy calculation should give such precise results. 

9-7 Planetary motions 

The above analysis is very nice for the motion of an oscillating spring, but 
can we analyze the motion of a planet around the sun? Let us see whether we 
can arrive at an approximation to an ellipse for the orbit. We shall suppose that 
the sun is infinitely heavy, in the sense that we shall not include its motion. Suppose 
a planet starts at a certain place and is moving with a certain velocity; it goes 
around the sun in some curve, and we shall try to analyze, by Newton’s laws of 
motion and his law of gravitation, what the curve is. How? At a given moment 
it is at some position in space. If the radial distance from the sun to this position 
is called r , then we know that there is a force directed inward which, according to 
the law of gravity, is equal to a constant times the product of the sun’s mass and 
the planet’s mass divided by the square of the distance. To analyze this further 
we must find out what acceleration will be produced by this force. We shall need 
the components of the acceleration along two directions, which we call x and y. 
Thus if we specify the position of the planet at a given moment by giving x and y 
(we shall suppose that z is always zero because there is no force in the z-direction 
and, if there is no initial velocity v Z9 there will be nothing to make z other than 
zero), the force is directed along the line joining the planet to the sun, as shown 
in Fig. 9-5. 

From this figure we see that the horizontal component of the force is related 
to the complete force in the same manner as the horizontal distance x is to the 
complete hypotenuse r, because the two triangles are similar. Also, if x is positive, 
F x is negative. That is, F X /|F| = — x/r, or F x = — |F|x/r = —GMmx/r' s . Now 
we use the dynamical law to find that this force component is equal to the mass of 
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the planet times the rate of change of its velocity in the ^-direction. Thus we find 
the following laws: 

m(dv x /dt) = — GMmx/r 3 , 

m(dv y /dt) = - GMmy/r 3 , (9.17) 

r = Vx 2 + y 2 . 


This, then, is the set of equations we must solve. Again, in order to simplify the 
numerical work, we shall suppose that the unit of time, or the mass of the sun, has 
been so adjusted (or luck is with us) that GM = 1. For our specific example we 
shall suppose that the initial position of the planet is at x = 0.500 and y == 0.000, 
and that the velocity is all in the y-direction at the start, and is of magnitude 
1.6300. Now how do we make the calculation? We again make a table with 
columns for the time, the ^-position, the ^-velocity v Xi and the ^-acceleration a x \ 
then, separated by a double line, three columns for position, velocity, and accelera- 
tion in the y-direction. In order to get the accelerations we are going to need 
Eq. (9.17); it tells us that the acceleration in the x-direction is — x/r 3 , and the 
acceleration in the y-direction is — y/r 3 , and that r is the square root of x 2 + y 2 . 
Thus, given x and y, we must do a little calculating on the side, taking the square 
root of the sum of the squares to find r and then, to get ready to calculate the two 
accelerations, it is useful also to evaluate 1/r 3 . This work can be done rather 
easily by using a table of squares, cubes, and reciprocals: then we need only 
multiply x by 1/r 3 , which we do on a slide rule. 

Our calculation thus proceeds by the following steps, using time intervals 
e = 0.100: Initial values at t ~ 0: 



Fig. 9-5. The force of gravity on a 
planet. 


From these we find: 


x(0) = 0.500 
v x (0) = 0.000 


r(0) = 0.500 

a x = -4.000 


y( 0) = 0.000 

v v (0) = +1.630 

l/r 3 (0) - 8.000 
a v = 0.000 


Thus we may calculate the velocities ^(0.05) and ^(0.05): 

v x (0.05) - 0.000 - 4.000 X 0.050 = -0.200; 
^(0.05) = 1.630 + 0.000 X 0.100 = 1.630. 

Now our main calculations begin: 


x(0.1) = 0.500 - 0.20 X 0.1 

= 

0.480 

7(0.1) = 0.0 + 1.63 X 0.1 

= 

0.163 

r = V0.480 2 + 0.163 2 

= 

0.507 

1/r 3 = 7.67 



a*(0.1) = 0.480 X 7.67 

= 

-3.68 

a„(0.1) = -0.163 X 7.67 

— 

-1.256 

y x (0.15) = -0.200 - 3.68 X 0.1 

= 

-0.568 

tv(0.15) = 1.630 - 1.26 X 0.1 

= 

1.505 

x(0.2) = 0.480 - 0.568 X 0.1 

— 

0.423 

7(0.2) = 0.163 + 1.50 X 0.1 

= 

0.313 


etc. 



Fig. 9-6. The calculated motion of a 
planet around the sun. 


In this way we obtain the values given in Table 9-2, and in 20 steps or so we have 
chased the planet halfway around the sun! In Fig. 9-6 are plotted the x- and 
y-coordinates given in Table 9-2. The dots represent the positions at the succession 
of times a tenth of a unit apart; we see that at the start the planet moves rapidly 
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Table 9-2 





Solution of dv x /dt = —x/. 

r 3 , dv y /dt 

= -y/r\ 

II 

* 

+ y 2 . 





Interval: e = 

0.100 





Orbit 

Vy = 

1.63 v x - 

= 0 x = 

0.5 y = 

0 at / = 

= 0 


t 

X 

V x 

a x 

y 


a y 

r 

l/r 3 

0.0 

0.500 

- 0.200 

- 4.00 

0.000 

1.630 

0.00 

0.500 

8.000 

0.1 

0.480 

- 0.568 

- 3.68 

0.163 

1.505 

- 1.25 

0.507 

7.675 

0.2 

0.423 

- 0.859 

- 2.91 

0.313 

1.290 

- 2.15 

0.526 

6.873 

0.3 

0.337 

- 1.055 

- 1.96 

0.442 

1.033 

- 2.57 

0.556 

5.824 

0.4 

0.232 

- 1.166 

- 1.11 

0.545 

0.771 

- 2.62 

0.592 

4.81 

0.5 

0.115 


- 0.453 

0.622 


- 2.45 

0.633 

3.942 


— 

- 1 . 211 - 



- 0 . 526 - 




0.6 

- 0.006 

- 1.209 

+ 0.020 

0.675 

0.306 

- 2.20 

0.675 

3.252 

0.7 

- 0.127 

- 1.175 

+ 0.344 

0.706 

0.115 

- 1.91 

0.717 

2.712 

0.8 

- 0.245 

- 1.119 

+ 0.562 

0.718 

- 0.049 

- 1.64 

0.758 

2.296 

0.9 

- 0.357 

- 1.048 

+ 0.705 

0.713 

- 0.190 

- 1.41 

0.797 

1.975 

1.0 

- 0.462 


+ 0.796 

0.694 


- 1.20 

0.834 

1.723 


— 

- 0 . 968 - 



- - 0 . 310 - 




1.1 

- 0.559 

- 0.882 

+ 0.858 

0.663 

- 0.412 

- 1.02 

0.867 

1.535 

1.2 

- 0.647 

- 0.792 

+ 0.90 

0.622 

- 0.499 

- 0.86 

0.897 

1.385 

1.3 

- 0.726 

- 0.700 

+ 0.92 

0.572 

- 0.570 

- 0.72 

0.924 

1.267 

1.4 

- 0.796 

- 0.607 

+ 0.93 

0.515 

- 0.630 

- 0.60 

0.948 

1.173 

1.5 

- 0.857 


+ 0.94 

0.452 


- 0.50 

0.969 

1.099 


— 

- 0 . 513 - 



- - 0 . 680 - 




1.6 

- 0.908 

- 0.418 

+ 0.95 

0.384 

- 0.720 

- 0.40 

0.986 

1.043 

1.7 

- 0.950 

- 0.323 

+ 0.95 

0.312 

- 0.751 

- 0.31 

1.000 

1.000 

1.8 

- 0.982 

- 0.228 

+ 0.95 

0.237 

- 0.773 

- 0.23 

1.010 

0.970 

1.9 

- 1.005 

- 0.113 

+ 0.95 

0.160 

- 0.778 

- 0.15 

1.018 

0.948 

2.0 

- 1.018 


+ 0.96 

0.081 


- 0.08 

1.021 

0.939 




- 0 . 037 - 



- - 0 . 796 - 




2.1 

- 1.022 

+ 0.058 

+ 0.95 

0.001 

- 0.796 

0.00 

1.022 

0.936 

2.2 

- 1.016 


+ 0.96 

- 0.079 

- 0.789 

+ 0.07 

1.019 

0.945 

2.3 










Crossed x-axis at 2.101 sec, .*. 

period = 

4.20 sec. 





v x = 0 at 2.086 sec. 








Cross x at 1.022, 

. . . 1.022 + 0.500 

semimajor axis = — 0.761. 



v y = 0.796. 









Predicted time tt(0.761) 3/2 = 

tt(0.663) 

= 2.082. 
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and at the end it moves slowly, and so the shape of the curve is determined. Thus 
we see that we really do know how to calculate the motion of planets! 

Now let us see how we can calculate the motion of Neptune, Jupiter, Uranus, 
or any other planet. If we have a great many planets, and let the sun move too, 
can we do the same thing? Of course we can. We calculate the fprce on a particular 
planet, let us say planet number U which has a position x u y u z t ( i = 1 may repre- 
sent the sun, i = 2 Mercury, i = 3 Venus, and so on). We must know the positions 
of all the planets. The force acting on one is due to all the other bodies which 
are located, let us say, at positions x 3 , y 3 , z 3 . Therefore the equations are 


dv lx 

1 dt 

dl) iy 

' l dt 


E 


3=1 


E 


j = i 




dv , 


dt 


E 


3 = 1 


Gm l m 3 {x l — x 3 ) 
^ ’ 

Gm l m ) (y l - y 3 ) > 

r 3 

r l3 

Gm t m 3 (z t — Zj) 


(9.18) 


Further, we define r %3 as the distance between the two planets i and j; this is equal to 
r% 3 = V (x t - x 3 ) 2 + O, - y 3 ) 2 + (z % - z 3 ) 2 . (9.19) 

Also, £ means a sum over all values of j — all other bodies — except, of course, 
for j = L Thus all we have to do is to make more columns, lots more columns. 
We need nine columns for the motions of Jupiter, nine for the motions of Saturn, 
and so on. Then when we have all initial positions and velocities we can calculate 
all the accelerations from Eq. (9.18) by first calculating all the distances, using 
Eq. (9.19). How long will it take to do it? If you do it at home, it will take a 
very long time! But in modern times we have machines which do arithmetic very 
rapidly; a very good computing machine may take 1 microsecond, that is, a 
millionth of a second, to do an addition. To do a multiplication takes longer, 
say 10 microseconds. It may be that in one cycle of calculation, depending on 
the problem, we may have 30 multiplications, or something like that, so one cycle 
will take 300 microseconds. That means that we can do 3000 cycles of computation 
per second. In order to get an accuracy, of, say, one part in a billion, we would 
need 4 X 10 5 cycles to correspond to one revolution of a planet around the sun. 
That corresponds to a computation time of 130 seconds or about two minutes. 
Thus it take only two minutes to follow Jupiter around the sun, with all the 
perturbations of all the planets correct to one part in a billion, by this method! 
(It turns out that the error varies about as the square of the interval e. If we make 
the interval a thousand times smaller, it is a million times more accurate. So, let 
us make the interval 10,000 times smaller.) 

So, as we said, we began this chapter not knowing how to calculate even the 
motion of a mass on a spring. Now, armed with the tremendous power of Newton’s 
laws, we can not only calculate such simple motions but also, given only a machine 
to handle the arithmetic, even the tremendously complex motions of the planets, 
to as high a degree of precision as we wish! 
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10 


Conservation of Momentum 


10-1 Newton’s Third Law 

On the basis of Newton’s second law of motion, which gives the relation 
between the acceleration of any body and the force acting on it, any problem in 
mechanics can be solved in principle. For example, to determine the motion of a 
few particles, one can use the numerical method developed in the preceding chapter. 
But there are good reasons to make a further study of Newton’s laws. First, there 
are quite simple cases of motion which can be analyzed not only by numerical 
methods, but also by direct mathematical analysis. For example, although we 
know that the acceleration of a falling body is 32 ft/sec 2 , and from this fact could 
calculate the motion by numerical methods, it is much easier and more satisfactory 
to analyze the motion and find the general solution, s = s 0 + M + 16/ 2 . In 
the same way, although we can work out the positions of a harmonic oscillator by 
numerical methods, it is also possible to show analytically that the general solution 
is a simple cosine function of /, and so it is unnecessary to go to all that arithmetical 
trouble when there is a simple and more accurate way to get the result. In the 
same manner, although the motion of one body around the sun, determined by 
gravitation, can be calculated point by point by the numerical methods of Chapter 
9, which show the general shape of the orbit, it is nice also to get the exact shape, 
which analysis reveals as a perfect ellipse. 

Unfortunately, there are really very few problems which can be solved exactly 
by analysis. In the case of the harmonic oscillator, for example, if the spring force 
is not proportional to the displacement, but is something more complicated, one 
must fall back on the numerical method. Or if there are two bodies going around 
the sun, so that the total number of bodies is three, then analysis cannot produce a 
simple formula for the motion, and in practice the problem must be done numeri- 
cally. That is the famous three-body problem, which so long challenged human 
powers of analysis; it is very interesting how long it took people to appreciate 
the fact that perhaps the powers of mathematical analysis were limited and it 
might be necessary to use the numerical methods. Today an enormous number of 
problems that cannot be done analytically are solved by numerical methods, and 
the old three-body problem, which was supposed to be so difficult, is solved as a 
matter of routine in exactly the same manner that was described in the preceding 
chapter, namely, by doing enough arithmetic. However, there are also situations 
where both methods fail: the simple problems we can do by analysis, and the 
moderately difficult problems by numerical, arithmetical methods, but the very 
complicated problems we cannot do by either method. A complicated problem is, 
for example, the collision of two automobiles, or even the motion of the molecules 
of a gas. There are countless particles in a cubic millimeter of gas, and it would 
be ridiculous to try to make calculations with so many variables (about 10 17 — 
a hundred million billion). Anything like the motion of the molecules or atoms of 
a gas or a block or iron, or the motion of the stars in a globular cluster, instead of 
just two or three planets going around the sun — such problems we cannot do 
directly, so we have to seek other means. 

In the situations in which we cannot follow details, we need to know some 
general properties, that is, general theorems or principles which are consequences 
of Newton’s laws. One of these is the principle of conservation of energy, which 
was discussed in Chapter 4. Another is the principle of conservation of momentum, 
the subject of this chapter. Another reason for studying mechanics further is that 
there are certain patterns of motion that are repeated in many different circum- 
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stances, so it is good to study these patterns in one particular circumstance. For 
example, we shall study collisions; different kinds of collisions have much in 
common. In the flow of fluids, it does not make much difference what the fluid is, 
the laws of the flow are similar. Other problems that we shall study are vibrations 
and oscillations and, in particular, the peculiar phenomena of mechanical waves — 
sound, vibrations of rods, and so on. 

In our discussion of Newton’s laws it was explained that these laws are a kind 
of program that says “Pay attention to the forces,” and that Newton told us only 
two things about the nature of forces. In the case of gravitation, he gave us the 
complete law of the force. In the case of the very complicated forces between 
atoms, he was not aware of the right laws for the forces; however, he discovered 
one rule, one general property of forces, which is expressed in his Third Law, and 
that is the total knowledge that Newton had about the nature of forces — the law 
of gravitation and this principle, but no other details. 

This principle is that action equals reaction. 

What is meant is something of this kind: Suppose we have two small bodies, 
say particles, and suppose that the first one exerts a force on the second one, 
pushing it with a certain force. Then, simultaneously, according to Newton’s 
Third Law, the second particle will push on the first with an equal force, in the 
opposite direction; furthermore, these forces effectively act in the same line. 
This is the hypothesis, or law, that Newton proposed, and it seems to be quite 
accurate, though not exact (we shall discuss the errors later). For the moment 
we shall take it to be true that action equals reaction. Of course, if there is a third 
particle, not on the same line as the other two, the law does not mean that the total 
force on the first one is equal to the total force on the second, since the third particle, 
for instance, exerts its own push on each of the other two. The result is that the 
total effect on the first two is in some other direction, and the forces on the first 
two particles are, in general, neither equal nor opposite. However, the forces on 
each particle can be resolved into parts, there being one contribution or part due 
to each other interacting particle. Then each pair of particles has corresponding 
components of mutual interaction that are equal in magnitude and opposite in 
direction. 


10-2 Conservation of momentum 

Now what are the interesting consequences of the above relationship? Sup- 
pose, for simplicity, that we have just two interacting particles, possibly of different 
mass, and numbered 1 and 2. The forces between them are equal and opposite; 
what are the consequences? According to Newton’s Second Law, force is the time 
rate of change of the momentum, so we conclude that the rate of change of momen- 
tum p i of particle 1 is equal to minus the rate of change of momentum p 2 of particle 
2, or 

dpi/dt = ~dp 2 /dt. (10.1) 

Now if the rate of change is always equal and opposite, it follows that the total 
change in the momentum of particle 1 is equal and opposite to the total change in 
the momentum of particle 2; this means that if we add the momentum of particle 
1 to the momentum of particle 2, the rate of change of the sum of these, due to 
the mutual forces (called internal forces) between particles, is zero; that is 

d(pi + p 2 )/ dt = 0. (10.2) 

There is assumed to be no other force in the problem. If the rate of change of this 
sum is always zero, that is just another way of saying that the quantity (pi + p 2 ) 
does not change. (This quantity is also written m^i + m 2 v 2 , and is called the 
total momentum of the two particles.) We have now obtained the result that the 
total momentum of the two particles does not change because of any mutual 
interactions between them. This statement expresses the law of conservation of 
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momentum in that particular example. We conclude that if there is any kind of 
force, no matter how complicated, between two particles, and we measure or 
calculate m\Vi + m 2 v 2 , that is, the sum of the two momenta, both before and 
after the forces act, the results should be equal, i.e., the total momentum is a 
constant. 

If we extend the argument to three or more interacting particles in more com- 
plicated circumstances, it is evident that so far as internal forces are concerned, the 
total momentum of all the particles stays constant, since an increase in momentum 
of one, due to another, is exactly compensated by the decrease of the second, 
due to the first. That is, all the internal forces will balance out, and therefore 
cannot change the total momentum of the particles. Then if there are no forces 
from the outside (external forces), there are no forces that can change the total 
momentum ; hence the total momentum is a constant. 

It is worth describing what happens if there are forces that do not come from 
the mutual actions of the particles in question: suppose we isolate the interacting 
particles. If there are only mutual forces, then, as before, the total momentum of 
the particles does not change, no matter how complicated the forces. On the other 
hand, suppose there are also forces coming from the particles outside the isolated 
group. Any force exerted by outside bodies on inside bodies, we call an external 
force. We shall later demonstrate that the sum of all external forces equals the rate 
of change of the total momentum of all the particles inside, a very useful theorem. 

The conservation of the total momentum of a number of interacting particles 
can be expressed as 

miVi + m 2 v 2 + m^v 3 + • • ■ = a constant, (10.3) 

if there are no net external forces. Here the masses and corresponding velocities 
of the particles are numbered 1, 2, 3, 4, . . . The general statement of Newton’s 
Second Law for each particle, 

/ = j t {mo), (10.4) 

is true specifically for the components of force and momentum in any given direc- 
tion ; thus the ^-component of the force on a particle is equal to the x-component 
of the rate of change of momentum of that particle, or 

f* = j t (mv x ), (10.5) 

and similarly for the y- and z-directions. Therefore Eq. (10.3) is really three 
equations, one for each direction. 

In addition to the law of conservation of momentum, there is another inter- 
esting consequence of Newton’s Second Law, to be proved later, but merely stated 
now. This principle is that the laws of physics will look the same whether we are 
standing still or moving with a uniform speed in a straight line. For example, a 
child bouncing a ball in an airplane finds that the ball bounces the same as though 
he were bouncing it on the ground. Even though the airplane is moving with a 
very high velocity, unless it changes its velocity, the laws look the same to the 
child as they do when the airplane is standing still. This is the so-called relativity 
principle . As we use it here we shall call it “Galilean relativity” to distinguish it 
from the more careful analysis made by Einstein, which we shall study later. 

We have just derived the law of conservation of momentum from Newton’s 
laws, and we could go on from here to find the special laws that describe impacts 
and collisions. But for the sake of variety, and also as an illustration of a kind of 
reasoning that can be used in physics in other circumstances where, for example, 
one might not know Newton’s laws and might take a different approach, we shall 
discuss the laws of impacts and collisions from a completely different point of 
view. We shall base our discussion on the principle of Galilean relativity, stated 
above, and shall end up with the law of conservation of momentum. 

We shall start by assuming that nature would look the same if we run along 
at a certain speed and watch it as it would if we were standing still. Before dis- 
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cussing collisions in which two bodies collide and stick together, or come together 
and bounce apart, we shall first consider two bodies that are held together by a 
spring or something else, and are then suddenly released and pushed by the spring 
or perhaps by a little explosion. Further, we shall consider motion in only one 
direction. First, let us suppose that the two objects are exactly the same, are nice 
symmetrical objects, and then we have a little explosion between them. After the 
explosion, one of the bodies will be moving, let us say toward the right, with a 
velocity v. Then it appears reasonable that the other body is moving toward the 
left with a velocity v, because if the objects are alike there is no reason for right or 
left to be preferred and so the bodies would do something that is symmetrical. This 
is an illustration of a kind of thinking that is very useful in many problems but 
would not be brought out if we just started with the formulas. 

The first result from our experiment is that equal objects will have equal 
speed, but now suppose that we have two objects made of different materials, 
say copper and aluminum, and we make the two masses equal. We shall now 
suppose that if we do the experiment with two masses that are equal, even though 
the objects are not identical, the velocities will be equal. Someone might object: 
“But you know, you could do it backwards, you did not have to suppose that. 
You could define equal masses to mean two masses that acquire equal velocities 
in this experiment.” We follow that suggestion and make a little explosion between 
the copper and a very large piece of aluminum, so heavy that the copper flies out 
and the aluminum hardly budges. That is too much aluminum, so we reduce the 
amount until there is just a very tiny piece, then when we make the explosion the 
aluminum goes flying away, and the copper hardly budges. That is not enough alu- 
minum. Evidently there is some right amount in between; so we keep adjusting 
the amount until the velocities come out equal. Very well then — let us turn it 
around, and say that when the velocities are equal, the masses are equal. This 
appears to be just a definition, and it seems remarkable that we can transform 
physical laws into mere definitions. Nevertheless, there are some physical laws 
involved, and if we accept this definition of equal masses, we immediately find one 
of the laws, as follows. 

Suppose we know from the foregoing experiment that two pieces of matter, 
A and B (of copper and aluminum), have equal masses, and we compare a third 
body, say a piece of gold, with the copper in the same manner as above, making 
sure that its mass is equal to the mass of the copper. If we now make the experiment 
between the aluminum and the gold, there is nothing in logic that says these masses 
must be equal; however, the experiment shows that they actually are. So now, by 
experiment, we have found a new law. A statement of this law might be: If two 
masses are each equal to a third mass (as determined by equal velocities in this 
experiment), then they are equal to each other. (This statement does not follow 
at all from a similar statement used as a postulate regarding mathematical quanti- 
ties.) From this example we can see how quickly we start to infer things if we are 
careless. It is not just a definition to say the masses are equal when the velocities 
are equal, because to say the masses are equal is to imply the mathematical laws 
of equality, which in turn makes a prediction about an experiment. 

As a second example, suppose that A and B are found to be equal by doing 
the experiment with one strength of explosion, which gives a certain velocity; if 
we then use a stronger explosion, will it be true or not true that the velocities now 
obtained are equal? Again, in logic there is nothing that can decide this question, 
but experiment shows that it is true. So, here is another law, which might be 
stated: If two bodies have equal masses, as measured by equal velocities at one 
velocity, they will have equal masses when measured at another velocity. From 
these examples we see that what appeared to be only a definition really involved 
some laws of physics. 

In the development that follows we shall assume it is true that equal masses 
have equal and opposite velocities when an explosion occurs between them. We 
shall make another assumption in the inverse case: If two identical objects, moving 
in opposite directions with equal velocities, collide and stick together by some kind 
of glue, then which way will they be moving after the collision? This is again a 
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symmetrical situation, with no preference between right and left, so we assume 
that they stand still. We shall also suppose that any two objects of equal mass, 
even if the objects are made of different materials, which collide and stick together, 
when moving with the same velocity in opposite directions will come to rest after 
the collision. 

10-3 Momentum is conserved! 

We can verify the above assumptions experimentally: first, that if two station- 
ary objects of equal mass are separated by an explosion they will move apart with 
the same speed, and second, if two objects of equal mass, coming together with the 
same speed, collide and stick together they will stop. This we can do by means of 
a marvelous invention called an air trough,* which gets rid of friction, the thing 
which continually bothered Galileo (Fig. 10-1). He could not do experiments by 
sliding things because they do not slide freely, but, by adding a magic touch, we 
can today get rid of friction. Our objects will slide without difficulty, on and on at 
a constant velocity, as advertised by Galileo. This is done by supporting the objects 
on air. Because air has very low friction, an object glides along with practically 
constant velocity when there is no applied force. First, we use two glide blocks 
which have been made carefully to have the same weight, or mass (their weight 
was measured really, but we know that this weight is proportional to the mass), 
and we place a small explosive cap in a closed cylinder between the two blocks 
(Fig. 10-2). We shall start the blocks from rest at the center point of the track and 
force them apart by exploding the cap with an electric spark. What should happen? 
If the speeds are equal when they fly apart, they should arrive at the ends of the 
trough at the same time. On reaching the ends they will both bounce back with 
practically opposite velocity, and will come together and stop at the center where 
they started. It is a good test; when it is actually done the result is just as we 
have described (Fig. 10-3). 

Now the next thing we would like to figure out is what happens in a less simple 
situation. Suppose we have two equal masses, one moving with velocity v and the 
other standing still, and they collide and stick; what is going to happen? There 
is a mass 2m altogether when we are finished, drifting with an unknown velocity. 
What velocity? That is the problem. To find the answer, we make the assumption 
that if we ride along in a car, physics will look the same as if we are standing still. 
We start with the knowledge that two equal masses, moving in opposite directions 
with equal speeds v, will stop dead when they collide. Now suppose that while 
this happens, we are riding by in an automobile, at a velocity — v. Then what does 
it look like? Since we are riding along with one of the two masses which are coming 
together, that one appears to us to have zero velocity. The other mass, however, 
going the other way with velocity v, will appear to be coming toward us at a velocity 
2v (Fig. 10-4). Finally, the combined masses after collision will seem to be passing 
by with velocity v. We therefore conclude that an object with velocity 2v, hitting 
an equal one at rest, will end up with velocity v, or what is mathematically exactly 
the same, an object with velocity v hitting and sticking to one at rest will produce 
an object moving with velocity v/2. Note that if we multiply the mass and the 
velocity beforehand and add them together, mv + 0, we get the same answer as 
when we multiply the mass and the velocity of everything afterwards, 2m times 
v/2. So that tells us what happens when a mass of velocity v hits one standing still. 

In exactly the same manner we can deduce what happens when equal objects 
having any two velocities hit each other. 

Suppose we have two equal bodies with velocities v\ and v 2 , respectively, 
which collide and stick together. What is their velocity v after the collision? 
Again we ride by in an automobile, say at velocity v 2 , so that one body appears to 
be at rest. The other then appears to have a velocity v\ — v 2 , and we have the 
same case that we had before. When it is all finished they will be moving at 
i( y i — v 2 ) with respect to the car. What then is the actual speed on the ground? 
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Fig. 1 0-3. Schematic view of action- 
reaction experiment with equal masses. 


VIEW FROM VIEW FROM 

CENTER OF MASS MOVING CAR 

(CAR VELOCITY * — V) 

V -V 2 V -► 0 

I m I f~m I BEFORE COLLISION | m | | m j 


V=0 V-» 

I m | m | AFTER COLLISION |~m fm 1 
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* H. V. Neher and R. B. Leighton, Amer . Jour, of Phys . 31, 255 (1963). 
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Fig. 1 0-6. An experiment to verify 
that a mass m with velocity v striking a 
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velocity v/2. 
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It is v = %(v x — v 2 ) + v 2 or \{v x + v 2 ) (Fig. 10-5). Again we note that 

mv i + mv 2 = 2m(v x + v 2 )/2. (10.6) 

Thus, using this principle, we can analyze any kind of collision in which two 
bodies of equal mass hit each other and stick. In fact, although we have worked 
only in one dimension, we can find out a great deal about much more complicated 
collisions by imagining that we are riding by in a car in some oblique direction. 
The principle is the same, but the details get somewhat complicated. 

In order to test experimentally whether an object moving with velocity v, 
colliding with an equal one at rest, forms an object moving with velocity v/2 , we 
may perform the following experiment with our air-trough apparatus. We place 
in the trough three equally massive objects, two of which are initially joined to- 
gether with our explosive cylinder device, the third being very near to but slightly 
separated from these and provided with a sticky bumper so that it will stick to 
another object which hits it. Now, a moment after the explosion, we have two 
objects of mass m moving with equal and opposite velocities v. A moment after 
that, one of these collides with the third object and makes an object of mass 2m 
moving, so we believe, with velocity v/2. How do we test whether it is really v/2? 
By arranging the initial positions of the masses on the trough so that the distances 
to the ends are not equal, but are in the ratio 2:1. Thus our first mass, which 
continues to move with velocity v } should cover twice as much distance in a given 
time as the two which are stuck together (allowing for the small distance travelled 
by the second object before it collided with the third). The mass m and the mass 
2m should reach the ends at the same time, and when we try it, we find that they 
do (Fig. 10-6). 

The next problem that we want to work out is what happens if we have two 
different masses. Let us take a mass m and a mass 2m and apply our explosive 
interaction. What will happen then? If, as a result of the explosion, m moves with 
velocity v, with what velocity does 2m move? The experiment we have just done 
may be repeated with zero separation between the second and third masses, and 
when we try it we get the same result, namely, the reacting masses m and 2m 
attain velocities — v and v/2 . Thus the direct reaction between m and 2m gives 
the same result as the symmetrical reaction between m and m, followed by a collision 
between m and a third mass m in which they stick together. Furthermore, we find 
that the masses m and 2m returning from the ends of the trough, with their veloci- 
ties (nearly) exactly reversed, stop dead if they stick together. 

Now the next question we may ask is this. What will happen if a mass m with 
velocity v, say, hits and sticks to another mass 2m at rest? This is very easy to 
answer using our principle of Galilean relativity, for we simply watch the collision 
which we have just described from a car moving with velocity —v/2 (Fig. 10-7). 
From the car, the velocities are 

v\ = v — z;(car) = v + v/2 = 3v/2 

and 

v f 2 = —v/2 — r(car) = —v/2 + v/2 - 0. 

After the collision, the mass 3m appears to us to be moving with velocity v/2. 
Thus we have the answer, i.e., the ratio of velocities before and after collision is 
3 to 1 : if an object of mass m collides with a stationary object of mass 2m, then the 
whole thing moves off, stuck together, with a velocity 1/3 as much. The general 
rule again is that the sum of the products of the masses and the velocities stays the 
same: mv + 0 equals 3m times v/3, so we are gradually building up the theorem 
of the conservation of momentum, piece by piece. 

Now we have one against two. Using the same arguments, we can predict the 
result of one against three, two against three, etc. The case of two against three, 
starting from rest, is shown in Fig. 10-8. 

In every case we find that the mass of the first object times its velocity, plus 
the mass of the second object times its velocity, is equal to the total mass of the 
final object times its velocity. These are all examples, then, of the conservation of 
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momentum. Starting from simple, symmetrical cases, we have demonstrated the 
law for more complex cases. We could, in fact, do it for any rational mass ratio, 
and since every ratio is exceedingly close to a rational ratio, we can handle every 
ratio as precisely as we wish. 

10-4 Momentum and energy 

All the foregoing examples are simple cases where the bodies collide and stick 
together, or were initially stuck together and later separated by an explosion. 
However, there are situations in which the bodies do not cohere, as, for example, 
two bodies of equal mass which collide with equal speeds and then rebound. 
For a brief moment they are in contact and both are compressed. At the instant 
of maximum compression they both have zero velocity and energy is stored in the 
elastic bodies, as in a compressed spring. This energy is derived from the kinetic 
energy the bodies had before the collision, which becomes zero at the instant their 
velocity is zero. The loss of kinetic energy is only momentary, however. The 
compressed condition is analogous to the cap that releases energy in an explosion. 
The bodies are immediately decompressed in a kind of explosion, and fly apart 
again; but we already know that case — the bodies fly apart with equal speeds. 
However, this speed of rebound is less, in general, than the initial speed, because 
not all the energy is available for the explosion, depending on the material. If the 
material is putty no kinetic energy is recovered, but if it is something more rigid, 
some kinetic energy is usually regained. In the collision the rest of the kinetic 
energy is transformed into heat and vibrational energy — the bodies are hot and 
vibrating. The vibrational energy also is soon transformed into heat. It is possible 
to make the colliding bodies from highly elastic materials, such as steel, with 
carefully designed spring bumpers, so that the collision generates very little heat 
and vibration. In these circumstances the velocities of rebound are practically 
equal to the initial velocities; such a collision is called elastic. 

That the velocities before and after an elastic collision are equal is not a matter 
of conservation of momentum, but a matter of conservation of kinetic energy . 
That the speeds of the bodies rebounding after a symmetrical collision are equal 
to each other , however, is a matter of conservation of momentum. 

We might similarly analyze collisions between bodies of different masses, 
different initial velocities, and various degrees of elasticity, and determine the final 
velocities and the loss of kinetic energy, but we shall not go into the details of 
these processes. 

Elastic collisions are especially interesting for systems that have no internal 
“gears, wheels, or parts.” Then when there is a collision there is nowhere for the 
energy to be impounded, because the objects that move apart are in the same 
condition as when they collided. Therefore, between very elementary objects, the 
collisions are always elastic or very nearly elastic. For instance, the collisions 
between atoms or molecules in a gas are said to be perfectly elastic. Although this 
is an excellent approximation, even such collisions are not perfectly elastic; other- 
wise one could not understand how energy in the form of light or heat radiation 
could come out of a gas. Once in a while, in a gas collision, a low-energy infrared 
ray is emitted, but this occurrence is very rare and the energy emitted is very small. 
So, for most purposes, collisions of molecules in gases are considered to be per- 
fectly elastic. 

As an interesting example, let us consider an elastic collision between two 
objects of equal mass. If they come together with the same speed, they would 
come apart at that same speed, by symmetry. But now look at this in another 
circumstance, in which one of them is moving with velocity v and the other one is 
at rest. What happens? We have been through this before. We watch the sym- 
metrical collision from a car moving along with one of the objects, and we find 
that if a stationary body is struck elastically by another body of exactly the same 
mass, the moving body stops, and the one that was standing still now moves away 
with the same speed that the other one had; the bodies simply exchange velocities. 
This behavior can easily be demonstrated with a suitable impact apparatus. More 
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generally, if both bodies are moving, with different velocities, they simply exchange 
velocity at impact. 

Another example of an almost elastic interaction is magnetism. If we arrange 
a pair of U-shaped magnets in our glide blocks, so that they repel each other, 
when one drifts quietly up to the other, it pushes it away and stands perfectly still, 
and now the other goes along, frictionlessly. 

The principle of conservation of momentum is very useful, because it enables 
us to solve many problems without knowing the details. We did not know the 
details of the gas motions in the cap explosion, yet we could predict the velocities 
with which the bodies came apart, for example. Another interesting example is 
rocket propulsion. A rocket of large mass, Af, ejects a small piece, of mass m , with 
a terrific velocity V relative to the rocket. After this the rocket, if it were originally 
standing still, will be moving with a small velocity, v. Using the principle of con- 
servation of momentum, we can calculate this velocity to be 


So long as material is being ejected, the rocket continues to pick up speed. 
Rocket propulsion is essentially the same as the recoil of a gun: there is no need 
for any air to push against. 


10-5 Relativistic momentum 


In modern times the law of conservation of momentum has undergone certain 
modifications. However, the law is still true today, the modifications being mainly 
in the definitions of things. In the theory of relativity it turns out that we do have 
conservation of momentum; the particles have mass and the momentum is still 
given by mv , the mass times the velocity, but the mass changes with the velocity , 
hence the momentum also changes. The mass varies with velocity according to 
the law 


m 


m 0 5 

Vl - v 2 /c 2 ’ 


(10.7) 


where m 0 is the mass of the body at rest and c is the speed of light. It is easy to 
see from the formula that there is negligible difference between m and m 0 unless 
v is very large, and that for ordinary velocities the expression for momentum 
reduces to the old formula. 

The components of momentum for a single particle are written as 


Px = 


m 0 v x 


\/l — v 2 /c 2 


Pv 


m 0 v y 


\/\ — v 2 /c 2 


Pz = 


m 0 v z 


\/l — v 2 /c 2 


( 10 . 8 ) 


where v 2 = v 2 x + + v 2 . If the ^-components are summed over all the inter- 

acting particles, both before and after a collision, the sums are equal; that is, 
momentum is conserved in the jc-direction. The same holds true in any direction. 

In Chapter 4 we saw that the law of conservation of energy is not valid unless 
we recognize that energy appears in different forms, electrical energy, mechanical 
energy, radiant energy, heat energy, and so on. In some of these cases, heat energy 
for example, the energy might be said to be “hidden.” This example might suggest 
the question, “Are there also hidden forms of momentum — perhaps heat momen- 
tum?” The answer is that it is very hard to hide momentum for the following 
reasons. 

The random motions of the atoms of a body furnish a measure of heat energy, 
if the squares of the velocities are summed. This sum will be a positive result, 
having no directional character. The heat is there, whether or not the body moves 
as a whole, and conservation of energy in the form of heat is not very obvious. 
On the other hand, if one sums the velocities , which have direction, and finds a 
result that is not zero, that means that there is a drift of the entire body in some 
particular direction, and such a gross momentum is readily observed. Thus there 
is no random internal lost momentum, because the body has net momentum only 
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when it moves as a whole. Therefore momentum, as a mechanical quantity, is 
difficult to hide. Nevertheless, momentum can be hidden — in the electromagnetic 
field, for example. This case is another effect of relativity. 

One of the propositions of Newton was that interactions at a distance are 
instantaneous. It turns out that such is not the case; in situations involving 
electrical forces, for instance, if an electrical charge at one location is suddenly 
moved, the effects on another charge, at another place, do not appear instantane- 
ously — there is a little delay. In those circumstances, even if the forces are equal 
the momentum will not check out; there will be a short time during which there 
will be trouble, because for a while the first charge will feel a certain reaction force, 
say, and will pick up some momentum, but the second charge has felt nothing and 
has not yet changed its momentum. It takes time for the influence to cross the 
intervening distance, which it does at 186,000 miles a second. In that tiny time 
the momentum of the particles is not conserved. Of course after the second charge 
has felt the effect of the first one and all is quieted down, the momentum equation 
will check out all right, but during that small interval momentum is not conserved. 
We represent this by saying that during this interval there is another kind of mo- 
mentum besides that of the particle, mv, and that is momentum in the electro- 
magnetic field. If we add the field momentum to the momentum of the particles, 
then momentum is conserved at any moment all the time. The fact that the electro- 
magnetic field can possess momentum and energy makes that field very real, and 
so, for better understanding, the original idea that there are just the forces between 
particles has to be modified to the idea that a particle makes a field, and a field 
acts on another particle, and the field itself has such familiar properties as energy 
content and momentum, just as particles can have. To take another example: an 
electromagnetic field has waves, which we call light; it turns out that light also 
carries momentum with it, so when light impinges on an object it carries in a 
certain amount of momentum per second ; this is equivalent to a force, because if 
the illuminated object is picking up a certain amount of momentum per second, 
its momentum is changing and the situation is exactly the same as if there were a 
force on it. Light can exert pressure by bombarding an object; this pressure is 
very small, but with sufficiently delicate apparatus it is measurable. 

Now in quantum mechanics it turns out that momentum is a different thing — 
it is no longer mv . It is hard to define exactly what is meant by the velocity of a 
particle, but momentum still exists. In quantum mechanics the difference is that 
when the particles are represented as particles, the momentum is still mv, but when 
the particles are represented as waves, the momentum is measured by the number 
of waves per centimeter: the greater this number of waves, the greater the momen- 
tum. In spite of the differences, the law of conservation of momentum holds also 
in quantum mechanics. Even though the law f — ma is false, and all the deriva- 
tions of Newton were wrong for the conservation of momentum, in quantum 
mechanics, nevertheless, in the end, that particular law maintains itself! 
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11 


Vectors 


11-1 Symmetry in physics 

In this chapter we introduce a subject that is technically known in physics as 
symmetry in physical law . The word “symmetry” is used here with a special 
meaning, and therefore needs to be defined. When is a thing symmetrical — how 
can we define it? When we have a picture that is symmetrical, one side is somehow 
the same as the other side. Professor Hermann Weyl has given this definition of 
symmetry: a thing is symmetrical if one can subject it to a certain operation and 
it appears exactly the same after the operation. For instance, if we look at a vase 
that is left-and-right symmetrical, then turn it 180° around the vertical axis, it 
looks the same. We shall adopt the definition of symmetry in Weyl’s more general 
form, and in that form we shall discuss symmetry of physical laws. 

Suppose we build a complex machine in a certain place, with a lot of compli- 
cated interactions, and balls bouncing around with forces between them, and so on. 
Now suppose we build exactly the same kind of equipment at some other place, 
matching part by part, with the same dimensions and the same orientation, every- 
thing the same only displaced laterally by some distance. Then, if we start the 
two machines in the same initial circumstances, in exact correspondence, we ask: 
will one machine behave exactly the same as the other? Will it follow all the mo- 
tions in exact parallelism? Of course the answer may well be no , because if we 
choose the wrong place for our machine it might be inside a wall and interferences 
from the wall would make the machine not work. 

All of our ideas in physics require a certain amount of common sense in their 
application; they are not purely mathematical or abstract ideas. We have to under- 
stand what we mean when we say that the phenomena are the same when we move 
the apparatus to a new position. We mean that we move everything that we 
believe is relevant; if the phenomenon is not the same, we suggest that something 
relevant has not been moved, and we proceed to look for it. If we never find it, 
then we claim that the laws of physics do not have this symmetry. On the other 
hand, we may find it — we expect to find it — if the laws of physics do have this 
symmetry; looking around, we may discover, for instance, that the wall is pushing 
on the apparatus. The basic question is, if we define things well enough, if all the 
essential forces are included inside the apparatus, if all the relevant parts are moved 
from one place to another, will the laws be the same? Will the machinery work 
the same way? 

It is clear that what we want to do is to move all the equipment and essential 
influences, but not everything in the world — planets, stars, and all — for if we do 
that, we have the same phenomenon again for the trivial reason that we are right 
back where we started. No, we cannot move everything. But it turns out in 
practice that with a certain amount of intelligence about what to move, the ma- 
chinery will work. In other words, if we do not go inside a wall, if we know the 
origin of the outside forces, and arrange that those are moved too, then the ma- 
chinery will work the same in one location as in another. 
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11-2 Translations 

We shall limit our analysis to just mechanics, for which we now have sufficient 
knowledge. In previous chapters we have seen that the laws of mechanics can be 
summarized by a set of three equations for each particle : 

m(d 2 x/dt 2 ) — F x , m(d 2 y/dt 2 ) = F y , m(d 2 z/dt 2 ) = F z . (11.1) 
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Translations 
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Fig. 11-1. 
systems. 


Two parallel coordinate 


Now this means that there exists a way to measure x, y, and z on three perpendicu- 
lar axes, and the forces along those directions, such that these laws are true. 
These must be measured from some origin, but where do we put the origin ? All 
that Newton would tell us at first is that there is some place that we can measure 
from, perhaps the center of the universe, such that these laws are correct. But we 
can show immediately that we can never find the center, because if we use some 
other origin it would make no difference. In other words, suppose that there are 
two people — Joe, who has an origin in one place, and Moe, who has a parallel 
system whose origin is somewhere else (Fig. 11-1). Now when Joe measures the 
location of the point in space, he finds it at x, y, and z (we shall usually leave z out 
because it is too confusing to draw in a picture). Moe, on the other hand, when 
measuring the same point, will obtain a different jc (in order to distinguish it, we 
will call it x'), and in principle a different y, although in our example they are 
numerically equal. So we have 

x r = x — a, y f = y, z r — z. (11.2) 

Now in order to complete our analysis we must know what Moe would obtain for 
the forces. The force is supposed to act along some line, and by the force in the 
jc-direction we mean the part of the total which is in the jc-direction, which is 
the magnitude of the force times this cosine of its angle with the x:-axis. Now we 
see that Moe would use exactly the same projection as Joe would use, so we have 
a set of equations 


iv - F x> F y > = Fy, Ty = F z . (11.3) 

These would be the relationships between quantities as seen by Joe and Moe. 

The question is, if Joe knows Newton’s laws, and if Moe tries to write down 
Newton’s laws, will they also be correct for him? Does it make any difference from 
which origin we measure the points? In other words, assuming that equations 
(1 1 . 1) are true, and the Eqs. (1 1 .2) and (11.3) give the relationship of the measure- 
ments, is it or is it not true that 


( a ) 

m{d 2 x f /dt 2 ) = Tv, 


(b) 

m{d 2 y f /dt 2 ) = iy, 

(11.4) 

(c) 

m(d 2 z f /dt 2 ) = iy? 



In order to test these equations we shall differentiate the formula for x f 
twice. First of all 


dx f _ d ( ^ \ dx da 

~dt ' Jr x a) ~ Tt Tt' 

Now we shall assume that Moe’s origin is fixed (not moving) relative to Joe’s; 
therefore a is a constant and da/dt = 0, so we find that 

dx f /dt = dx/df 

and therefore 

d 2 x f /dt 2 = d 2 x/dt 2 ; 

therefore we know that Eq. (11.4a) becomes 

m(d 2 x/dt 2 ) = Ty. 

(We also suppose that the masses measured by Joe and Moe are equal.) Thus the 
acceleration times the mass is the same as the other fellow’s. We have also found 
the formula for iy, for, substituting from Eq. (11.1), we find that 

F x * = F x . 

Therefore the laws as seen by Moe appear the same; he can write Newton’s 
laws too, with different coordinates, and they will still be right. That means that 
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there is no unique way to define the origin of the world, because the laws will 
appear the same, from whatever position they are observed. 

This is also true: if there is a piece of equipment in one place with a certain 
kind of machinery in it, the same equipment in another place will behave in the 
same way. Why? Because one machine, when analyzed by Moe, has exactly the 
same equations as the other one, analyzed by Joe. Since the equations are the same, 
the phenomena appear the same. So the proof that an apparatus in a new position 
behaves the same as it did in the old position is the same as the proof that the 
equations when displaced in space reproduce themselves. Therefore we say that 
the lam of physics are symmetrical for translational displacements , symmetrical 
in the sense that the laws do not change when we make a translation of our co- 
ordinates. Of course it is quite obvious intuitively that this is true, but it is inter- 
esting and entertaining to discuss the mathematics of it. 


11-3 Rotations 

The above is the first of a series of ever more complicated propositions con- 
cerning the symmetry of a physical law. The next proposition is that it should 
make no difference in which direction we choose the axes. In other words, if we 
build a piece of equipment in some place and watch it operate, and nearby we 
build the same kind of apparatus but put it up on an angle, will it operate in the 
same way? Obviously it will not if it is a Grandfather clock, for example! If a 
pendulum clock stands upright, it works fine, but if it is tilted the pendulum falls 
against the side of the case and nothing happens. The theorem is then false in 
the case of the pendulum clock, unless we include the earth, which is pulling on 
the pendulum. Therefore we can make a prediction about pendulum clocks if 
we believe in the symmetry of physical law for rotation: something else is involved 
in the operation of a pendulum clock besides the machinery of the clock, something 
outside it that we should look for. We may also predict that pendulum clocks will 
not work the same way when located in different places relative to this mysterious 
source of asymmetry, perhaps the earth. Indeed, we know that a pendulum clock 
up in an artificial satellite, for example, would not tick either, because there is no 
effective force, and on Mars it would go at a different rate. Pendulum clocks do 
involve something more than just the machinery inside, they involve something 
on the outside. Once we recognize this factor, we see that we must turn the earth 
along with the apparatus. Of course we do not have to worry about that, it is easy 
to do; one simply waits a moment or two and the earth turns; then the pendulum 
clock ticks again in the new position the same as it did before. While we are 
rotating in space our angles are always changing, absolutely; this change does not 
seem to bother us very much, for in the new position we seem to be in the same 
condition as in the old. This has a certain tendency to confuse one, because it is 
true that in the new turned position the laws are the same as in the unturned 
position, but it is not true that as we turn a thing it follows the same laws as it does 
when we are not turning it. If we perform sufficiently delicate experiments, we 
can tell that the earth is rotating , but not that it had rotated. In other words, we 
cannot locate its angular position, but we can tell that it is changing. 

Now we may discuss the effects of angular orientation upon physical laws. 
Let us find out whether the same game with Joe and Moe works again. This time, 
to avoid needless complication, we shall suppose that Joe and Moe use the same 
origin (we have already shown that the axes can be moved by translation to another 
place). Assume that Moe’s axes have rotated relative to Joe’s by an angle 6. 
The two coordinate systems are shown in Fig. 11-2, which is restricted to two 
dimensions. Consider any point P having coordinates (x, y) in Joe’s system and 
( x', y') in Moe’s system. We shall begin, as in the previous case, by expressing 
the coordinates x' and y ' in terms of x , y, and 0. To do so, we first drop perpendic- 
ulars from P to all four axes and draw AB perpendicular to PQ. Inspection of the 
figure shows that x f can be written as the sum of two lengths along the x'-axis, 
and y* as the difference of two lengths along AB. All these lengths are expressed 
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Fig. 1 1 -2. Two coordinate systems 
having different angular orientations. 




in terms of x, y, and 0 in equations (11.5), to which we have added an equation 
for the third dimension. 

x f = x cos d + y sin 0 , 

y = y cos 0 — xsinO, (11-5) 

z' = z. 

The next step is to analyze the relationship of forces as seen by the two observers, 

following the same general method as before. Let us assume that a force F, which 

has already been analyzed as having components F x and F y (as seen by Joe), is 
acting on a particle of mass m , located at point P in Fig. 1 1-2. For simplicity, let 
us move both sets of axes so that the origin is at F, as shown in Fig. 11-3. Moe 
sees the components of F along his axes as F x > and F y > . F x has components along 
both the x f - and y'-axes, and F y likewise has components along both these axes. 
To express F x > in terms of F x and F y , we sum these components along the ;c'-axis, 
and in a like manner we can express F y > in terms of F x and F y . The results are 

F x > — F x cos 0 + F y sin 0 , 

F y * = Fy cos 6 — F x sin 0, (11 .6) 

F z * = F z . 


Fig. 1 1-3. Components of a force in It is interesting to note an accident of sorts, which is of extreme importance: the 
the two systems. formulas (1 1.5) and (1 1.6), for coordinates of P and components of F, respectively, 

are of identical form. 

As before, Newton’s laws are assumed to be true in Joe’s system, and are 
expressed by equations (11.1). The question, again, is whether Moe can apply 
Newton’s laws — will the results be correct for his system of rotated axes? In other 
words, if we assume that Eqs. (1 1.5) and (1 1.6) give the relationship of the measure- 
ments, is it true or not true that 

m(d 2 x f /dt 2 ) = Fg', 

m{d 2 y f /dt 2 ) = Fy>, (11.7) 

m(d 2 z f /dt 2 ) = F z > ? 

To test these equations, we calculate the left and right sides independently, and 
compare the results. To calculate the left sides, we multiply equations (1 1.5) by m, 
and differentiate twice with respect to time, assuming the angle 0 to be constant. 
This gives 


m(d 2 x f /dt 2 ) = m(d 2 x/dt 2 ) cos 0 + m(d 2 y/dt 2 ) sin 0, 
m(d 2 y f /dt 2 ) = m(d 2 y/dt 2 ) cos 0 — m(d 2 x/dt 2 ) sin 0, (1 1.8) 

m{d 2 z f /dt 2 ) = m{d 2 z/dt 2 ). 

We calculate the right sides of equations (11.7) by substituting equations (11.1) 
into equations (11.6). This gives 

F x > = m{d 2 x/dt 2 ) cos d + m{d 2 y/dt 2 ) sin 0, 

F y > — m{d 2 y/dt 2 ) cos 6 — m(d 2 x/dt 2 ) sin 0, (11.9) 

F z ’ = m(d 2 z/dt 2 ). 

Behold! The right sides of Eqs. (1 1 .8) and (1 1 .9) are identical, so we conclude 
that if Newton’s laws are correct on one set of axes, they are also valid on any 
other set of axes. This result, which has now been established for both translation 
and rotation of axes, has certain consequences: first, no one can claim his particular 
axes are unique, but of course they can be more convenient for certain particular 
problems. For example, it is handy to have gravity along one axis, but this is not 
physically necessary. Second, it means that any piece of equipment which is 
completely self-contained, with all the force-generating equipment completely in- 
side the apparatus, would work the same when turned at an angle. 
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11-4 Vectors 


Not only Newton’s laws, but also the other laws of physics, so far as we know 
today, have the two properties which we call invariance (or symmetry) under 
translation of axes and rotation of axes. These properties are so important that a 
mathematical technique has been developed to take advantage of them in writing 
and using physical laws. 

The foregoing analysis involved considerable tedious mathematical work. 
To reduce the details to a minimum in the analysis of such questions, a very power- 
ful mathematical machinery has been devised. This system, called vector analysis , 
supplies the title of this chapter; strictly speaking, however, this is a chapter on 
the symmetry of physical laws. By the methods of the preceding analysis we were 
able to do everything required for obtaining the results that we sought, but in 
practice we should like to do things more easily and rapidly, so we employ the 
vector technique. 

We began by noting some characteristics of two kinds of quantities that are 
important in physics. (Actually there are more than two, but let us start out with 
two.) One of them, like the number of potatoes in a sack, we call an ordinary 
quantity, or an undirected quantity, or a scalar . Temperature is an example of 
such a quantity. Other quantities that are important in physics do have direction, 
for instance velocity: we have to keep track of which way a body is going, not just 
its speed. Momentum and force also have direction, as does displacement: when 
someone steps from one place to another in space, we can keep track of how far 
he went, but if we wish also to know where he went, we have to specify a direction. 

All quantities that have a direction, like a step in space, are called vectors. 

A vector is three numbers. In order to represent a step in space, say from the 
origin to some particular point P whose location is (x, y, z), we really need three 
numbers, but we are going to invent a single mathematical symbol, r, which is 
unlike any other mathematical symbols we have so far used.* It is not a single 
number, it represents three numbers: x, y, and z. It means three numbers, but 
not really only those three numbers, because if we were to use a different coordinate 
system, the three numbers would be changed to x\ y\ and z'. However, we want 
to keep our mathematics simple and so we are going to use the same mark to repre- 
sent the three numbers (x, y, z ) and the three numbers (x! , y\ z'). That is, we use 
the same mark to represent the first set of three numbers for one coordinate system, 
but the second set of three numbers if we are using the other coordinate system. 
This has the advantage that when we change the coordinate system, we do not 
have to change the letters of our equations. If we write an equation in terms of 
x, y, z, and then use another system, we have to change to x f 9 y\ z', but we shall 
just write r, with the convention that it represents (x, y , z) if we use one set of axes, 
or (x\ y , z f ) if we use another set of axes, and so on. The three numbers which 
describe the quantity in a given coordinate system are called the components of the 
vector in the direction of the coordinate axes of that system. That is, we use the 
same symbol for the three letters that correspond to the same object , as seen from 
different axes. The very fact that we can say “the same object” implies a physical 
intuition about the reality of a step in space, that is independent of the components 
in terms of which we measure it. So the symbol r will represent the same thing 
no matter how we turn the axes. 

Now suppose there is another directed physical quantity, any other quantity, 
which also has three numbers associated with it, like force, and these three 
numbers change to three other numbers by a certain mathematical rule, if we 
change the axes. It must be the same rule that changes (*, y, z) into (*', y\ z'). In 
other words, any physical quantity associated with three numbers which transform 
as do the components of a step in space is a vector. An equation like 

F - r 

would thus be true in any coordinate system if it were true in one. This equation, 

* In type, vectors are represented by boldface; in handwritten form an arrow is used: r. 

11-5 



of course, stands for the three equations 

F x = Fy = y, F z = z, 

or, alternatively, for 

F X ' = Fy' = /, F Z ' = z'. 

The fact that a physical relationship can be expressed as a vector equation assures 
us the relationship is unchanged by a mere rotation of the coordinate system. 
That is the reason why vectors are so useful in physics. 

Now let us examine some of the properties of vectors. As examples of vectors 
we may mention velocity, momentum, force, and acceleration. For many purposes 
it is convenient to represent a vector quantity by an arrow that indicates the direc- 
tion in which it is acting. Why can we represent force, say, by an arrow? Because 
it has the same mathematical transformation properties as a “step in space.” We 
thus represent it in a diagram as if it were a step, using a scale such that one unit 
of force, or one newton, corresponds to a certain convenient length. Once we 
have done this, all forces can be represented as lengths, because an equation like 

F = At, 

where k is some constant, is a perfectly legitimate equation. Thus we can always 
represent forces by lines, which is very convenient, because once we have drawn 
the line we no longer need the axes. Of course, we can quickly calculate the three 
components as they change upon turning the axes, because that is just a geometric 
problem. 

11-5 Vector algebra 

Now we must describe the laws, or rules, for combining vectors in various 
ways. The first such combination is the addition of two vectors: suppose that 
a is a vector which in some particular coordinate system has the three components 
(( a x , a y , a z ), and that b is another vector which has the three components (b x , b v , b z ). 
Now let us invent three new numbers (a x + b x , a v + b y , a z + b z ). Do these form 
a vector? “Well,” we might say, “they are three numbers, and every three numbers 
form a vector.” No, not every three numbers form a vector! In order for it to be a 
vector, not only must there be three numbers, but these must be associated with a 
coordinate system in such a way that if we turn the coordinate system, the three 
numbers “revolve” on each other, get “mixed up” in each other, by the precise 
laws we have already described. So the question is, if we now rotate the coordinate 
system so that ( a x , a y , a z ) become (a X ', a y ', a z > ) and ( b x , b y , b z ) become ( [b x > , b y », b z >) 9 
what do (a x + b x , a y + b y , a z + b z ) become? Do they become {a X ' + b X ', 
a y ' + b y ', a Z ' + b Z ') or not? The answer is, of course, yes, because the prototype 
transformations of Eq. (11.5) constitute what we call a linear transformation. 
If we apply those transformations to a x and b x to get a x > + b x >, we find that 
the transformed a x + b x is indeed the same as a x ' + b x '. When a and b are 
“added together” in this sense, they willjorm a vector which we may call c. We 
would write this as 

c = a + b. 

Now c has the interesting property 

c = b + a, 

as we can immediately see from its components. Thus also, 
a + (b + c) = (a + b) + c. 

We can add vectors in any order. 

What is the geometric significance of a + b? Suppose that a and b were 
represented by lines on a piece of paper, what would c look like? This is shown in 
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Fig. 11-4. We see that we can add the components of b to those of a most con- 
veniently if we place the rectangle representing the components of b next to that 
representing the components of a in the manner indicated. Since b just “fits” 
into its rectangle, as does a into its rectangle, this is the same as putting the “tail” 
of b on the “head” of a, the arrow from the “tail” of a to the “head” of b being 
the vector c. Of course, if we added a to b the other way around, we would put the 
“tail” of a on the “head” of b, and by the geometrical properties of parallelograms 
we would get the same result for c. Note that vectors can be added in this way 
without reference to any coordinate axes. 

Suppose we multiply a vector by a number a, what does this mean? We 
define it to mean a new vector whose components are aa x> aa y , and aa z . We leave 
it as a problem for the student to prove that it is a vector. 

Now let us consider vector subtraction. We may define subtraction in the 
same way as addition, but instead of adding, we subtract the components. Or 
we might define subtraction by defining a negative vector, — b = —lb, and then 
we would add the components. It comes to the same thing. The result is shown 
in Fig. 11-5. This figure shows d = a — b=a-H ( — b); we also note that the 
difference a — b can be found very easily from a and b by using the equivalent 
relation a — b + d. Thus the difference is even easier to find than the sum: we 
just draw the vector from b to a, to get a — b! 

Next we discuss velocity. Why is velocity a vector? If position is given by the 
three coordinates (x, y, z), what is the velocity? The velocity is given by dx/dt , 
dy/dt, and dz/dt Is that a vector, or not? We can find out by differentiating the 
expressions in Eq. (11.5) to find out whether dx f /dt transforms in the right way. 
We see that the components dx/dt and dy/dt do transform according to the same 
law as ;t and y, and therefore the time derivative is a vector. So the velocity is a 
vector. We can write the velocity in an interesting way as 

v = dx/dt. 

What the velocity is, and why it is a vector, can also be understood more pictorially : 
How far does a particle move in a short time A/? Answer; Ar, so if a particle is 
“here” at one instant and “there” at another instant, then the vector difference 
of the positions Ar = r 2 — ri, which is in the direction of motion shown in Fig. 
11-6, divided by the time interval At = t 2 ~ 1 1 , is the “average velocity” vector. 

In other words, by vector velocity we mean the limit, as At goes to 0, of the 
difference between the radius vectors at the time t + At and the time t, divided by 
Ar. 

v = lim (Ar/A t) = dx/dt. (11.10) 

Af— >-0 

Thus velocity is a vector because it is the difference of two vectors. It is also the 
right definition of velocity because its components are dx/dt , dy/dt , and dz/dt. 
In fact, we see from this argument that if we differentiate any vector with respect 
to time we produce a new vector. So we have several ways of producing new 
vectors: (1) multiply by a constant, (2) differentiate with respect to time, (3) add 
or subtract two vectors. 



Fig. 1 1-4. The addition of vectors. 



Fig. 1 1-5. The subtraction of vectors. 



Fig. 11-6. 
particle in a 
h - fi. 


The displacement of 
short time interval At 


a 


11-6 Newton’s laws in vector notation 


In order to write Newton’s laws in vector form, we have to go just one step 
further, and define the acceleration vector. This is the time derivative of the velocity 
vector, and it is easy to demonstrate that its components are the second derivatives 
of x, y, and z with respect to t: 

d\ ( d\( dr\ 

a dt \dt)\dt) 


d 2 r 
dt 2 : 


( 11 . 11 ) 


a x 


dv x 

~dt 


d 2 x 


dt 2 


dv y d 2 y 

= "dt = di 2 ’ 


dv z 


a z = 


dt 


fz 

dt 2 * 


( 11 . 12 ) 
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With this definition, then, Newton’s laws can be written in this way: 



Fig. 11-7. A curved trajectory. 



Fig. 11-8. Diagram for calculating 
the acceleration. 


wa = F (11.13) 

or 

m(d 2 r/dt 2 ) = F. (11.14) 

Now the problem of proving the invariance of Newton’s laws under rotation 
of coordinates is this: prove that a is a vector; this we have just done. Prove that F 
is a vector; we suppose it is. So if force is a vector, then, since we know acceleration 
is a vector, Eq. (1 1.13) will look the same in any coordinate system. Writing it in 
a form which does not explicitly contain x’s, y% and z’s has the advantage that 
from now on we need not write three laws every time we write Newton’s equations 
or other laws of physics. We write what looks like one law, but really, of course, 
it is the three laws for any particular set of axes, because any vector equation 
involves the statement that each of the components is equal. 

The fact that the acceleration is the rate of change of the vector velocity helps 
us to calculate the acceleration in some rather complicated circumstances. Suppose, 
for instance, that a particle is moving on some complicated curve (Fig. 11-7) and 
that, at a given instant t , it had a certain velocity Vi, but that when we go to another 
instant t 2 a little later, it has a different velocity v 2 . What is the acceleration? 
Answer: Acceleration is the difference in the velocity divided by the small time 
interval, so we need the difference of the two velocities. How do we get the differ- 
ence of the velocities? To subtract two vectors, we put the vector across the ends 
of v 2 and Vi; that is, we draw A as the difference of the two vectors, right? No! 
That only works when the tails of the vectors are in the same place! It has no mean- 
ing if we move the vector somewhere else and then draw a line across, so watch out! 
We have to draw a new diagram to subtract the vectors. In Fig. 11-8, Vi and v 2 
are both drawn parallel and equal to their counterparts in Fig. 11-7, and now we 
can discuss the acceleration. Of course the acceleration is simply Av/A t. It is 
interesting to note that we can compose the velocity difference out of two parts; 
we can think of acceleration as having two components , AVj , in the direction tangent 
to the path and A\ ± at right angles to the path, as indicated in Fig. 11-8. The 
acceleration tangent to the path is, of course, just the change in the length of the 
vector, i.e., the change in the speed v: 

a„ = dv/dt. (11.15) 

The other component of acceleration, at right angles to the curve, is easy to cal- 
culate, using Figs. 11-7 and 11-8. In the short time At let the change in angle 
between Vi and v 2 be the small angle A0. If the magnitude of the velocity is called 
v, then of course 

Av ± — v Ad 

and the acceleration a will be 

a ± = v (Ad/ At). 


Now we need to know Ad/ At, which can be found this way: If, at the given moment, 
the curve is approximated as a circle of a certain radius R , then in a time At the 
distance s is, of course, v At, where v is the speed. 


AS — v (A t/R\ or Ad/A t — v/R. 


Therefore, we find 

a = v 2 /R , 

as we have seen before. 


(11.16) 


11-7 Scalar product of vectors 

Now let us examine a little further the properties of vectors. It is easy to see 
that the length of a step in space would be the same in any coordinate system. 
That is, if a particular step r is represented by jc, y, z, in one coordinate system, 
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and by x f , /, 2' in another coordinate system, surely the distance r = |r| would 
be the same in both. Now 


r = V* 2 + y 2 + z 2 

and also 

r’ = VV 2 + / 2 + z' 2 . 

So what we wish to verify is that these two quantities are equal. It is much mote 
convenient not to bother to take the square root, so let us talk about the square of 
the distance; that is, let us find out whether 

x 2 + y 2 + z 2 = x' 2 + y' 2 + z' 2 . (11.17) 

It had better be — and if we substitute Eq. (11.5) we do indeed find that it is. So 
we see that there are other kinds of equations which are true for any two coordinate 
systems. 

Something new is involved. We can produce a new quantity, a function of 
x , 7, and 2, called a scalar function , a quantity which has no direction but which is 
the same in both systems. Out of a vector we can make a scalar. We have to find 
a general rule for that. It is clear what the rule is for the case just considered : 
add the squares of the components. Let us now define a new thing, which we call 
a • a. This is not a vector, but a scalar; it is a number that is the same in all coordi- 
nate systems, and it is defined to be the sum of the squares of the three components 
of the vector: 

a -a = a x + aj + a 2 . (11.18) 

Now you say, “But with what axes?” It does not depend on the axes, the answer 
is the same in every set of axes. So we have a new kind of quantity, a new invariant 
or scalar produced by one vector “squared.” If we now define the following 
quantity for any two vectors a and b: 


a * b • — &yby Clzl)z> (11.1 9) 

we find that this quantity, calculated in the primed and unprimed systems, also 
stays the same. To prove it we note that it is true of a * a, b • b, and c • c, where 
c — a T b. Therefore the sum of the squares ( a x + b x ) 2 + ( a y + b y ) 2 + 
( a z + b z ) 2 will be invariant: 

(a x + b x ) 2 + ( a y + b v ) 2 + (a z + b z ) 2 = (a x > -j- b x >) 2 

+ (fly' + M 2 + (fl,' + M 2 . (11*20) 

If both sides of this equation are expanded, there will be cross products of just the 
type appearing in Eq. (11.19), as well as the sums of squares of the components 
of a and b. The invariance of terms of the form of Eq. (11.18) then leaves the cross 
product terms (11.19) invariant also. 

The quantity a • b is called the scalar product of two vectors, a and b, and it 
has many interesting and useful properties. For instance, it is easily proved that 

a • (b 4* c) = a * b + a * c. (1 1.21) 

Also, there is a simple geometrical way to calculate a • b, without having to cal- 
culate the components of a and b: a • b is the product of the length of a and the 
length of b times the cosine of the angle between them. Why? Suppose that we 
choose a special coordinate system in which the x-axis lies along a; in those cir- 
cumstances, the only component of a that will be there is a x , which is of course 
the whole length of a. Thus Eq. (11.19) reduces to a • b = a x b x for this case, 
and this is the length of a times the component of b in the direction of a, that is, 
b cos 0: 

a * b = ab cos 0. 

Therefore, in that special coordinate system, we have proved that a • b is the 
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K ngth of a times the length of b times cos 6. But if it is true in one coordinate system, 
it is true in all, because a • b is independent of the coordinate system ; that is our 
argument. 

What good is the dot product? Are there any cases in physics where we need 
it? Yes, we need it all the time. For instance, in Chapter 4 the kinetic energy was 
called imv 2 , but if the object is moving in space it should be the velocity squared 
in the x-direction, the ^-direction, and the z-direction, and so the formula for 
kinetic energy according to vector analysis is 

K.E. = Jm(vv) = + v\ + v 2 z ). (11.22) 

Energy does not have direction. Momentum has direction ; it is a vector, and it is 
the mass times the velocity vector. 

Another example of a dot product is the work done by a force when something 
is pushed from one place to the other. We have not yet defined work, but it is 
equivalent to the energy change, the weights lifted, when a force F acts through 
a distance s: 

Work = F * s (11.23) 

It is sometimes very convenient to talk about the component of a vector in a 
certain direction (say the vertical direction because that is the direction of gravity). 
For such purposes, it is useful to invent what we call a unit vector in the direction 
that we want to study. By a unit vector we mean one whose dot product with 
itself is equal to unity. Let us call this unit vector i; then i ■ i = 1 . Then, if we want 
the component of some vector in the direction of i, we see that the dot product 
a * i will be a cos 0, i.e., the component of a in the direction of i. This is a nice 
way to get the component; in fact, it permits us to get all the components and to 
write a rather amusing formula. Suppose that in a given system of coordinates, 
x, y, and z, we invent three vectors: i, a unit vector in the direction x; j, a unit vector 
in the direction y; and k, a unit vector in the direction z. Note first that i * i = 1. 
What is i • j? When two vectors are at right angles, their dot product is zero. 
Thus 

i i = 1 

i*j = 0 j j = 1 

i • k = 0 j-k = 0 k * k = 1 (11.24) 

Now with these definitions, any vector whatsoever can be written this way: 

a = a x i + a y ] + a z k. (11.25) 

By this means we can go from the components of a vector to the vector itself. 

This discussion of vectors is by no means complete. However, rather than 
try to go more deeply into the subject now, we shall first learn to use in physical 
situations some of the ideas so far discussed. Then, when we have properly mastered 
this basic material, we shall find it easier to penetrate more deeply into the subject 
without getting too confused. We shall later find that it is useful to define another 
kind of product of two vectors, called the vector product, and written as a X b. 
However, we shall undertake a discussion of such matters in a later chapter. 
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12 


Characteristics of Force 


12-1 What is a force? 

Although it is interesting and worth while to study the physical laws simply 
because they help us to understand and to use nature, one ought to stop every once 
in a while and think, “What do they really mean?” The meaning of any statement 
is a subject that has interested and troubled philosophers from time immemorial, 
and the meaning of physical laws is even more interesting, because it is generally 
believed that these laws represent some kind of real knowledge. The meaning of 
knowledge is a deep problem in philosophy, and it is always important to ask, 
“What does it mean?” 

Let us ask, “What is the meaning of the physical laws of Newton, which we 
write as F = mal What is the meaning of force, mass, and acceleration?” Well, 
we can intuitively sense the meaning of mass, and we can define acceleration if we 
know the meaning of position and time. We shall not discuss those meanings, 
but shall concentrate on the new concept of force . The answer is equally simple: 
“If a body is accelerating, then there is a force on it.” That is what Newton’s laws 
say, so the most precise and beautiful definition of force imaginable might simply 
be to say that force is the mass of an object times the acceleration. Suppose we 
have a law which says that the conservation of momentum is valid if the sum 
of all the external forces is zero; then the question arises, “What does it mean , 
that the sum of all the external forces is zero?” A pleasant way to define that 
statement would be: “When the total momentum is a constant, then the sum of the 
external forces is zero.” There must be something wrong with that, because it is 
just not saying anything new. If we have discovered a fundamental law, which 
asserts that the force is equal to the mass times the acceleration, and then define the 
force to be the mass times the acceleration, we have found out nothing. We could 
also define force to mean that a moving object with no force acting on it continues 
to move with constant velocity in a straight line. If we then observe an object 
not moving in a straight line with a constant velocity, we might say that 
there is a force on it. Now such things certainly cannot be the content of physics, 
because they are definitions going in a circle. The Newtonian statement above, 
however, seems to be a most precise definition of force, and one that appeals to 
the mathematician; nevertheless, it is completely useless, because no prediction 
whatsoever can be made from a definition. One might sit in an armchair all day 
long and define words at will, but to find out what happens when two balls push 
against each other, or when a weight is hung on a spring, is another matter al- 
together, because the way the bodies behave is something completely outside any 
choice of definitions. 

For example, if we were to choose to say that an object left to itself keeps its 
position and does not move, then when we see something drifting, we could say 
that must be due to a “gorce” — a gorce is the rate of change of position. Now we 
have a wonderful new law, everything stands still except when a gorce is acting. You 
see, that would be analogous to the above definition of force, and it would contain no 
information. The real content of Newton’s laws is this: that the force is supposed 
to have some independent properties , in addition to the law F = ma; but the 
specific independent properties that the force has were not completely described 
by Newton or by anybody else, and therefore the physical law F = ma is an 
incomplete law. It implies that if we study the mass times the acceleration and 
call the product the force, i.e., if we study the characteristics of force as a program 
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of interest, then we shall find that forces have some simplicity; the law is a good 
program for analyzing nature, it is a suggestion that the forces will be simple. 

Now the first example of such forces was the complete law of gravitation, 
which was given by Newton, and in stating the law he answered the question, 
“What is the force?” If there were nothing but gravitation, then the combination 
of this law and the force law (second law of motion) Would be a complete theory, 
but there is much more than gravitation, and we want to use Newton’s laws in 
many different situations. Therefore in order to proceed we have to tell something 
about the properties of force. 

For example, in dealing with force the tacit assumption is always made that 
the force is equal to zero unless some physical body is present, that if we find a 
force that is not equal to zero we also find something in the neighborhood that 
is a source of the force. This assumption is entirely different from the case of the 
“gorce” that we introduced above. One of the most important characteristics of 
force is that it has a material origin, and this is not just a definition. 

Newton also gave one rule about the force: that the forces between interacting 
bodies are equal and opposite — action equals reaction; that rule, it turns out, is 
not exactly true. In fact, the law F ~ ma is not exactly true; if it were a definition 
we should have to say that it is always exactly true; but it is not. 

The student may object, “I do not like this imprecision, I should like to have 
everything defined exactly; in fact, it says in sofne books that any science is an exact 
subject, in which everything is defined.” If you insist upon a precise definition of 
force, you will never get it 1 First, because Newton’s Second Law is not exact, and 
second, because in order to understand physical laws you must understand that 
they are all some kind of approximation. 

Any simple idea is approximate; as an illustration, consider an object, . . . 
what is an object? Philosophers are always saying, “Well, just take a chair for 
example.” The moment they say that, you know that they do not know what 
they are talking about any more. What is a chair? Well, a chair is a certain thing 
over there . . . certain?, how certain? The atoms are evaporating from it from time 
to time — not many atoms, but a few — dirt falls on it and gets dissolved in the paint; 
so to define a chair precisely, to say exactly which atoms are chair, and which 
atoms are air, or which atoms are dirt, or which atoms are paint that belongs to 
the chair is impossible. So the mass of a chair can be defined only approximately. 
In the same way, to define the mass of a single object is impossible, because there 
are not any single, left-alone objects in the world — every object is a mixture of a 
lot of things, so we can deal with it only as a series of approximations and idealiza- 
tions. 

The trick is the idealizations. To an excellent approximation of perhaps one 
part in 10 10 , the number of atoms in the chair does not change in a minute, and if 
we are not too precise we may idealize the chair as a definite thing; in the same way 
we shall learn about the characteristics of force, in an ideal fashion, if we are not 
too precise. One may be dissatisfied with the approximate view of nature that 
physics tries to obtain (the attempt is always to increase the accuracy of the 
approximation), and may prefer a mathematical definition; but mathematical 
definitions can never work in the real world. A mathematical definition will be 
good for mathematics, in which all the logic can be followed out completely, but 
the physical world is complex, as we have indicated in a number of examples, such 
as those of the ocean waves and a glass of wine. When we try to isolate pieces of it, 
to talk about one mass, the wine and the glass, how can we know which is which, 
when one dissolves in the other? Xhe forces on a single thing already involve 
approximation, and if we have a system of discourse about the real world, then 
that system, at least for the present day, must involve approximations of some 
kind. 

This system is quite unlike the case of mathematics, in which everything can 
be defined, and then we do not know what we are talking about. In fact, the glory 
of mathematics is that we do not have to say what we are talking about. The glory 
is that the laws, the arguments, and the logic are independent of what “it” is. If 
we have any other set of objects that obey the same system of axioms as Euclid’s 
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geometry, then if we make new definitions and follow them out with correct logic, 
all the consequences will be correct, and it makes no difference what the subject 
was. In nature, however, when we draw a line or establish a line by using a light 
beam and a theodolite, as we do in surveying, are we measuring a line in the sense 
of Euclid? No, we are making an approximation; the cross hair has some width, 
but a geometrical line has no width, and so, whether Euclidean geometry can be 
used for surveying or not is a physical question, not a mathematical question. 
However, from an experimental standpoint, not a mathematical standpoint, we 
need to know whether the laws of Euclid apply to the kind of geometry that we 
use in measuring land ; so we make a hypothesis that it does, and it works pretty 
well; but it is not precise, because our surveying lines are not really geometrical 
lines. Whether or not those lines of Euclid, which are really abstract, apply to the 
lines of experience is a question for experience; it is not a question that can be 
answered by sheer reason. 

In the same way, we cannot just call F = ma a definition, deduce everything 
purely mathematically, and make mechanics a mathematical theory, when me- 
chanics is a description of nature. By establishing suitable postulates it is always 
possible to make a system of mathematics, just as Euclid did, but we cannot make 
a mathematics of the world, because sooner or later we have to find out whether 
the axioms are valid for the objects of nature. Thus we immediately get involved 
with these complicated and “dirty'’ objects of nature, but with approximations 
ever increasing in accuracy. 


12-2 Friction 

The foregoing considerations show that a true understanding of Newton’s 
laws requires a discussion of forces, and it is the purpose of this chapter to introduce 
such a discussion, as a kind of completion of Newton’s laws. We have already 
studied the definitions of acceleration and related ideas, but now we have to study 
the properties of force, and this chapter, unlike the previous chapters, will not be 
very precise, because forces are quite complicated. 

To begin with a particular force, let us consider the drag on an airplane 
flying through the air. What is the law for that force? (Surely there is a law for 
every force, we must have a law!) One can hardly think that the law for that force 
will be simple. Try to imagine what makes a drag on an airplane flying through 
the air — the air rushing over the wings, the swirling in the back, the changes going 
on around the fuselage, and many other complications, and you see that there is 
not going to be a simple law. On the other hand, it is a remarkable fact that the 
drag force on an airplane is approximately a constant times the square of the 
velocity, or F ~ cv 2 . 

Now what is the status of such a law, is it analogous to F = mal Not at all, 
because in the first place this law is an empirical thing that is obtained roughly by 
tests in a wind tunnel. You say, “Well F — ma might be empirical too.” That is 
not the reason that there is a difference. The difference is not that it is empirical, 
but that, as we understand nature, this law is the result of an enormous complexity 
of events and is not, fundamentally, a simple thing. If we continue to study it more 
and more, measuring more and more accurately, the law will continue to become 
more complicated, not less. In other words, as we study this law of the drag on an 
airplane more and more closely, we find out that it is “falser” and “falser,” and 
the more deeply we study it, and the more accurately we measure, the more compli- 
cated the truth becomes ; so in that sense we consider it not to result from a simple, 
fundamental process, which agrees with our original surmise. For example, if the 
velocity is extremely low, so low that an ordinary airplane is not flying, as when 
the airplane is dragged slowly through the air, then the law changes, and the drag 
friction depends more nearly linearly on the velocity. To take another example, 
the frictional drag on a ball or a bubble or anything that is moving slowly through 
a viscous liquid like honey, is proportional to the velocity, but for motion so fast 
that the fluid swirls around (honey does not but water and air do) then the drag 
becomes more nearly proportional to the square of the velocity (F = cv 2 ), and 
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if the velocity continues to increase, then even this law begins to fail. People who 
say, “Well the coefficient changes slightly,” are dodging the issue. Second, there 
are other great complications: can this force on the airplane be divided or analyzed 
as a force on the wings, a force on the front, and so on? Indeed, this can be done, 
if we are concerned about the torques here and there, but then we have to get 
special laws for the force on the wings, and so on. It is an amazing fact that the 
force on a wing depends upon the other wing: in other words, if we take the airplane 
apart and put just one wing in the air, then the force is not the same as if the rest 
of the plane were there. The reason, of course, is that some of the wind that hits 
the front goes around to the wings and changes the force on the wings. It seems a 
miracle that there is such a simple, rough, empirical law that can be used in the 
design of airplanes, but this law is not in the same class as the basic laws of physics, 
and further study of it will only make it more and more complicated. A study of 
how the coefficient c depends on the shape of the front of the airplane is, to put 
it mildly, frustrating. There just is no simple law for determining the coefficient 
in terms of the shape of the airplane. In contrast, the law of gravitation is simple, 
and further study only indicates its greater simplicity. 

We have just discussed two cases of friction, resulting from fast movement in 
air and slow movement in honey. There is another kind of friction, called dry 
friction or sliding friction, which occurs when one solid body slides on another. 
In this case a force is needed to maintain motion. This is called a frictional force, 
and its origin, also, is a very complicated matter. Both surfaces of contact are 
irregular, on an atomic level. There are many points of contact where the atoms 
seem to cling together, and then, as the sliding body is pulled along, the atoms 
snap apart and vibration ensues; something like that has to happen. Formerly 
the mechanism of this friction was thought to be very simple, that the surfaces 
were merely full of irregularities and the friction originated in lifting the slider 
over the bumps; but this cannot be, for there is no loss of energy in that process, 
whereas power is in fact consumed. The mechanism of power loss is that as the 
slider snaps over the bumps, the bumps deform and then generate waves and 
atomic motions and, after a while, heat, in the two bodies. Now it is very remark- 
able that again, empirically, this friction can be described approximately by a 
simple law. This law is that the force needed to overcome friction and to drag one 
object over another depends upon the normal force (i.e., perpendicular to the 
surface) between the two surfaces that are in contact. Actually, to a fairly good 
approximation, the frictional force is proportional to this normal force, and has 
a more or less constant coefficient; that is, 


Fig. 12-1. The relation between fric- 
tional force and the normal force for 
sliding contact. 


F = fxN, (12.1) 

where ju is called the coefficient of friction (Fig. 12-1). Although this coefficient is 
not exactly constant, the formula is a good empirical rule for judging approxi- 
mately the amount of force that will be needed in certain practical or engineering 
circumstances. If the normal force or the speed of motion gets too big, the law fails 
because of the excessive heat generated. It is important to realize that each of these 
empirical laws has its limitations, beyond which it does not really work. 

That the formula F = fiN is approximately correct can be demonstrated by 
a simple experiment. We set up a plane, inclined at a small angle 8, and place a 
block of weight W on the plane. We then tilt the plane at a steeper angle, until 
the block just begins to slide from its own weight. The component of the weight 
downward along the plane is W sin 8 , and this must equal the frictional force F 
when the block is sliding uniformly. The component of the weight normal to the 
plane is W cos 8 , and this is the normal force N. With these values, the formula 
becomes W sin 8 = juW cos 8, from which we get ju = sin 0/cos 8 = tan 8. If 
this law were exactly true, an object would start to slide at some definite inclination. 
If the same block is loaded by putting extra weight on it, then, although W is 
increased, all the forces in the formula are increased in the same proportion, and 
W cancels out. If y stays constant, the loaded block will slide again at the same 
slope. When the angle 8 is determined by trial with the original weight, it is found 
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that with the greater weight the block will slide at about the same angle. This will 
be true even when one weight is many times as great as the other, and so we con- 
clude that the coefficient of friction is independent of the weight. 

In performing this experiment it is noticeable that when the plane is tilted 
at about the correct angle 6, the block does not slide steadily but in a halting fashion. 
At one place it may stop, at another it may move with acceleration. This behavior 
indicates that the coefficient of friction is only roughly a constant, and varies from 
place to place along the plane. The same erratic behavior is observed whether the 
block is loaded or not. Such variations are caused by different degrees of smooth- 
ness or hardness of the plane, and perhaps dirt, oxides, or other foreign matter. 
The tables that list purported values of ju for “steel on steel,” “copper on copper,” 
and the like, are all false, because they ignore the factors mentioned above, which 
really determine p. The friction is never due to “copper on copper,” etc., but to 
the impurities clinging to the copper. 

In experiments of the type described above, the friction is nearly independent 
of the velocity. Many people believe that the friction to be overcome to get 
something started (static friction) exceeds the force required to keep it sliding 
(sliding friction), but with dry metals it is very hard to show any difference. The 
opinion probably arises from experiences where small bits of oil or lubricant are 
present, or where blocks, for example, are supported by springs or other flexible 
supports so that they appear to bind. 

It is quite difficult to do accurate quantitative experiments in friction, and the 
laws of friction are still not analyzed very well, in spite of the enormous engineering 
value of an accurate analysis. Although the law F = fiN is fairly accurate once the 
surfaces are standardized, the reason for this form of the law is not really under- 
understood. To show that the coefficient p is nearly independent of velocity 
requires some delicate experimentation, because the apparent friction is much 
reduced if the lower surface vibrates very fast. When the experiment is done at 
very high speed, care must be taken that the objects do not vibrate relative to one 
another, since apparent decreases of the friction at high speed are often due to 
vibrations. At any rate, this friction law is another of those semiempirical laws 
that are not thoroughly understood, and in view of all the work that has been 
done it is surprising that more understanding of this phenomenon has not come 
about. At the present time, in fact, it is impossible even to estimate the coefficient 
of friction between two substances. 

It was pointed out above that attempts to measure \x by sliding pure substances 
such as copper on copper will lead to spurious results, because the surfaces in 
contact are not pure copper, but are mixtures of oxides and other impurities. 
If we try to get absolutely pure copper, if we clean and polish the surfaces, outgas 
the materials in a vacuum, and take every conceivable precaution, we still do not 
get ju. For if we tilt the apparatus even to a vertical position, the slider will not 
fall off — the two pieces of copper stick together! The coefficient ju, which is ordi- 
narily less than unity for reasonably hard surfaces, becomes several times unity! 
The reason for this unexpected behavior is that when the atoms in contact are all 
of the same kind, there is no way for the atoms to “know” that they are in different 
pieces of copper. When there are other atoms, in the oxides and greases and 
more complicated thin surface layers of contaminants in between, the atoms 
“know” when they are not on the same part. When we consider that it is forces 
between atoms that hold the copper together as a solid, it should become clear 
that it is impossible to get the right coefficient of friction for pure metals. 

The same phenomenon can be observed in a simple home-made experiment 
with a flat glass plate and a glass tumbler. If the tumbler is placed on the plate and 
pulled along with a loop of string, it slides fairly well and one can feel the coefficient 
of friction; it is a little irregular, but it is a coefficient. If we now wet the glass plate 
and the bottom of the tumbler and pull again, we find that it binds, and if we look 
closely we shall find scratches, because the water is able to lift the grease and the 
other contaminants off the surface, and then we really have a glass-to-glass contact; 
this contact is so good that it holds tight and resists separation so much that the 
glass is torn apart; that is, it makes scratches. 
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12-3 Molecular forces 



Fig. 12-2. The force between two 
atoms as a function of their distance of 
separation. 


We shall next discuss the characteristics of molecular forces. These are forces 
between the atoms, and are the ultimate origin of friction. Molecular forces have 
never been satisfactorily explained on a basis of classical physics; it takes quantum 
mechanics to understand them fully. Empirically, however, the force between 
atoms is illustrated schematically in Fig. 12-2, where the force F between two 
atoms is plotted as a function of the distance r between them. There are different 
cases: in the water molecule, for example, the negative charges sit more on the 
oxygen, and the mean positions of the negative charges and of the positive charges 
are not at the same point; consequently, another molecule nearby feels a relatively 
large force, which is called a dipole-dipole force. However, for many systems 
the charges are very much better balanced, in particular for oxygen gas, which is 
perfectly symmetrical. In this case, although the minus charges and the plus 
charges are dispersed over the molecule, the distribution is such that the center 
of the minus charges and the center of the plus charges coincide. A molecule 
where the centers do not coincide is called a polar molecule, and charge times the 
separation between centers is called the dipole moment. A nonpolar molecule is 
one in which the centers of the charges coincide. For all nonpolar molecules, in 
which all the electrical forces are neutralized, it nevertheless turns out that the 
force at very large distances is an attraction and varies inversely as the seventh 
power of the distance, or F — k/r 7 , where k is a constant that depends on the 
molecules. Why this is we shall learn only when we learn quantum mechanics. 
When there are dipoles the forces are greater. When atoms or molecules get too 
close they repel with a very large repulsion; that is what keeps us from falling 
through the floor! 

These molecular forces can be demonstrated in a fairly direct way: one of 
these is the friction experiment with a sliding glass tumbler; another is to take two 
very carefully ground and lapped surfaces which are very accurately flat, so that 
the surfaces can be brought very close together. An example of such surfaces is 
the Johansson blocks that are used in machine shops as standards for making 
accurate length measurements. If one such block is slid over another very carefully 
and the upper one is lifted, the other one will adhere and also be lifted by the molec- 
ular forces, exemplifying the direct attraction between the atoms on one block 
for the atoms on the other block. 

Nevertheless these molecular forces of attraction are still not fundamental 
in the sense that gravitation is fundamental; they are due to the vastly complex 
interactions of all the electrons and nuclei in one molecule with all the electrons 
and nuclei in another. Any simple-looking formula we get represents a sum- 
mation of complications, so we still have not got the fundamental phenomena. 

Since the molecular forces attract at large distances and repel at short dis- 
tances, as shown in Fig. 12-2, we can make up solids in which all the atoms are 
held together by their attractions and held apart by the repulsion that sets in when 
they are too close together. At a certain distance d (where the graph in Fig. 12-2 
crosses the axis) the forces are zero, which means that they are all balanced, so that 
the molecules stay that distance apart from one another. If the molecules are 
pushed closer together than the distance d they all show a repulsion, represented by 
the portion of the graph above the r-axis. To push the molecules only slightly 
closer together requires a great force, because the molecular repulsion rapidly 
becomes very great at distances less than d. If the molecules are pulled slightly 
apart there is a slight attraction, which increases as the separation increases. If 
they are pulled sufficiently hard, they will separate permanently — the bond is 
broken. 

If the molecules are pushed only a very small distance closer, or pulled only 
a very small distance farther than d , the corresponding distance along the curve of 
Fig. 12-2 is also very small, and can then be approximated by a straight line. 
Therefore, in many circumstances, if the displacement is not too great the force is 
proportional to the displacement. This principle is known as Hooke’s law, or the 
law of elasticity, which says that the force in a body which tries to restore the body 
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to its original condition when it is distorted is proportional to the distortion. This 
law, of course, holds true only if the distortion is relatively small; when it gets too 
large the body will be torn apart or crushed, depending on the kind of distortion. 
The amount of force for which Hooke’s law is valid depends upon the material; for 
instance, for dough or putty the force is very small, but for steel it is relatively large. 
Hooke’s law can be nicely demonstrated with a long coil spring, made of steel and 
suspended vertically. A suitable weight hung on the lower end of the spring 
produces a tiny twist throughout the length of the wire, which results in a small 
vertical deflection in each turn and adds up to a large displacement if there are 
many turns. If the total elongation produced, say, by a 100-gram weight, is meas- 
ured, it is found that additional weights of 100 grams will each produce an addi- 
tional elongation that is very nearly equal to the stretch that was measured for the 
first 100 grams. This constant ratio of force to displacement begins to change when 
the spring is overloaded, i.e., Hooke’s law no longer holds. 

12-4 Fundamental forces. Fields 

We shall now discuss the only remaining forces that are fundamental. We 
call them fundamental in the sense that their laws are fundamentally simple. We 
shall first discuss electrical force. Objects carry electrical charges which consist 
simply of electrons or protons. If any two bodies are electrically charged, there 
is an electrical force between them, and if the magnitudes of the charges are 
qi and q 2 , respectively, the force varies inversely as the square of the distance 
between the charges, or F ~ (const) q\q 2 /r 2 . For unlike charges, this law is like 
the law of gravitation, but for like charges the force is repulsive and the sign 
(direction) is reversed. The charges q i and q 2 can be intrinsically either positive 
or negative, and in any specific application of the formula the direction of the 
force will come out right if the q' s are given the proper plus or minus sign; the force 
is directed along the line between the two charges. The constant in the formula 
depends, of course, upon what units are used for the force, the charge, and the 
distance. In current practice the charge is measured in coulombs, the distance in 
meters, and the force in newtons. Then, in order to get the force to come out prop- 
erly in newtons, the constant (which for historical reasons is written l/47re 0 ) 
takes the numerical value 

€ 0 = 8.854 X 10“ 12 coul 2 /newton • m 2 
or 

1/47T6Q = 8.99 X 10 9 n * m 2 /coul 2 . 

Thus the force law for static charges is 

F = 2i42r/47T€ 0 r 3 . (12.2) 

In nature, the most important charge of all is the charge on a single electron, which 
is 1.60 X 10“ 19 coulomb. In working with electrical forces between fundamental 
particles rather than with large charges, many people prefer the combination 
(q e i) 2 /4x€ 0 , in which q Q i is defined as the charge on an electron. This combination 
occurs frequently, and to simplify calculations it has been defined by the symbol 
e 2 ; its numerical value in the mks system of units turns out to be (1.52 X 10“ 14 ) 2 . 
The advantage of using the constant in this form is that the force between two 
electrons in newtons can then be written simply as e 2 /r 2 , with r in meters, without 
all the individual constants. Electrical forces are much more complicated than 
this simple formula indicates, since the formula gives the force between two 
objects only when the objects are standing still. We shall consider the more 
general case shortly. 

In the analysis of forces of the more fundamental kinds (not such forces as 
friction, but the electrical force or the gravitational force), an interesting and very 
important concept has been developed. Since at first sight the forces are very 
much more complicated than is indicated by the inverse-square laws and these 
laws hold true only when the interacting bodies are standing still, an improved 
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method is needed to deal with the very complex forces that ensue when the bodies 
start to move in a complicated way. Experience has shown that an approach 
known as the concept of a “field” is of great utility for the analysis of forces of this 
type. To illustrate the idea for, say, electrical force, suppose we have two electrical 
charges, q x and q 2 , located at points P and R respectively. Then the force between 
the charges is given by 

F = qmr/r 3 . (12.3) 

To analyze this force by means of the field concept, we say that the charge q x at P 
produces a “condition” at R, such that when the charge q 2 is placed at R it “feels” 
the force. This is one way, strange perhaps, of describing it; we say that the force 
F on q 2 at R can be written in two parts. It is q 2 multiplied by a quantity E that 
would be there whether q 2 were there or not (provided we keep all the other charges 
in their right places). E is the “condition” produced by q u we say, and F is the 
response of q 2 to E. E is called an electric field , and it is a vector. The formula for 
the electric field E that is produced at R by a charge q x at P is the charge q x times 
the constant l/47r€o divided by r 2 (r is the distance from P to R ), and it is acting in 
the direction of the radius vector (the radius vector r divided by its own length). 
The expression for E is thus 

E = qiT/47re 0 r 3 . (12.4) 

We then write 

F = q 2 E, (12.5) 

which expresses the force, the field, and the charge in the field. What is the point 
of all this? The point is to divide the analysis into two parts. One part says that 
something produces a field. The other part says that something is acted on by 
the field. By allowing us to look at the two parts independently, this separation 
of the analysis simplifies the calculation of a problem in many situations. If 
many charges are present, we first work out the total electric field produced at R 
by all the charges, and then, knowing the charge that is placed at R , we find the 
force on it. 

In the case of gravitation, we can do exactly the same thing. In this case, 
where the force F — — Gm x m 2 r/r z , we can make an analogous analysis, as follows : 
the force on a body in a gravitational field is the mass of that body times the field 
C. The force on m 2 is the mass m 2 times the field C produced by m x \ that is, 
F = m 2 C. Then the field C produced by a body of mass m x is C = — Gmiv/r z 
and it is directed radially, as in the electrical case. 

In spite of how it might at first seem, this separation of one part from another 
is not a triviality. It would be trivial, just another way of writing the same thing, 
if the laws of force were simple, but the laws of force are so complicated that it 
turns out that the fields have a reality that is almost independent of the objects 
which create them. One can do something like shake a charge and produce an 
effect, a field, at a distance; if one then stops moving the charge, the field keeps 
track of all the past, because the interaction between two particles is not in- 
stantaneous. It is desirable to have some way to remember what happened previ- 
ously. If the force upon some charge depends upon where another charge was 
yesterday, which it does, then we need machinery to keep track of what went on 
yesterday, and that is the character of a field. So when the forces get more compli- 
cated, the field becomes more and more real, and this technique becomes less and 
less of an artificial separation. 

In analyzing forces by the use of fields, we need two kinds of laws pertaining 
to fields. The first is the response to a field, and that gives the equations of motion. 
For example, the law of response of a mass to a gravitational field is that the force 
is equal to the mass times the gravitational field; or, if there is also a charge 
on the body, the response of the charge to the electric field equals the charge times 
the electric field. The second part of the analysis of nature in these situations is to 
formulate the laws which determine the strength of the field and how it is produced. 
These laws are sometimes called the field equations. We shall learn more about 
them in due time, but shall write down a few things about them now. 
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First, the most remarkable fact of all, which is true exactly and which can 
be easily understood, is that the total electric field produced by a number of sources 
is the vector sum of the electric fields produced by the first source, the second source, 
and so on. In other words, if we hav£ numerous charges making a field, and if all 
by itself one of them would make the field E 1# another would make the field E 2 , 
and so on, then we merely add the vectors to get the total field. This principle 
can be expressed as 

E = Ei + E 2 + E 3 + • • • (12.6) 

or, in view of the definition given above, 


* = £ 

1 


47 r€ 0 /-? 


(12.7) 


Can the same methods be applied to gravitation? The force between two 
masses m x and m 2 was expressed by Newton as F— Gmim 2 T/r 3 . But according 
to the field concept, we may say that m x creates a field C in all the surrounding 
space, such that the force on m 2 is given by 

F = m 2 C. (12.8) 

By complete analogy with the electrical case, 

C = -Gm x xjr\ (12.9) 

and the gravitational field produced by several masses is 

C = C x + C 2 + C 3 + (12.10) 

In Chapter 7, in working out a case of planetary motion, we used this principle in 
essence. We simply added all the force vectors to get the resultant force on a 
planet. If we divide out the mass of the planet in question, we get Eq. (12.10). 

Equations (12.6) and (12.10) express what is known as the principle of super- 
position of fields. This principle states that the total field due to all the sources is 
the sum of the fields due to each source. So far as we know today, for electricity 
this is an absolutely guaranteed law, which is true even when the force law is 
complicated because of the motions of the charges. There are apparent violations, 
but more careful analysis has always shown these to be due to the overlooking of 
certain moving charges. However, although the principle of superposition applies 
exactly for electrical forces, it is not exact for gravity if the field is too strong, and 
Newton’s equation (12.10) is only approximate, according to Einstein’s gravita- 
tional theory. 

Closely related to electrical force is another kind, called magnetic force, and 
this too is analyzed in terms of a field. Some of the qualitative relations between 
electrical and magnetic forces can be illustrated by an experiment with an electron- 
ray tube (Fig. 12-3). At one end of such a tube is a source that emits a stream of 
electrons. Within the tube are arrangements for accelerating the electrons to a 
high speed and sending some of them in a narrow beam to a fluorescent screen at 
the other end of the tube. A spot of light glows in the center of the screen where 
the electrons strike, and this enables us to trace the electron path. On the way to 
the screen the electron beam passes through a narrow space between a pair of 
parallel metal plates, which are arranged, say, horizontally. A voltage can be 
applied across the plates, so that either plate can be made negative at will. When 
such a voltage is present, there is an electric field between the plates. 

The first part of the experiment is to apply a negative voltage to the lower 
plate, which means that extra electrons have been placed on the lower plate. Since 
like charges repel, the light spot on the screen instantly shifts upward. (We could 
also say this in another way — that the electrons “felt” the field, and responded by 
deflecting upward.) We next reverse the voltage, making the upper plate negative. 
The light spot on the screen now jumps below the center, showing that the electrons 
in the beam were repelled by those in the plate above them. (Or we could say again 

12-9 



Fig. 1 2-3. An electron-beam tube. 



that the electrons had “responded” to the field, which is now in the reverse direc- 
tion.) 

The second part of the experiment is to disconnect the voltage from the 
plates and test the effect of a magnetic field on the electron beam. This is done by 
means of a horseshoe magnet, whose poles are far enough apart to more or less 
straddle the tube. Suppose we hold the magnet below the tube in the same orienta- 
tion as the letter U, with its poles up and part of the tube in between. We note 
that the light spot is deflected, say, upward, as the magnet approaches the tube 
from below. So it appears that the magnet repels the electron beam. However, it 
is not that simple, for if we invert the magnet without reversing the poles side-for- 
side, and now approach the tube from above, the spot still moves upward , so the 
electron beam is not repelled; instead, it appears to be attracted this time. Now 
we start again, restoring the magnet to its original U orientation and holding it 
below the tube, as before. Yes, the spot is still deflected upward; but now turn 
the magnet 180 degrees around a vertical axis, so that it is still in the U position 
but the poles are reversed side-for-side. Behold, the spot now jumps downward, 
and stays down, even if we invert the magnet and approach from above, as before. 

To understand this peculiar behavior, we have to have a new combination 
of forces. We explain it thus: Across the magnet from one pole to the other there 
is a magnetic field. This field has a direction which is always away from one 
particular pole (which we could mark) and toward the other. Inverting the magnet 
did not change the direction of the field, but reversing the poles side-for-side did 
reverse its direction. For example, if the electron velocity were horizontal in the 
x>direction and the magnetic field were also horizontal but in the jy-direction, the 
magnetic force on the moving electrons would be in the z-direction, i.e., up or down, 
depending on whether the field was in the positive or negative ^-direction. 

Although we shall not at the present time give the correct law of force between 
charges moving in an arbitrary manner, one relative to the other, because it is too 
complicated, we shall give one aspect of it: the complete law of the forces if the 
fields are known. The force on a charged object depends upon its motion; if, 
when the object is standing still at a given place, there is some force, this is taken 
to be proportional to the charge, the coefficient being what we call the electric field . 
When the object moves the force may be different, and the correction, the new 
“piece” of force, turns out to be dependent exactly linearly on the velocity , but at 
right angles to v and to another vector quantity which we call the magnetic 
induction B. If the components of the electric field E and the magnetic induction 
B are, respectively, (E x , E yi E z ) and ( B Xi B y > B z ), and if the velocity v has the 
components (v x , v y , v z \ then the total electric and magnetic force on a moving 
charge q has the components 

F x = q{E x + v y B z - v z B y \ 

Fy = q{Fy + V Z B X - v x B z \ (12.11) 

F z = q(E z + v x B y - v y B x ). 

If, for instance, the only component of the magnetic field were B v and the only 
component of the velocity were v x , then the only term left in the magnetic force 
would be a force in the z-direction, at right angles to both B and v. 


12-5 Pseudo forces 

The next kind of force we shall discuss might be called a pseudo force. In 
Chapter 1 1 we discussed the relationship between two people, Joe and Moe, who 
use different coordinate systems. Let us suppose that the positions of a particle 
as measured by Joe are x and by Moe are x r ; then the laws are as follows: 

* = y = y' 9 2 = z', 

where s is the displacement of Moe’s system relative to Joe’s. If we suppose that 
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the laws of motion are correct for Joe, how do they look for Moe? We find first, 
that 

dx/dt = dx'/dt + ds/dt. 

Previously, we considered the case where s was constant, and we found that 5 
made no difference in the laws of motion, since ds/dt = 0; ultimately, therefore, 
the laws of physics were the same in both systems. But another case we can take is 
that s = ut , where u is a uniform velocity in a straight line. Then s is not constant, 
and ds/dt is not zero, but is w, a constant. However, the acceleration d 2 x/dt 2 
is still the same as d 2 x f /dt 2 , because du/dt = 0. This proves the law that we used 
in Chapter 10, namely, that if we move in a straight line with uniform velocity the 
laws of physics will look the same to us as when we are standing still. That is 
the Galilean transformation. But we wish to discuss the interesting case where s 
is still more complicated, say s — at 2 /2. Then ds/dt = at and d 2 s/dt 2 = a, a 
uniform acceleration ; or in a still more complicated case, the acceleration might 
be a function of time. This means that although the laws of force from the point 
of view of Joe would look like 



= F X9 


the laws of force as looked upon by Moe would appear as 

d 2 x! 

m -jp - F x > - ma. 


That is, since Moe’s coordinate system is accelerating with respect to Joe’s, the 
extra term ma comes in, and Moe will have to correct his forces by that amount 
in order to get Newton’s laws to work. In other words, here is an apparent, mysteri- 
ous new force of unknown origin which arises, of course, because Moe has the 
wrong coordinate system. This is an example of a pseudo force; other examples 
occur in coordinate systems that are rotating . 

Another example of pseudo force is what is often called “centrifugal force.” 
An observer in a rotating coordinate system, e.g., in a rotating box, will find 
mysterious forces, not accounted for by any known origin of force, throwing 
things outward toward the walls. These forces are due merely to the fact that the 
observer does not have Newton’s coordinate system, which is the simplest coordi- 
nate system. 

Pseudo force can be illustrated by an interesting experiment in which we push 
a jar of water along a table, with acceleration. Gravity, of course, acts downward 
on the water, but because of the horizontal acceleration there is also a pseudo force 
acting horizontally and in a direction opposite to the acceleration. The resultant 
of gravity and pseudo force makes an angle with the vertical, and during the 
acceleration the surface of the water will be perpendicular to the resultant force, 
i.e., inclined at an angle with the table, with the water standing higher in the 
rearward side of the jar. When the push on the jar stops and the jar decelerates 
because of friction, the pseudo force is reversed, and the water stands higher in 
the forward side of the jar (Fig. 12-4). 

One very important feature of pseudo forces is that they are always propor- 
tional to the masses; the same is true of gravity. The possibility exists, therefore, 
that gravity itself is a pseudo force . Is it not possible that perhaps gravitation is 
due simply to the fact that we do not have the right coordinate system? After all, 
we can always get a force proportional to the mass if we imagine that a body is 
accelerating. For instance, a man shut up m a box that is standing still on the 
earth finds himself held to the floor of the box with a certain force that is propor- 
tional to his mass. But if there were no earth at all and the box were standing 
still, the man inside would float in space. On the other hand, if there were no 
earth at all and something were pulling the box along with an acceleration g , 
then the man in the box, analyzing physics, would find a pseudo force which 
would pull him to the floor, just as gravity does. 


o > \\ \A ^a_ 

1 g |g 

Fig. 1 2-4. Illustration of a pseudo force 
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Einstein put forward the famous hypothesis that accelerations give an imita- 
tion of gravitation, that the forces of acceleration (the pseudo forces) cannot be 
distinguished from those of gravity; it is not possible to tell how much of a given 
force is gravity and how much is pseudo force. 

It might seem all right to consider gravity to be a pseudo force, to say that we 
are all held down because we are accelerating upward, but how about the people 
in Madagascar, on the other side of the earth — are they accelerating too? Einstein 
found that gravity could be considered a pseudo force only at one point at a time, 
and was led by his considerations to suggest that the geometry of the world is more 
complicated than ordinary Euclidean geometry. The present discussion is only 
qualitative, and does not pretend to convey anything more than the general idea. 
To give a rough idea of how gravitation could be the result of pseudo forces, we 
present an illustration which is purely geometrical and does not represent the real 
situation. Suppose that we all lived in two dimensions, and knew nothing of a 
third. We think we are on a plane, but suppose we are really on the surface of a 
sphere. And suppose that we shoot an object along the ground, with no forces 
on it. Where will it go? It will appear to go in a straight line, but it has to remain 
on the surface of a sphere, where the shortest distance between two points is along 
a great circle; so it goes along a great circle. If we shoot another object similarly, 
but in another direction, it goes along another great circle. Because we think we 
are on a plane, we expect that these two bodies will continue to diverge linearly 
with time, but careful observation will show that if they go far enough they move 
closer together again, as though they were attracting each other. But they are not 
attracting each other — there is just something “weird” about this geometry. This 
particular illustration does not describe correctly the way in which Euclid’s 
geometry is “weird,” but it illustrates that if we distort the geometry sufficiently 
it is possible that all gravitation is related in some way to pseudo forces; that is the 
general idea of the Einsteinian theory of gravitation. 

12-6 Nuclear forces 

We conclude this chapter with a brief discussion of the only other known forces, 
which are called nuclear forces. These forces are within the nuclei of atoms, and 
although they are much discussed, no one has ever calculated the force between 
two nuclei, and indeed at present there is no known law for nuclear forces. These 
forces have a very tiny range which is just about the same as the size of the nucleus, 
perhaps 10“ 13 centimeter. With particles so small and at such a tiny distance, 
only the quantum-mechanical laws are valid, not the Newtonian laws. In nuclear 
analysis we no longer think in terms of forces, and in fact we can replace the force 
concept with a concept of the energy of interaction of two particles, a subject that 
will be discussed later. Any formula that can be written for nuclear forces is a 
rather crude approximation which omits many complications; one might be 
somewhat as follows: forces within a nucleus do not vary inversely as the square 
of the distance, but die off exponentially over a certain distance /*, as expressed by 
F = (1 /r 2 ) exp( — /-/Vo), where the distance r 0 is of the order of 10“ 1 3 centimeter. 
In other words, the forces disappear as soon as the particles are any great distance 
apart, although they are very strong within the 10“ 13 centimeter range. So far 
as they are understood today, the laws of nuclear force are very complex; we do 
not understand them in any simple way, and the whole problem of analyzing the 
fundamental machinery behind nuclear forces is unsolved. Attempts at a solution 
have led to the discovery of numerous strange particles, the 7r-mesons, for example, 
but the origin of these forces remains obscure. 


L. 
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13 


Work und Potential Energy C4) 


13-1 Energy of a falling body 

In Chapter 4 we discussed the conservation of energy. In that discussion, we 
did not use Newton’s laws, but it is, of course, of great interest to see how it comes 
about that energy is in fact conserved in accordance with these laws. For clarity 
we shall start with the simplest possible example, and then develop harder and 
harder examples. 

The simplest example of the conservation of energy is a vertically falling object, 
one that moves only in a vertical direction. An object which changes its height 
under the influence of gravity alone has a kinetic energy T (or K.E.) due to its 
motion during the fall, and a potential energy mgh , abbreviated U or P.E., whose 
sum is constant: 

\mv 2 + mgh = const, 

K E PE 

or 

T + U = const. (13.1) 


Now we would like to show that this statement is true. What do we mean, show 
it is true? From Newton’s Second Law we can easily tell how the object moves, 
and it is easy to find out how the velocity varies with time, namely, that it increases 
proportionally with the time, and that the height varies as the square of the time. 
So if we measure the height from a zero point where the object is stationary, 
it is no miracle that the height turns out to be equal to the square of the velocity 
times a number of constants. However, let us look at it a little more closely. 

Let us find out directly from Newton’s Second Law how the kinetic energy 
should change, by taking the derivative of the kinetic energy with respect to time 
and then using Newton’s laws. When we differentiate \mv 2 with respect to time, 
we obtain 


dT 

dt 


dt 


(%mv 2 ) = hm2v 


dv 

It 



(13.2) 


since m is assumed constant. But from Newton’s Second Law, m(dv/dt) = F, 
so that 


dT/dt = Fv. 


(13.3) 


In general, it will come out to be F • v, but in our one-dimensional case let us leave 
it as the force times the velocity. 

Now in our simple example the force is constant, equal to — mg, a vertical 
force (the minus sign means that it acts downward), and the velocity, of course, is 
the rate of change of the vertical position, or height h , with time. Thus the rate 
of change of the kinetic energy is — mg(dh/dt) i which quantity, miracle of miracles, 
is the rate of change of something else! It is the time rate of change of mgh\ 
Therefore, as time goes on, the changes in kinetic energy and in the quantity mgh 
are equal and opposite, so that the sum of the two quantities remains constant. 
Q.E.D. 

We have shown, from Newton’s second law of motion, that energy is con- 
served for constant forces when we add the potential energy mgh to the kinetic 
energy \ mv 2 . Now let us look into this further and see whether it can be generalized, 
and thus advance our understanding. Does it work only for a freely falling body, 
or is it more general? We expect from our discussion of the conservation of energy 
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that it would work for an object moving from one point to another in some kind 
of frictionless curve, under the influence of gravity (Fig. 13-1). If the object 
reaches a certain height h from the original height H , then the same formula should 
again be right, even though the velocity is now in some direction other than the 
vertical. We would like to understand why the law is still correct. Let us follow 
the same analysis, finding the time rate of change of the kinetic energy. This 
will again be mv(dv/dt), but m(dv/dt) is the rate of change of the magnitude of the 
momentum, i.e., the force in the direction of motion — the tangential force F t . Thus 


dT 

It 


dv 

mu dt = FiV - 



Now the speed is the rate of change of distance along the curve, ds/dt 9 and the 
tangential force F t is not mg but is weaker by the ratio of the distance ds along the 
path to the vertical distance dh. In other words, 


so that 


Ft = —mg sin 8 = — mg ^ 



Fig. 13-1. An object moving on a since the ds 9 s cancel. Thus we get — mg(dh/dt ), which is equal to the rate of change 
frictionless curve under the influence of of mgh, as before. 

9 ravit Y- In order to understand exactly how the conservation of energy works in 

general in mechanics, we shall now discuss a number of concepts which will help 
us to analyze it. 

First, we discuss the rate of change of kinetic energy in general in three 
dimensions. The kinetic energy in three dimensions is 


T — im(vx + Vy + v 2 ). 


When we differentiate this with respect to time, we get three terrifying terms: 


dT 

dt 


m v x 


dvx 

dt 


dVy 

)y dt 



(13.4) 


But m{dv x /di) is the force F x acting on the object in the ^-direction. Thus the right 
side of Eq. (13.4) is F x v x + F y v y + F z v z . We recall our vector analysis and recog- 
nize this as F • v; therefore 

dT/dt = F • v. (13.5) 


This result can be derived more quickly as follows: if a and b are two vectors, 
both of which may depend upon the time, the derivative of a * b is, in general, 

d( a • b )/dt = a * db/dt + (da/*) ■ b. (13.6) 

We then use this in the form a = b = v: 

dihmv 2 ) dfkmv ■ v) d\ _ r ds /n 

— - = — — j = m- r *v = F*v = F*-r* (1 3.7) 

dt dt dt dt 

Because the concepts of kinetic energy, and energy in general, are so impor- 
tant, various names have been given to the important terms in equations such as 
these. \mv 2 is, as we know, called kinetic energy . F • v is called power : the force 
acting on an object times the velocity of the object (vector “dot” product) is the 
power being delivered to the object by that force. We thus have a marvelous 
theorem: the rate of change of kinetic energy of an object is equal to the power 
expended by the forces acting on it. 

However, to study the conservation of energy, we want to analyze this still 
more closely. Let us evaluate the change in kinetic energy in a very short time dt. 
If we multiply both sides of Eq. (13.7) by dt , we find that the differential change in 
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the kinetic energy is the force “dot” the differential distance moved: 

dT=F-ds. (13.8) 

If we now integrate, we get 

AT = J 2 ¥-ds. (13.9) 

What does this mean? It means that if an object is moving in any way under the 
influence of a force, moving in some kind of curved path, then the change in K.E. 
when it goes from one point to another along the curve is equal to the integral of 
the component of the force along the curve times the differential displacement ds , 
the integral being carried out from one point to the other. This integral also has a 
name; it is called the work done by the force on the object. We see immediately 
that power equals work done per second. We also see that it is only a component 
of force in the direction of motion that contributes to the work done. In our simple 
example the forces were only vertical, and had only a single component, say F z , 
equal to —mg. No matter how the object moves in those circumstances, falling 
in a parabola for example, F * ds , which can be written as F x dx + F y dy + F z dz , 
has nothing left of it but F z dz — — mg dz , because the other components of force 
are zero. Therefore, in our simple case, 

I ¥ - ds = J — mg dz = — mg(z 2 - z x ), (13.10) 

so again we find that it is only the vertical height from which the object falls that 
counts toward the potential energy. 

A word about units. Since forces are measured in newtons, and we multiply 
by a distance in order to obtain work, work is measured in newton • meters (n • m), 
but people do not like to say newton-meters, they prefer to say joules (j). A 
newton-meter is called a joule; work is measured in joules. Power, then, is joules 
per second, and that is also called a wait (w). If we multiply watts by time, the 
result is the work done. The work done by the electrical company in our houses, 
technically, is equal to the watts times the time. That is where we get things like 
kilowatt hours, 1000 watts times 3600 seconds, or 3.6 X 10 6 joules. 

Now we take another example of the law of conservation of energy. Consider 
an object which initially has kinetic energy and is moving very fast, and which 
slides against the floor with friction. It stops. At the start the kinetic energy is not 
zero, but at the end it is zero; there is work done by the forces, because whenever 
there is friction there is always a component of force in a direction opposite to that 
of the motion, and so energy is steadily lost. But now let us take a mass on the end 
of a pivot swinging in a vertical plane in a gravitational field with no friction. What 
happens here is different, because when the mass is going up the force is downward, 
and when it is coming down, the force is also downward. Thus F • ds has one 
sign going up and another sign coming down. At each corresponding point of the 
downward and upward paths the values of F ■ ds are exactly equal in size but of 
opposite sign, so the net result of the integral will be zero for this case. Thus the 
kinetic energy with which the mass comes back to the bottom is the same as it had 
when it left, that is the principle of the conservation of energy. (Note that when 
there are friction forces the conservation of energy seems at first sight to be invalid. 
We have to find another form of energy. It turns out, in fact, that heat is generated 
in an object when it rubs another with friction, but at the moment we supposedly 
do not know that.) 


13-2 Work done by gravity 

The next problem to be discussed is much more difficult than the above; 
it has to do with the case when the forces are not constant, or simply vertical, as 
they were in the cases we have worked out. We want to consider a planet, for 
example, moving around the sun, or a satellite in the space around the earth. 
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Fig. 13-2. A small mass m falls under 
the influence of gravity toward a large 
mass M. 



Fig. 1 3-3. A closed path in a gravi- 
tational field. 


We shall first consider the motion of an object which starts at some point 1 
and falls, say, directly toward the sun or toward the earth (Fig. 13-2). Will there 
be a law of conservation of energy in these circumstances? The only difference is 
that in this case, the force is changing as we go along, it is not just a constant. As 
we know, the force is GM/r 2 times the mass m, where m is the mass that moves. 
Now certainly when a body falls toward the earth, the kinetic energy increases as 
the distance fallen increases, just as it does when we do not worry about the 
variation of force with height. The question is whether it is possible to find another 
formula for potential energy different from mgh , a different function of distance 
away from the earth, so that conservation of energy will still be true. 

This one-dimensional case is easy to treat because we know that the change 
in the kinetic energy is equal to the integral, from one end of the motion to the other, 
of — GMm/r 2 times the displacement dr: 


C 2 

T 2 - Ti = - / GMm% ■ (13.11) 

There are no cosines needed for this case because the force and the displacement 
are in the same direction. It is easy to integrate dr/r 2 \ the result is — 1 /r, so 
Eq. (13.11) becomes 

T 2 -T x = +GMm (1 - i) • (13.12) 

Thus we have a different formula for potential energy. Equation (13.12) tells us 
that the quantity (^mv 2 — GMm/r) calculated at point 1, at point 2, or at any 
other place, has a constant value. 

We now have the formula for the potential energy in a gravitational field for 
vertical motion. Now we have an interesting problem. Can we make perpetual 
motion in a gravitational field? The gravitational field varies; in different places 
it is in different directions and has different strengths. Could we do something 
like this, using a fixed, frictionless track: start at some point and lift an object out 
to some other point, then move it around an arc to a third point, then lower it a 
certain distance, then move it in at a certain slope and pull it out some other way, 
so that when we bring it back to the starting point, a certain amount of work has 
been done by the gravitational force, and the kinetic energy of the object is in- 
creased? Can we design the curve so that it comes back moving a little bit faster 
than it did before, so that it goes around and around and around, and gives us 
perpetual motion? Since perpetual motion is impossible, we ought to find out that 
this is also impossible. We ought to discover the following proposition: since there 
is no friction the object should come back with neither higher nor lower velocity — 
it should be able to keep going around and around any closed path. Stated in 
another way, the total work done in going around a complete cycle should be zero 
for gravity forces, because if ir is not zero we can get energy out by going around. 
(If the work turns out to be less than zero, so that we get less speed when we go 
around one way, then we merely go around the other way, because the forces, of 
course, depend only upon the position, not upon the direction; if one way is plus, 
the other way would be minus, so unless it is zero we will get perpetual motion 
by going around either way.) 

Is the work really zero ? Let us try to demonstrate that it is. First we shall 
explain more or less why it is zero, and then we shall examine it a little better 
mathematically. Suppose that we use a simple path such as that shown in Fig. 
13-3, in which a small mass is carried from point 1 to point 2, and then is made 
to go around a circle to 3, back to 4, then to 5, 6, 7, and 8, and finally back to 1. 
All of the lines are either purely radial or circular, with Af as the center. How 
much work is done in carrying m around this path? Between points 1 and 2, it is 
GMm times the difference of 1 fr between these two points: 

r 2 r 2 . 

W 12 = / F-ds = Ji - GMm * = -GMm(~- 
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From 2 to 3 the force is exactly at right angles to the curve, so that W 2 z = 0. 
The work from 3 to 4 is 


In the same fashion, we find that W i5 = 0, W 56 = — GM/n(l/r r , — l/r s ), 
W 67 = 0, W 78 = —GMm(l/r 8 - 1 /r 7 ), and W 81 = 0. Thus 


W = GMm 



_I + i_I + i_i+I_ 

r 2 f 3 r 4 r 5 r 6 r 7 



But we note that r 2 — r 3 , r 4 = r 5 , r & = r 7 , and r 8 = r x . Therefore W = 0. 

Of course we may wonder whether this is too trivial a curve. What if we use 
a real curve? Let us try it on a real curve. First of all, we might like to assert that 
a real curve could always be imitated sufficiently well by a series of sawtooth 
jiggles like those of Fig. 13-4, and that therefore, etc., Q.E.D., but without a little 
analysis, it is not obvious at first that the work done going around even a small 
triangle is zero. Let us magnify one of the triangles, as shown in Fig. 13-4. Is the 
work done in going from a to b and b to c on a triangle the same as the work done 
in going directly from a to c? Suppose that the force is acting in a certain direction ; 
let us take the triangle such that the side be is in this direction, just as an example. 
We also suppose that the triangle is so small that the force is essentially constant 
over the entire triangle. What is the work done in going from a to cl It is 


W ac = f F ■ ds = Fs cos 0, 
Ja 



Fig. 13-4. A “smooth" closed path, 
showing a magnified segment of it ap- 
proximated by a series of radial and 
circumferential steps, and an enlarged 
view of one step. 


since the force is constant. Now let us calculate the work done in going around 
the other two sides of the triangle. On the vertical side ab the force is perpendicular 
to ds , so that here the work is zero. On the horizontal side be, 


W bc = £ F • ds = Fx. 

Thus we see that the work done in going along the sides of a small triangle is 
the same as that done going on a slant, because s cos 0 is equal to x. We have 
proved previously that the answer is zero for any path composed of a series of 
notches like those of Fig. 13-3, and also that we do the same work if we cut across 
the corners instead of going along the notches (so long as the notches are fine 
enough, and we can always make them very fine); therefore, the work done in 
going around any path in a gravitational field is zero . 

This is a very remarkable result. It tells us something we did not previously 
know about planetary motion. It tells us that when a planet moves around the 
sun (without any other objects around, no other forces) it moves in such a manner 
that the square of the speed at any point minus some constants divided by the 
radius at that point is always the same at every point on the orbit. For example, 
the closer the planet is to the sun, the faster it is going, but by how much? By 
the following amount: if instead of letting the planet go around the sun, we were 
to change the direction (but not the magnitude) of its velocity and make it move 
radially, and then we let it fall from some special radius to the radius of interest, 
the new speed would be the same as the speed it had in the actual orbit, because 
this is just another example of a complicated path. So long as we come back to the 
same distance, the kinetic energy will be the same. So, whether the motion is the 
real, undisturbed one, or is changed in direction by channels, by frictionless con- 
straints, the kinetic energy with which the planet arrives at a point will be the same. 

Thus, when we make a numerical analysis of the motion of the planet in its 
orbit, as we did earlier, we can check whether or not we are making appreciable 
errors by calculating this constant quantity, the energy, at every step, and it should 
not change. For the orbit of Table 9-2 the energy does change,* it changes by 


* The energy is + Vy) — 1/r in the units of Table 9-2. 
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some 1.5 percent from the beginning to the end. Why? Either because for the 
numerical method we use finite intervals, or else because we made a slight mistake 
somewhere in arithmetic. 

Let us consider the energy in another case: the problem of a mass on a spring. 
When we displace the mass from its balanced position, the restoring force is 
proportional to the displacement. In those circumstances, can we work out a 
law for conservation of energy? Yes, because the work done by such a force is 

W = f X Fdx = r-kxdx = -\kx 2 , (13.13) 

Jo Jo 

Therefore, for a mass on a spring we have that the kinetic energy of the oscillating 
mass plus \kx 2 is a constant. Let us see how this works. We pull the mass down; 
it is standing still and so its speed is zero. But x is not zero, x is at its maximum, 
so there is some energy, the potential energy, of course. Now we release the mass 
and things begin to happen (the details not to be discussed), but at any instant the 
kinetic plus potential energy must be a constant. For example, after the mass is 
on its way past the original equilibrium point, the position * equals zero, but that 
is when it has its biggest v 2 , and as it gets more x 2 it gets less v 2 , and so on. So 
the balance of x 2 and v 2 is maintained as the mass goes up and down. Thus we 
have another rule now, that the potential energy for a spring is \kx 2 , if the force 
is — kx. 


13-3 Summation of energy 

Now we go on to the more general consideration of what happens when there 
are large numbers of objects. Suppose we have the complicated problem of many 
objects, which we label i — 1, 2, 3, . . . , all exerting graviational pulls on each 
other. What happens then? We shall prove that if we add the kinetic energies of 
all the particles, and add to this the sum, over all pairs of particles, of their mutual 
gravitational potential energy, — GMm/r tJ , the total is a constant: 

2 + X) - 9ahHh = const. (i3.i4) 

% (pairs ij) 


How do we prove it? We differentiate each side with respect to time and get zero. 
When we differentiate \m x v 2 „ we find derivatives of the velocity that are the forces, 
just as in Eq. (13.5). We replace these forces by the law of force that we know from 
Newton’s law of gravity and then we notice that what is left is the same as the time 
derivative of 

_ Gm t m 3 

pairs r t3 

The time derivative of the kinetic energy is 


J t = Z) m * v * • ^ 


= Z) Fi 

t 

i \ 3 ] 

The time derivative of the potential energy is 

d yy Grn l m ] __ y^^ / ^ Grn x rn^\Jdr % \ 

^ pairs pairs ^ j > ^ ' 


But 
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= V(*. — x , ) 2 + (>>» - ytf + (z, - z,)\ 


(13.15) 



so that 



since r tJ = —r JU while r XJ = r 3l . Thus 


d 

dt 


E 

pairs 


Gm,m, 


Gm l m,r lJ , Gm } m t r ]t 
3 ' v * + 3 ' 


(13.16) 


Now we must note carefully what £ {X} and £ mean. In Eq. (13.15), 

% j pairs 

XI {X) means that / takes on all values i = 1, 2, 3, ... in turn, and for each value 

1 3 

of /, the index 7 takes on all values except /. Thus ifz = 3, j takes on the values 
1,2,4,... 

In Eq. (13.16), on the other hand, X means that given values of i and j occur 

pairs 

only once. Thus the particle pair 1 and 3 contributes only one term to the sum. 
To keep track of this, we might agree to let / range over all values 1, 2, 3, ... , and 
for each / let j range only over values greater than /. Thus if / = 3, j could only 
have values 4, 5, 6, . . . But we notice that for each /, j value there are two contribu- 
tions to the sum, one involving v t , and the other v J9 and that these terms have the 
same appearance as those of Eq. (13.14), where all values of / and j (except / = j) 
are included in the sum. Therefore, by matching the terms one by one, we see 
that Eqs. (13.16) and (13.15) are precisely the same, but of opposite sign, so that 
the time derivative of the kinetic plus potential energy is indeed zero. Thus we 
see that, for many objects, the kinetic energy is the sum of the contributions from 
each individual object , and that the potential energy is also simple, it being also 
just a sum of contributions, the energies between all the pairs. We can understand 
why it should be the energy of every pair this way: Suppose that we want to find 
the total amount of work that must be done to bring the objects to certain distances 
from each other. We may do this in several steps, bringing them in from infinity 
where there is no force* one by one. First we bring in number one, which requires 
no work, since no other qbjects are yet present to exert force on it. Next we bring 
in number two, which does take some work, namely W X2 = — Gm x m 2 lr x2 . 
Now, and this is an important point, suppose we bring in the next object to position 
three. At any moment the force on number 3 can be written as the sum of two 
forces — the force exerted by number 1 and that exerted by number 2. Therefore 
the work done is the sum of the works done by each , because if F 3 can be resolved 
into the sum of two forces, 

F3 = Fj 3 + F 2 3, 

then the work is 


/F» * ds = /f 13 * ds + J F23 ’ ds — W x % + W 2 s. 

That is, the work done is the sum of the work done against the first force and the 
second force, as if each acted independently. Proceeding in this way, we see that 
the total work required to assemble the given configuration of objects is precisely 
the value given in Eq. (13.14) as the potential energy. It is because gravity obeys 
the principle of superposition of forces that we can write the potential energy as 
a sum over each pair of particles. 
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Fig. 13-5. The gravitational force F 
on a mass point produced by an infinite 
plane sheet of matter. 


13-4 Gravitational field of large objects 

Now we shall calculate the fields which are met in a few physical circumstances 
involving distributions of mass . We have not so far considered distributions of 
mass, only particles, so it is interesting to calculate the forces when they are 
produced by more than just one particle. First we shall find the gravitational force 
on a mass that is produced by a plane sheet of material, infinite in extent. The 
force on a unit mass at a given point P, produced by this sheet of material (Fig. 
13-5), will of course be directed toward the sheet. Let the distance of the point 
from the sheet be a , and let the amount of mass per unit area of this huge sheet be p. 
We shall suppose p to be constant; it is a uniform sheet of material. Now, what 
small field dC is produced by the mass dm lying between p and p + dp from the 
point O of the sheet nearest point PI Answer: dC = G(dmr/r s ). But this field 
is directed along r, and we know that only the ^-component of it will remain when 
we add all the little vector <fC’s to produce C. The ^-component of dC is 

, dmr x dma 

dC x = G — = G — • 

Now all masses dm which are at the same distance r from P will yield the same 
dC x , so we may at once write for dm the total mass in the ring between p and 
p + dp , namely dm = ulwpdp (2wpdp is the area of a ring of radius p and 
width dp, if dp « p). Thus 


d.C x = Gplwp ^ ■ 


Then, since r 2 — p 2 + a 2 , p dp = r dr. Therefore, 

C x = 2ir GfiaJ^ ~ = lirGpa Q - = 2 wGp. (13.17) 

Thus the force is independent of distance a\ Why? Have we made a mistake? 
One might think that the farther away we go, the weaker the force would be. But 
no! If we are close, most of the matter is pulling at an unfavorable angle; if we are 
far away, more of the matter is situated more favorably to exert a pull toward 
the plane. At any distance, the matter which is most effective lies in a certain cone. 
When we are farther away the force is smaller by the inverse square, but in the 
same cone, in the same angle, there is much more matter , larger by just the square 
of the distance! This analysis can be made rigorous by just noticing that the 
differential contribution in any given cone is in fact independent of the distance, 
because of the reciprocal variation of the strength of the force from a given mass, 
and the amount of mass included in the cone, with changing distance. The force 
is not really constant of course, because when we go on the other side of the sheet 
it is reversed in sign. 

We have also, in effect, solved an electrical problem: if we have an electrically 
charged plate, with an amount cr of charge per unit area, then the electric field at 
a point outside the sheet is equal to <r/2e 0 , and is in the outward direction if the 
sheet is positively charged, and inward if the sheet is negatively charged. To prove 
this, we merely note that G, gravity, plays the same role as l/47re 0 for electricity. 

Now suppose that we have two plates, with a positive charge +<r on one 
and a negative charge — a on another at a distance D from the first. What is the 
field? Outside the two plates it is zero. Why? Because one attracts and the other 
repels, the force being independent of distance , so that the two balance out! Also, 
the force between the two plates is clearly twice as great as that from one plate, 
namely E = c r/e 0 , and is directed from the positive plate to the negative one. 

Now we come to a most interesting and important problem, whose solution 
we have been assuming all the time, namely, that the force produced by the earth 
at a point on the surface or outside it is the same as if all the mass of the earth 
were located at its center. The validity of this assumption is not obvious, because 
when we are close, some of the mass is very close to us, and some is farther away, 
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and so on. When we add the effects all together, it seems a miracle that the net 
force is exactly the same as we would get if we put all the mass in the middle! 

We now demonstrate the correctness of this miracle. In order to do so, how- 
ever, we shall consider a thin uniform hollow shell instead of the whole earth. 
Let the total mass of the shell be m, and let us calculate the potential energy of a 
particle of mass m' a distance R away from the sphere (Fig. 13-6) and show that the 
potential energy is the same as it would be if the mass m were a point at the center. 
(The potential energy is easier to work with than is the field because we do not 
have to worry about angles, we merely add the potential energies of all the pieces 
of mass.) If we call x the distance of a certain plane section from the center, then 
all the mass that is in a slice dx is at the same distance r from P 9 and the potential 
energy due to this ring is — Gm ' dm/r . How much mass is in the small slice dxl 
An amount 

, - , liryii dx 2iryu dxa * , 

dm = 2 7tvu ds — — : — — — — — 2 wap dx y 

7 sin 6 y 

where p = m/4wa 2 is the surface density of mass on the spherical shell. (It is a 
general rule that the area of a zone of a sphere is proportional to its axial width.) 
Therefore the potential energy due to dm is 



Fig. 13-6. A thin spherical shell of 
mass or charge. 


But we see that 


dW = - 


Gm' dm 
r 


Gm'2wap dx 
r 


Thus 

or 

Therefore, 

and so 


r 2 — y 2 + (R — x) 2 — y 2 + x 2 + R 2 — 2 Rx 
— a 2 + R 2 — 2Rx. 

2 r dr — —2 R dx 

dx _ dr 

T ~ R m 


dW = - 


Gm'2wap dr 
R 


W = — 


Gm'2wap 

R 

Gm'2wap 

R 

Gm'm 

R 


/ dr 

J R—a 


2a = — 


Gm'(4wa 2 p) 


(13.18) 


Thus, for a thin spherical shell, the potential energy of a mass m f , external to the 
shell, is the same as though the mass of the shell were concentrated at its center. 
The earth can be imagined as a series of spherical shells, each one of which con- 
tributes an energy which depends only on its mass and the distance from the 
center; adding them all together we get the total mass , and therefore the earth acts 
as though all the material were at the center! 

But notice what happens if our point is on the inside of the shell. Making 
the same calculation, but with P on the inside, we still get the difference of the two 
r’s, but now in the form a + R —(a — R) = 2 R, or twice the distance from the 
center. In other words, W comes out to be W = —Gm'm/ a, which is independent 
of R and independent of position, i.e., the same energy no matter where we are 
inside. Therefore no force; no work is done when we move about inside. If the 
potential energy is the same no matter where an object is placed inside the sphere, 
there can be no force on it. So there is no force inside, there is only a force outside, 
and the force outside is the same as though the mass were all at the center. 
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Worh and Potential Energy (conclusion) 


14-1 Work 

In the preceding chapter we have presented a great many new ideas and 
results that play a central role in physics. These ideas are so important that it 
seems worth while to devote a whole chapter to a closer examination of them. 
In the present chapter we shall not repeat the “proofs” or the specific tricks by 
which the results were obtained, but shall concentrate instead upon a discussion 
of the ideas themselves. 

In learning any subject of a technical nature where mathematics plays a role, 
one is confronted with the task of understanding and storing away in the memory 
a huge body of facts and ideas, held together by certain relationships which can 
be “proved” or “shown” to exist between them. It is easy to confuse the proof 
itself with the relationship which it establishes. Clearly, the important thing to 
learn and to remember is the relationship, not the proof. In any particular cir- 
cumstance we can either say “it can be shown that” such and such is true, or we 
can show it. In almost all cases, the particular proof that is used is concocted, 
first of all, in such form that it can be written quickly and easily on the chalkboard 
or on paper, and so that it will be as smooth-looking as possible. Consequently, 
the proof may look deceptively simple, when in fact, the author might have 
worked for hours trying different ways of calculating the same thing until he has 
found the neatest way, so as to be able to show that it can be shown in the shortest 
amount of time! The thing to be remembered, when seeing a proof, is not the proof 
itself, but rather that it can be shown that such and such is true. Of course, if the 
proof involves some mathematical procedures or “tricks” that one has not seen 
before, attention should be given not to the trick exactly, but to the mathematical 
idea involved. 

It is certain that in all the demonstrations that are made in a course such as 
this, not one has been remembered from the time when the author studied fresh- 
man physics. Quite the contrary: he merely remembers that such and such is 
true, and to explain how it can be shown he invents a demonstration at the moment 
it is needed. Anyone who has really learned a subject should be able to follow a 
similar procedure, but it is no use remembering the proofs. That is why, in this 
chapter, we shall avoid the proofs of the various statements made previously, and 
merely summarize the results. 

The first idea that has to be digested is work done by a force. The physical 
word “work” is not the word in the ordinary sense of “Workers of the world 
unite!,” but is a different idea. Physical work is expressed as J F • ds , called “the 
line integral of F dot dsf which means that if the force, for instance, is in one 
direction and the object on which the force is working is displaced in a certain 
direction, then only the component of force in the direction of the displacement 
does any work. If, for instance, the force were constant and the displacement were 
a finite distance As, then the work done in moving the constant force through that 
distance is only the component of force along As times As. The rule is “force times 
distance,” but we really mean only the component of force in the direction of the 
displacement times As or, equivalently, the component of displacement in the 
direction of force times F. It is evident that no work whatsoever is done by a 
force which is at right angles to the displacement. 

Now if the vector displacement As is resolved into components, in other 
words, if the actual displacement is As and we want to consider it effectively as a 
component of displacement Ax in the ^-direction, Ay in the ^-direction, and A z 
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in the z-direction, then the work done in carrying an object from one place to 
another can be calculated in three parts, by calculating the work done along x, 
along y, and along z. The work done in going along x involves only that component 
of force, namely F x , and so on, so the work is F x Ax + F y Ay + F z A z. When 
the force is not constant, and we have a complicated curved motion, then we must 
resolve the path into a lot of little As’s, add the work done in carrying the object 
along each As, and take the limit as As goes to zero. This is the meaning of the 
“line integral.” 

Everything we have just said is contained in the formula W — Jf * ds. It 
is all very well to say that it is a marvelous formula, but it is another thing to 
understand what it means, or what some of the consequences are. 

The word “work” in physics has a meaning so different from that of the \yord 
as it is used in ordinary circumstances that it must be observed carefully that t^ere 
are some peculiar circumstances in which it appears not to be the same. For ex- 
ample, according to the physical definition of work, if one holds a hundred-pound 
weight off the ground for a while, he is doing no work. Nevertheless, everyone 
knows that he begins to sweat, shake, and breathe harder, as if he were running 
up a flight of stairs. Yet running upstairs is considered as doing work (in running 
downstairs , one gets work out of the world, according to physics), but in simply 
holding an object in a fixed position, no work is done. Clearly, the physical defini- 
tion of work differs from the physiological definition, for reasons we shall briefly 
explore. 

It is a fact that when one holds a weight he has to do “physiological” work. 
Why should he sweat? Why should he need to consume food to hold the weight 
up? Why is the machinery inside him operating at full throttle, just to hold the 
weight up? Actually, the weight could be held up with no effort by just placing it 
on a table; then the table, quietly and calmly, without any supply of energy, is 
able to maintain the same weight at the same height! The physiological situation 
is something like the following. There are two kinds of muscles in the human 
body and in other animals: one kind, called striated or skeletal muscle, is the type 
of muscle we have in our arms, for example, which is under voluntary control; 
the other kind, called smooth muscle, is like the muscle in the intestines or, in the 
clam, the greater adductor muscle that closes the shell. The smooth muscles 
work very slowly, but they can hold a “set”; that is to say, if the clam tries to 
close its shell in a certain position, it will hold that position, even if there is a very 
great force trying to change it. It will hold a position under load for hours and 
hours without getting tired because it is very much like a table holding up a weight, 
it “sets” into a certain position, and the molecules just lock there temporarily 
with no work being done, no effort being generated by the clam. The fact that we 
have to generate effort to hold up a weight is simply due to the design of striated 
muscle. What happens is that when a nerve impulse reaches a muscle fiber, the 
fiber gives a little twitch and then relaxes, so that when we hold something up, 
enormous volleys of nerve impulses are coming in to the muscle, large numbers 
of twitches are maintaining the weight, while the other fibers relax. We can see 
this, of course: when we hold a heavy weight and get tired, we begin to shake. 
The reason is that the volleys are coming irregularly, and the muscle is tired and 
not reacting fast enough. Why such an inefficient scheme? We do not know exactly 
why, but evolution has not been able to develop fast smooth muscle. Smooth 
muscle would be much more effective for holding up weights because you could 
just stand there and it would lock in; there would be no work involved and no 
energy would be required. However, it has the disadvantage that it is very slow- 
operating. 

Returning now to physics, we may ask why we want to calculate the work 
done. The answer is that it is interesting and useful to do so, since the work done 
on a particle by the resultant of all the forces acting on it is exactly equal to the 
change in kinetic energy of that particle. That is, if an object is being pushed, it 
picks up speed, and 

A(v 2 ) = — F ■ As • 
m 
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14-2 Constrained motion 

Another interesting feature of forces and work is this : suppose that we have 
a sloping or a curved track, and a particle that must move along the track, but 
without friction. Or we may have a pendulum with a string and a weight; the string 
constrains the weight to move in a circle about the pivot point. The pivot point 
may be changed by having the string hit a peg, so that the path of the weight is 
along two circles of different radii. These are examples of what we call fixed, 
frictionless constraints . 

In motion with a fixed frictionless constraint, no work is done by the constraint 
because the forces of constraint are always at right angles to the motion. By the 
“forces of constraint” we mean those forces which are applied to the object directly 
by the constraint itself — the contact force with the track, or the tension in the string. 

The forces involved in the motion of a particle on a slope moving under the 
influence of gravity are quite complicated, since there is a constraint force, a 
gravitational force, and so on. However, if we base our calculation of the motion 
on conservation of energy and the gravitational force alone, we get the right result. 
This seems rather strange, because it is not strictly the right way to do it — we 
should use the resultant force. Nevertheless, the work done by the gravitational 
force alone will turn out to be the change in the kinetic energy, because the work 
done by the constraint part of the force is zero (Fig. 14-1). 

The important feature here is that if a force can be analyzed as the sum of 
two or more “pieces” then the work done by the resultant force in going along a 
certain curve is the sum of the works done by the various “component” forces 
into which the force is analyzed. Thus if we analyze the force as being the vector 
sum of several effects, gravitational plus constraint forces, etc., or the x-component 
of all forces and the y-component of all forces, or any other way that we wish 
to split it up, then the work done by the net force is equal to the sum of the works 
done by all the parts into which we have divided the force in making the analysis. 




J 1 DIRECTION OF MOTION 
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Fig. 14-1. Forces acting on a sliding 
body (no friction). 


14-3 Conservative forces 


In nature there are certain forces, that of gravity, for example, which have 
a very remarkable property which we call “conservative” (no political ideas 
involved, it is again one of those “crazy words”). If we calculate how much work 
is done by a force in moving an object from one point to another along some 
curved path, in general the work depends upon the curve, but in special cases it 
does not. If it does not depend upoq the curve, we say that the force is a conserva- 
tive force. In other words, if the integral of the force times the distance in going 
from position 1 to position 2 in Fig. 14-2 is calculated along curve A and then 
along B, we get the same number of joules, and if this is true for this pair of points 
on every curve, and if the same proposition works no matter which pair of points 
we use, then, we say the force is conservative. In such circumstances, the work 
integral going from 1 to 2 can be evaluated in a simple manner, and we can give 
a formula for tfye result. Ordinarily it is not this easy, because we also have to 
specify the curve\ but when we have a case where the work does not depend on 
the curve, then, bf course, the work depends only upon the positions of 1 and 2. 

To demonstrate this idea, consider the following. We take a “standard” 
point P, at an arbitrary location (Fig. 14-2). Then, the work line-integral from 
1 to 2, which we want to calculate, can be evaluated as the work done in going 
from 1 to P plus the work done in going from P to 2, because the forces are con- 
servative and the work does not depend upon the curve. Now, the work done in 
going from position P to a particular position in space is a function of that position 
in space. Of course it really depends on P also, but we hold the arbitrary point P 
fixed permanently for the analysis. If that is done, then the work done in going 
from point P to point 2 is some function of the final position of 2. It depends upon 
where 2 is; if we go to some other point we get a different answer. 

We shall call this function of position — U(x, y, z ), and when we wish to refer 
to some particular point 2 whose coordinates are (x 2 , y 2 , 22)1 we shall write U( 2), 
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Fig. 1 4-2. Possible paths between 
two points in a field of force. 


I 



as an abbreviation for U(x 2 , y 2 , ^ 2 )- The work done in going from point 1 to 
point P can be written also by going the other way along the integral, reversing 
all the tfs’s. That is, the work done in going from 1 to P is minus the work done 
in going from the point P to 1 : 

I* F ds = J* F ■ (-*) = - £ F ds. 

Thus the work done in going from P to 1 is —U{ 1), and from P to 2 the work is 
— U( 2). Therefore the integral from 1 to 2 is equal to — U(2) plus [— £/(l) back- 
wards], or +U( 1) - 17(2): 

t/(l) = - j 1 F • ds, U{ 2) = - J* F • <fe, 

j 2 ¥-ds = 1/(1) - 1/(2). (14.1) 

The quantity £/(l) — £/(2) is called the change in the potential energy, and we 
call U the potential energy. We shall say that when the object is located at position 
2, it has potential energy U( 2) and at position 1 it has potential energy t/(l). 
If it is located at position P, it has zero potential energy. If we had used any other 
point, say 0, instead of P, it would turn out (and we shall leave it to you to demon- 
strate) that the potential energy is changed only by the addition of a constant. Since 
the conservation of energy depends only upon changes , it does not matter if we 
add a constant to the potential energy. Thus the point P is arbitrary. 

Now, we have the following two propositions: (1) that the work done by a 
force is equal to the change in kinetic energy of the particle, but (2) mathematically, 
for a conservative force, the work done is minus the change in a function U which 
we call the potential energy. As a consequence of these two, we arrive at the 
proposition that if only conservative forces act , the kinetic energy T plus the potential 
energy U remains constant: 

T + U = constant. (14.2) 

Let us now discuss the formulas for the potential energy for a number of cases. 
If we have a gravitational field that is uniform, if we are not going to heights 
comparable with the radius of the earth, then the force is a constant vertical force 
and the work done is simply the force times the vertical distance. Thus 

U(z) = mgz , (14.3) 

and the point P which corresponds to zero potential energy happens to be any 
point in the plane 2=0. We could also have said that the potential energy is 
mg{z — 6) if we had wanted to — all the results would, of course, be the same in 
our analysis except that the value of the potential energy at 2 = 0 would be 
—mg6. It makes no difference, because only differences in potential energy count. 

The energy needed to compress a linear spring a distance x from an equilibrium 
point is 

U(x) = \kx\ (14.4) 

and the zero of potential energy is at the point x = 0, the equilibrium position of 
the spring. Again we could add any constant we wish. 

The potential energy of gravitation for point masses M and m , a distance r 
apart, is 

U(r) = —GMm/r. (14.5) 

The constant has been chosen here so that the potential is zero at infinity. Of 
course the same formula applies to electrical charges, because it is the same law: 

U(r) = qiq 2 /4TU 0 r. (14.6) 

Now let us actually use one of these formulas, to see whether we understand 
what it means. Question : How fast do we have to shoot a rocket away from the 
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earth in order for it to leave? Solution : The kinetic plus potential energy must be 
a constant; when it “leaves/' it will be millions of miles away, and if it is just 
barely able to leave, we may suppose that it is moving with zero speed out there, 
just barely going. Let a be the radius of the earth, and M its mass. The kinetic 
plus potential energy is then initially given by %mv 2 — GmM/a. At the end of 
the motion the two energies must be equal. The kinetic energy is taken to be 
zero at the end of the motion, because it is supposed to be just barely drifting away 
at essentially zero speed, and the potential energy is GmM divided by infinity, 
which is zero. So everything is zero on one side and that tells us that the square of 
the velocity must be 2 GM/a, But GM/a 2 is what we call the acceleration of 
gravity, g. Thus 

v 2 = 2ga. 


At what speed must a satellite travel in order to keep going around the earth? 
We worked this out long ago and found that v 2 = GM/a . Therefore to go 
away from the earth, we need \/2 times the velocity we need to just go around 
the earth near its surface. We need, in other words, twice as much energy (because 
energy goes as the square of the velocity) to leave the earth as we do to go around it. 
Therefore the first thing that was done historically with satellites was to get one 
to go around the earth, which requires a speed of five miles per second. The next 
thing was to send a satellite away from the earth permanently; this required twice 
the energy, or about seven miles per second. 

Now, continuing our discussion of the characteristics of potential energy, let 
us consider the interaction of two molecules, or two atoms, two oxygen atoms for 
instance. When they are very far apart, the force is one of attraction, which varies 
as the inverse seventh power of the distance, and when they are very close the force 
is a very large repulsion. If we integrate the inverse seventh power to find the work 
done, we find that the potential energy U, which is a function of the radial distance 
between the two oxygen atoms, varies as the inverse sixth power of the distance 
for large distances. 

If we sketch the curve of the potential energy U(r) as in Fig. 14-3, we thus 
start out at large r with an inverse sixth power, but if we come in sufficiently near 
we reach a point d where there is a minimum of potential energy. The minimum of 
potential energy at r — d means this: if we start at d and move a small distance, 
a very small distance, the work done, which is the change in potential energy when 
we move this distance, is nearly zero, because there is very little change in potential 
energy at the bottom of the curve. Thus there is no force at this point, and so it is 
the equilibrium point. Another way to see that it is the equilibrium point is that 
it takes work to move away from d in either direction. When the two oxygen 
atoms have settled down, so that no more energy can be liberated from the force 
between them, they are in the lowest energy state, and they will be at this separation 
d. This is the way an oxygen molecule looks when it is cold. When we heat it up, 
the atoms shake and move farther apart, and we can in fact break them apart, but 
to do so takes a certain amount of work or energy, which is the potential energy 
difference between r — d and r = oo . When we try to push the atoms very close 
together the energy goes up very rapidly, because they repel each other. 

The reason we bring this out is that the idea of force is not particularly suitable 
for quantum mechanics; there the idea of energy is most natural. We find that 
although forces and velocities “dissolve” and disappear when we consider the 
more advanced forces between nuclear matter and between molecules and so on, 
the energy concept remains. Therefore we find curves of potential energy in 
quantum mechanics books, but very rarely do we ever see a curve for the force 
between two molecules, because by that time people who are doing analyses are 
thinking in terms of energy rather than of force. 

Next we note that if several conservative forces are acting on an object at the 
same time, then the potential energy of the object is the sum of the potential 
energies from each of the separate forces. This is the same proposition that we 
mentioned before, because if the force can be represented as a vector sum of forces, 
then the work done by the total force is the sum of the works done by the partial 
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Fig. 14-3. The potential energy be- 
tween two atoms as a function of the 
distance between them. 



forces, and it can therefore be analyzed as changes in the potential energies of each 
of them separately. Thus the total potential energy is the sum of all the little pieces. 

We could generalize this to the case of a system of many objects interacting 
with one another, like Jupiter, Saturn, Uranus, etc., or oxygen, nitrogen, carbon, 
etc., which are acting with respect to one another in pairs due to forces all of which 
are conservative. In these circumstances the kinetic energy in the entire system 
is simply the sum of the kinetic energies of all of the particular atoms or planets or 
whatever, and the potential energy of the system is the sum, over the pairs of 
particles, of the potential energy of mutual interaction of a single pair, as though 
the others were not there, ("phis is really not true for molecular forces, and the 
formula is somewhat more complicated; it certainly is true for Newtonian gravita- 
tion, and it is true as an approximation for molecular forces. For molecular forces 
there is a potential energy, but it is sometimes a more complicated function of the 
positions of the atoms than simply a sum of terms from pairs.) In the special 
case of gravity, therefore, the potential energy is the sum, over all the pairs i and j\ 
of —Gm % m 3 /r l}i as was indicated in Eq. (13.14). Equation (13.14) expressed 
mathematically the following proposition: that the total kinetic energy plus the 
total potential energy does not change with time. As the various planets wheel 
about, and turn and twist and so on, if we calculate the total kinetic energy and 
the total potential energy we find that the total remains constant. 

14-4 Nonconservative forces 

We have spent a considerable time discussing conservative forces; what about 
nonconservative forces ? We shall take a deeper view of this than is usual, and state 
that there are no nonconservative forces! As a matter of fact, all the fundamental 
forces in nature appear to be conservative. This is not a consequence of Newton’s 
laws. In fact, so far as Newton himself knew, the forces could be nonconservative, 
as friction apparently is. When we say friction apparently is, we are taking a 
modern view, in which it has been discovered that all the deep forces, the forces 
between the particles at the most fundamental level, are conservative. 

If, for example, we analyze a system like that great globular star cluster that 
we saw a picture of, with the thousands of stars all interacting, then the formula 
for the total potential energy is simply one term plus another term, etc., summed 
over all pairs of stars, and the kinetic energy is the sum of the kinetic energies of 
all the individual stars. But the globular cluster as a whole is drifting in space too, 
and, if we were far enough away from it and did not see the details, could be thought 
of as a single object. Then if forces were applied to it, some of those forces might 
end up driving it forward as a whole, and we would see the center of the whole 
thing moving. On the other hand, some of the forces can be, so to speak, “wasted” 
in increasing the kinetic or potential energy of the “particles” inside. Let us 
suppose, for instance, that the action of these forces expands the whole cluster 
and makes the particles move faster. The total energy of the whole thing is 
really conserved, but seen from the outside with our crude eyes which cannot see 
the confusion of motions inside, and just thinking of the kinetic energy of the 
motion of the whole object as though it were a single particle, it would appear that 
energy is not conserved, but this is due to a lack of appreciation of what it is that 
we see. And that, it turns out, is the case: the total energy of the world, kinetic 
plus potential, is a constant when we look closely enough. 

When we study matter in the finest detail at the atomic level, it is not always 
easy to separate the total energy of a thing into two parts, kinetic energy and 
potential energy, and such separation is not always necessary. It is almost always 
possible to do it, so let us say that it is always possible, and that the potential- 
plus-kinetic energy of the world is constant. Thus the total potential-plus-kinetic 
energy inside the whole world is constant, and if the “world” is a piece of isolated 
material, the energy is constant if there are no external forces. But as we have 
seen, some of the kinetic and potential energy of a thing may be internal, for 
instance the internal molecular motions, in the sense that we do not notice it. We 
know that in a glass of water everything is jiggling around, all the parts are moving 
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all the time, so there is a certain kinetic energy inside, which we ordinarily may not 
pay any attention to. We do not notice the motion of the atoms, which produces 
heat, and so we do not call it kinetic energy, but heat is primarily kinetic energy. 
Internal potential energy may also be in the form, for instance, of chemical energy: 
when we burn gasoline energy is liberated because the potential energies of the 
atoms in the new atomic arrangement are lower than in the old arrangement. It 
is not strictly possible to treat heat as being pure kinetic energy, for a little of the 
potential gets in, and vice versa for chemical energy, so we put the two together 
and say that the total kinetic and potential energy inside an object is partly heat, 
partly chemical energy, and so on. Anyway, all these different forms of internal 
energy are sometimes considered as “lost” energy in the sense described above; 
this will be made clearer when we study thermodynamics. 

As another example, when friction is present it is not true that kinetic energy 
is lost, even though a sliding object stops and the kinetic energy seems to be lost. 
The kinetic energy is not lost because, of course, the atoms inside are jiggling with 
a greater amount of kinetic energy than before, and although we cannot see that, 
we can measure it by determining the temperature. Of course if we disregard the 
heat energy, then the conservation of energy theorem will appear to be false. 

Another situation in which energy conservation appears to be false is when 
we study only part of a system. Naturally, the conservation of energy theorem 
will appear not to be true if something is interacting with something else on the 
outside and we neglect to take that interaction into account. 

In classical physics potential energy involved only gravitation and electricity, 
but now we have nuclear energy and other energies also. Light, for example, 
would involve a new form of energy in the classical theory, but we can also, if we 
want to, imagine that the energy of light is the kinetic energy of a photon, and then 
our formula (14.2) would still be right. 


14-5 Potentials and fields 

We shall now discuss a few of the ideas associated with potential energy and 
with the idea of a field. Suppose we have two large objects A and B and a third 
very small one which is attracted gravitationally by the two, with some resultant 
force F. We have already noted in Chapter 12 that the gravitational force on a 
particle can be written as its mass, m, times another vector, C, which is dependent 
only upon the position of the particle: 

F - mC. 

We can analyze gravitation, then, by imagining that there is a certain vector C at 
every position in space which “acts” upon a mass which we may place there, but 
which is there itself whether we actually supply a mass for it to “act” on or not. 
C has three components, and each of those components is a function of (x, y, z), 
a function of position in space. Such a thing we call a field , and we say that the 
objects A and B generate the field, i.e., they “make” the vector C. When an object 
is put in a field, the force on it is equal to its mass times the value of the field vector 
at the point where the object is put. 

We can also do the same with the potential energy. Since the potential energy, 
the integral of (force) • (ds) can be written as m times the integral of the (field) • ( c/s ), 
a mere change of scale, we see that the potential energy U(x, y , z) of an object 
located at a point (x, y , z,) in space can be written as m times another function which 
we may call the potential The integral JC • ds = — 'k, just as JF * ds = —U; 
there is only a scale factor between the two: 

U — — J F * c/s — —?nJC'ds = m'V. (14.7) 

By having this function \k(jc, y, z) at every point in space, we can immediately 
calculate the potential energy of an object at any point in space, namely, 
U(x, y , z) = mAr(x, y , z) — rather a trivial business, it seems. But it is not really 
trivial, because it is sometimes much nicer to describe the field by giving the value 
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of \k everywhere in space instead of having to give C. Instead of having to write 
three complicated components of a vector function, we can give instead the scalar 
function Furthermore, it is much easier to calculate 'k than any given component 
of C when the field is produced by a number of masses, for since the potential is a 
scalar we merely add, without worrying about direction. Also, the field C can be 
recovered easily from as we shall shortly see. Suppose we have point masses 
m u m 2 , . . . at the points 1 , 2, . . . and we wish to know the potential Sk at some 
arbitrary point p . This is simply the sum of the potentials at P due to the individual 
masses taken one by one: 


Hp) = £ - 


/ = 1 , 2 ,... 


-*(r) = — Gm/r 


- *(r ) constant =-Gm/a 


Fig. 14-4. Potential due to a spher- 
ical shell of radius a. 


In the last chapter we used this formula, that the potential is the sum of the 
potentials from all the different objects, to calculate the potential due to a spherical 
shell of matter by adding the contributions to the potential at a point from all 
parts of the shell. The result of this calculation is shown graphically in Fig. 14-4. 
It is negative, having the value zero at r — qo and varying as 1/7* down to the 
radius a, and then is constant inside the shell. Outside the shell the potential is 
— Gm/r , where m is the mass of the shell, which is exactly the same as it would 
have been if all the mass were located at the center. But it is not everywhere 
exactly the same, for inside the shell the potential turns out to be — Gm/a , and is a 
constant! When the potential is constant , there is no field, or when the potential 
energy is constant there is no force, because if we move an object from one place 
to another anywhere inside the sphere the work done by the force is exactly zero. 
Why? Because the work done in moving the object from one place to the other 
is equal to minus the change in the potential energy (or, the corresponding field 
integral is the change of the potential). But the potential energy is the same at 
any two points inside, so there is zero change in potential energy, and therefore 
no work is done in going between any two points inside the shell. The only way 
the work can be zero for all directions of displacement is that there is no force at all. 

This gives us a clue as to how we can obtain the force or the field, given the 
potential energy. Let us suppose that the potential energy of an object is known 
at the position ( x , y, z) and we want to know what the force on the object is. It 
will not do to know the potential at only this one point, as we shall see; it requires 
knowledge of the potential at neighboring points as well. Why? How can we 
calculate the ^-component of the force? (If we can do this, of course, we can also 
find the y- and z-components, and we will then know the whole force.) Now, if 
we were to move the object a small distance Ax , the work done by the force on the 
object would be the ^-component of the force times Ax, if Ax is sufficiently small, 
and this should equal the change in potential energy in going from one point to 
the other: 

AW = -A U = F x Ax. (14.9) 

We have merely used the formula jF*c?s = —At/, but for a very short path. 
Now we divide by Ax and so find that the force is 


F x = — AU/Ax. 


(14.10) 


Of course this is not exact. What we really want is the limit of (14.10) as Ax 
gets smaller and smaller, because it is only exactly right in the limit of infinitesimal 
Ax. This we recognize as the derivative of U with respect to x, and we would be 
inclined, therefore, to write — dU/dx . But U depends on x, y, and z, and the 
mathematicians have invented a different symbol to remind us to be very careful 
when we are differentiating such a function, so as to remember that we are con- 
sidering that only x varies , and y and z do not vary. Instead of a d they simply 
make a “backwards 6,” or d. (A d should have been used in the beginning of 
calculus because we always want to cancel that d , but we never want to cancel a d!) 
So they write dU/dx, and furthermore, in moments of duress, if they want to be 
very careful, they put a line beside it with a little yz at the bottom (dU/dx \ yz ). 


which means “Take the derivative of U with respect to x, keeping y and z constant.” 
Most often we leave out the remark about what is kept constant because it is 
usually evident from the context, so we usually do not use the line with the y and 
2 . However, always use a d instead of a d as a warning that it is a derivative with 
some other variables kept constant. This is called a partial derivative ; it is a deriva- 
tive in which we vary only x. 

Therefore, we find that the force in the x>direction is minus the partial deriva- 
tive of U with respect to x : 

F x = -dU/dx. (14.11) 

In a similar way, the force in the ^-direction can be found by differentiating U 
with respect to y , keeping x and z constant, and the third component, of course, 
is the derivative with respect to z , keeping y and x: constant: 

F y = -dU/dy, F z = - dU/dz . (14.12) 

This is the way to get from the potential energy to the force. We get the field from 
the potential in exactly the same way: 

C x = -dV/dx, C y = -d^/dy, C z = -d*/dz. (14.13) 

Incidentally, we shall mention here another notation, which we shall not 
actually use for quite a while: Since C is a vector and has x>, y -, and 2 -components, 
the symbolized d/dx , d/dy , and d/dz which produce the x:-, y-, and 2 -compo- 
nents are something like vectors. The mathematicians have invented a glorious 
new symbol, V, called “grad” or “gradient” which is not a quantity but an operator 
which makes a vector from a scalar. It has the following “components”: The 
xr-component of this “grad” is d/dx , the ^-component is d/dy, and the 2 -component 
is d/dz , and then we have the fun of writing our formulas this way: 

F = -VC/, C = -V*. (14.14) 

Using V gives us a quick way of testing whether we have a real vector equation or 
not, but actually Eq. (14.14) means precisely the same as Eqs. (14.11) and (14.12); 
it is just another way of writing them, and since we do not want to write three 
equations every time, we just write VU instead. 

One more example of fields and potentials has to do with the electrical case. 
In the case of electricity the force on a stationary object is the charge times the 
electric field: F = qE. (In general, of course, the x>component of force in an 
electrical problem has also a part which depends on the magnetic field. It is easy 
to show from Eq. (12.10) that the force on a particle due to magnetic fields is 
always at right angles to its velocity, and also at right angles to the field. Since 
the force due to magnetism on a moving charge is at right angles to the velocity, 
no work is done by the magnetism on the moving charge because the motion is at 
right angles to the force. Therefore, in calculating theorems of kinetic energy in 
electric and magnetic fields we can disregard the contribution from the magnetic 
field, since it does not change the kinetic energy.) We suppose that there is only 
an electric field. Then we can calculate the energy, or work done, in the same way 
as for gravity, and calculate a quantity <j> which is minus the integral of E • ds , 
from the arbitrary fixed point to the point where we make the calculation, and then 
the potential energy in an electric field is just charge times this quantity <p: 

<#>( r) = f Eds ’ 

U = q^>. 

Let us take, as an example, the case of two parallel metal plates, each with a 
surface charge of =L<r per unit area. This is called a parallel-plate capacitor. We 
found previously that there is zero force outside the plates and that there is a 
constant electric field between them, directed from + to — and of magnitude 
<r/e 0 (Fig. 14-5). We would like to know how much work would be done in 
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carrying a charge from one plate to the other. The work would be the (force) * (ds) 
integral, which can be written as charge times the potential value at plate 1 minus 
that at plate 2: 

w = j x ^ • Js = — <£ 2 )- 

We can actually work out the integral because the force is constant, and if we call 
the separation of the plates d , then the integral is easy: 

/*2 /»2 

/ *- vL- 

J 1 *0 J l €o 

The difference in potential, A <j> = <rd/e 0 , is called the voltage difference , and 0 
is measured in volts. When we say a pair of plates is charged to a certain voltage, 
what we mean is that the difference in electrical potential of the two plates is so- 
and-so many volts. For a capacitor made of two parallel plates carrying a surface 
charge ±<r, the voltage, or difference in potential, of the pair of plates is ad/e 0 . 


+ + + + + + + 

I 


Fig. 14-5. Field between parallel 
plates. 
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The Special Theory of Relativity 


15-1 The principle of relativity 

For over 200 years the equations of motion enunciated by Newton were be- 
lieved to describe nature correctly, and the first time that an error in these laws 
was discovered, the way to correct it was also discovered. Both the error and its 
correction were discovered by Einstein in 1905. 

Newton’s Second Law, which we have expressed by the equation 

F = d(mv)/dt, 

was stated with the tacit assumption that m is a constant, but we now know that 
this is not true, and that the mass of a body increases with velocity. In Einstein’s 
corrected formula m has the value 

- m ° 

m Vl - w2/c 2 ’ 05.1) 

where the “rest mass” m 0 represents the mass of a body that is not moving and 
c is the speed of light, which is about 3 X 10 5 km -sec" 1 or about 186,000 
mi * sec -1 . 

For those who want to learn just enough about it so they can solve problems, 
that is all there is to the theory of relativity — it just changes Newton’s laws by 
introducing a correction factor to the mass. From the formula itself it is easy to 
see that this mass increase is very small in ordinary circumstances. If the velocity 
is even as great as that of a satellite, which goes around the earth at 5 mi/sec, 
then vjc = 5/186,000: putting this value into the formula shows that the cor- 
rection to the mass is only one part in two to three billion, which is nearly impossible 
to observe. Actually, the correctness of the formula has been amply confirmed by 
the observation of many kinds of particles, moving at speeds ranging up to practi- 
cally the speed of light. However, because the effect is ordinarily so small, it 
seems remarkable that it was discovered theoretically before it was discovered 
experimentally. Empirically, at a sufficiently high velocity, the effect is very large, 
but it was not discovered that way. Therefore it is interesting to see how a law 
that involved so delicate a modification (at the time when it was first discovered) 
was brought to light by a combination of experiments and physical reasoning. 
Contributions to the discovery were made by a number of people, the final result 
of whose work was Einstein’s discovery. 

There are really two Einstein theories of relativity. This chapter is concerned 
with the Special Theory of Relativity, which dates from 1905. In 1915 Einstein 
published an additional theory, called the General Theory of Relativity. This 
latter theory deals with the extension of the Special Theory to the case of the law 
of gravitation ; we shall not discuss the General Theory here. 

The principle of relativity was first stated by Newton, in one of his corollaries 
to the laws of motion: “The motions of bodies included in a given space are the 
same among themselves, whether that space is at rest or moves uniformly forward 
in a straight line.” This means, for example, that if a space ship is drifting along 
at a uniform speed, all experiments performed in the space ship and all the phenom- 
ena in the space ship will appear the same as if the ship were not moving, pro- 
vided, of course, that one does not look outside. That is the meaning of the princi- 
ple of relativity. This is a simple enough idea, and the only question is whether it 
is true that in all experiments performed inside a moving system the laws of physics 
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will appear the same as they would if the system were standing still. Let us first 
investigate whether Newton’s laws appear the same in the moving system. 

Suppose that Moe is moving in the ^-direction with a uniform velocity u, and 
he measures the position of a certain point, shown in Fig. 15-1. He designates the 
“x-distance” of the point in his coordinate system as x' . Joe is at rest, and measures 
the position of the same point, designating its x-coordinate in his system as x. 
The relationship of the coordinates in the two systems is clear from the diagram. 
After time t Moe’s origin has moved a distance ut , and if the two systems originally 
coincided. 


x f — X — ut 



that these laws transform to the same laws in the primed system; that is, the laws 
Fig. 15-1. Two coordinate systems 0 f Newton are of the same form in a moving system as in a stationary system, and 

in uniform relative motion along their therefore it is impossible to tell, by making mechanical experiments, whether the 

x " axes * system is moving or not. 

The principle of relativity has been used in mechanics for a long time. It was 
employed by various people, in particular Huygens, to obtain the rules for the 
collision of billiard balls, in much the same way as we used it in Chapter 10 to 
discuss the conservation of momentum. In the past century interest in it was 
heightened as the result of investigations into the phenomena of electricity, mag- 
netism, and light. A long series of careful studies of these phenomena by many 
people culminated in Maxwell’s equations of the electromagnetic field, which 
describe electricity, magnetism, and light in one uniform system. However, the 
Maxwell equations did not seem to obey the principle of relativity. That is, if we 
transform Maxwell’s equations by the substitution of equations 15.2, their form 
does not remain the same; therefore, in a moving space ship the electrical and 
optical phenomena should be different from those in a stationary ship. Thus 
one could use these optical phenomena to determine the speed of the ship; in 
particular, one could determine the absolute speed of the ship by making suitable 
optical or electrical measurements. One of the consequences of Maxwell’s equa- 
tions is that if there is a disturbance in the field such that light is generated, these 
electromagnetic waves go out in all directions equally and at the same speed c, or 
186,000 mi/sec. Another consequence of the equations is that if the source 
of the disturbance is moving, the light emitted goes through space at the same 
speed c. This is analogous to the case of sound, the speed of sound waves being 
likewise independent of the motion of the source. 

This independence of the motion of the source, in the case of light, brings up 
an interesting problem: 

Suppose we are riding in a car that is going at a speed w, and light from the 
rear is going past the car with speed c. Differentiating the first equation in (15.2) 
gives 

dx’/dt = dx/dt — u, 

which means that according to the Galilean transformation the apparent speed of 
the passing light, as we measure it in the car, should not be c but should be c — u. 
For instance, if the car is going 100,000 mi/sec, and the light is going 186,000 
mi/sec, then apparently the light going past the car should go 86,000 mi/sec. 
In any case, by measuring the speed of the light going past the car (if the Galilean 
transformation is correct for light), one could determine the speed of the car. A 
number of experiments based on this general idea were performed to determine 
the velocity of the earth, but they all failed — they gave no velocity at all We 
shall discuss one of these experiments in detail, to show exactly what was done 
and what was the matter; something was the matter, of course, something was 
wrong with the equations of physics. What could it be? 
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15-2 The Lorentz transformation 


When the failure of the equations of physics in the above case came to light, 
the first thought that occurred was that the trouble must lie in the new Maxwell 
equations of electrodynamics, which were only 20 years old at the time. It seemed 
almost obvious that these equations must be wrong, so the thing to do was to 
change them in such a way that under the Galilean transformation the principle 
of relativity would be satisfied. When this was tried, the new terms that had to 
be put into the equations led to predictions of new electrical phenomena that did 
not exist at all when tested experimentally, so this attempt had to be abandoned. 
Then it gradually became apparent that Maxwell’s laws of electrodynamics were 
correct, and the trouble must be sought elsewhere. 

In the meantime, H. A. Lorentz noticed a remarkable and curious thing when 
he made the following substitutions in the Maxwell equations: 

, _ X — lit 

\/l — u 2 /c 2 

y f = y, 

z' = z , (15.3) 

t , = t - ux/c 2 s 
Vi — u 2 /c 2 

namely, Maxwell’s equations remain in the same form when this transformation 
is applied to them! Equations (15.3) are known as a Lorentz transformation. 
Einstein, following a suggestion originally made by Poincare, then proposed that 
all the physical laws should be of such a kind that they remain unchanged under a 
Lorentz transformation. In other words, we should change, not the laws of electro- 
dynamics, but the laws of mechanics. How shall we change Newton’s laws 
so that they will remain unchanged by the Lorentz transformation? If this 
goal is set, we then have to rewrite Newton’s equations in such a way that the 
conditions we have imposed are satisfied. As it turned out, the only requirement is 
that the mass m in Newton’s equations must be replaced by the form shown in 
Eq. (15.1). When this change is made, Newton’s laws and the laws of electrody- 
namics will harmonize. Then if we use the Lorentz transformation in comparing 
Moe’s measurements with Joe’s, we shall never be able to detect whether either is 
moving, because the form of all the equations will be the same in both coordinate 
systems! 

It is interesting to discuss what it means that we replace the old transformation 
between the coordinates and time with a new one, because the old one (Galilean) 
seems to be self-evident, and the new one (Lorentz) looks peculiar. We wish to 
know whether it is logically and experimentally possible that the new, and not the 
old, transformation can be correct. To find that out, it is not enough to study the 
laws of mechanics but, as Einstein did, we too must analyze our ideas of space 
and time in order to understand this transformation. We shall have to discuss 
these ideas and their implications for mechanics at some length, so we say in 
advance that the effort will be justified, since the results agree with experiment. 
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15-3 The Michelson-Morley experiment 

As mentioned above, attempts were made to determine the absolute velocity 
of the earth through the hypothetical “ether” that was supposed to pervade all 
space. The most famous of these experiments is one performed by Michelson 
and Morley in 1887. It was 18 years later before the negative results of the experi- 
ment were finally explained, by Einstein. 

The Michelson-Morley experiment was performed with an apparatus like that 
shown schematically in Fig. 15-2. This apparatus is essentially comprised of a 
light source A , a partially silvered glass plate B , and two mirrors C and E , all 
mounted on a rigid base. The mirrors are placed at equal distances L from B. 
The plate B splits an oncoming beam of light, and the two resulting beams con- 


Fig. 15-2. Schematic diagram of the 
Michelson-Morley experiment. 
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tinue in mutually perpendicular directions to the mirrors, where they are reflected 
back to B. On arriving back at B , the two beams are recombined as two superposed 
beams, D and F . If the time taken for the light to go from B to E and back is the 
same as the time from B to C and back, the emerging beams D and F will be in 
phase and will reinforce each other, but if the two times differ slightly, the beams 
will be slightly out of phase and interference will result. If the apparatus is “at 
rest” in the ether, the times should be precisely equal, but if it is moving toward 
the right with a velocity w, there should be a difference in the times. Let us see why. 

First, let us calculate the time required for the light to go from B to E and 
back. Let us say that the time for light to go from plate B to mirror E is / 1 , and 
the time for the return is t 2 . Now, while the light is on its way from B to the mirror, 
the apparatus moves a distance ut i, so the light must traverse a distance L + ut\, 
at the speed c. We can also express this distance as c/i, so we have 

cti = L + ut x, or ti — L/(c — u). 

(This result is also obvious from the point of view that the velocity of light relative 
to the apparatus is c — u, so the time is the length L divided by c — «.) In a like 
manner, the time t 2 can be calculated. During this time the plate B advances a 
distance ut 2 , so the return distance of the light is L — ut 2 . Then we have 

ct 2 == Id — ut 2i or 1 2 = L/(c u). 

Then the total time is 

ti + t 2 — 2Lc/(c 2 — u 2 ). 


For convenience in later comparison of times we write this as 


1 1 + t 2 


2L/c 

1 - U 2 /c 2 * 


(15.4) 


Our second calculation will be of the time t 3 for the light to go from B to the 
mirror C. As before, during time t 3 the mirror C moves to the right a distance ut 3 
to the position C"; in the same time, the light travels a distance ct 3 along the 
hypotenuse of a triangle, which is BC f . For this right triangle we have 

(c/3) 2 = L 2 + (w/3) 2 
or 

L 2 = c 2,2 _ u 2 t 2 = (c 2 _ ^ 

from which we get 


t 3 = L/\Jc 2 — « 2 . 


For the return trip from C f the distance is the same, as can be seen from the 
symmetry of the figure; therefore the return time is also the same, and the total 
time is 2/ 3 . With a little rearrangement of the form we can write 



(15.5) 


We are now able to compare the times taken by the two beams of light. In 
expressions (15.4) and (15.5) the numerators are identical, and represent the time 
that would be taken if the apparatus were at rest. In the denominators, the term 
u 2 /c 2 will be small, unless u is comparable in size to c. The denominators represent 
the modifications in the times caused by the motion of the apparatus. And behold, 
these modifications are not the same — the time to go to C and back is a little less 
than the time to E and back, even though the mirrors are equidistant from B , and 
all we have to do is to measure that difference with precision. 

Here a minor technical point arises — suppose the two lengths L are not 
exactly equal? In fact, we surely cannot make them exactly equal. In that case 
we simply turn the apparatus 90 degrees, so that BC is in the line of motion and 
BE is perpendicular to the motion. Any small difference in length then becomes 
15-4 



unimportant, and what we look for is a shift in the interference fringes when we 
rotate the apparatus. 

In carrying out the experiment, Michelson and Morley oriented the apparatus 
so that the line BE was nearly parallel to the earth’s motion in its orbit (at certain 
times of the day and night). This orbital speed is about 18 miles per second, and 
any “ether drift” should be at least that much at some time of the day or night and 
at some time during the year. The apparatus was amply sensitive to observe such 
an effect, but no time difference was found — the velocity of the earth through the 
ether could not be detected. The result of the experiment was null. 

The result of the Michelson-Morley experiment was very puzzling and most 
disturbing. The first fruitful idea for finding a way out of the impasse came from 
Lorentz. He suggested that material bodies contract when they are moving, and 
that this foreshortening is only in the direction of the motion, and also, that if 
the length is L 0 when a body is at rest, then when it moves with speed u parallel 
to its length, the new length, which we call L\\ (L-parallel), is given by 

L\\ — Lq Vi - u 2 /c 2 . (15.6) 


When this modification is applied to the Michelson-Morley interferometer appara- 
tus the distance from B to C does not change, but the distance from B to E is 
shortened to L\/l — u 2 /c 2 . Therefore Eq. (15.5) is not changed, but the L of 
Eq. (15.4) must be changed in accordance with Eq. (15.6). When this is done we 
obtain 


t\ + t 2 = 


(2 L/c) Vl - u 2 /c 2 
1 - u 2 /c 2 


2 L/c 

Vl - u 2 j c 2 


(15.7) 


Comparing this result with Eq. (15.5), we see that t 1 + / 2 = 2/ 3 . So if the ap- 
paratus shrinks in the manner just described, we have a way of understanding why 
the Michelson-Morley experiment gives no effect at all. Although the contraction 
hypothesis successfully accounted for the negative result of the experiment, it was 
open to the objection that it was invented for the express purpose of explaining 
away the difficulty, and was too artificial. However, in many other experiments 
to discover an ether wind, similar difficulties arose, until it appeared that nature 
was in a “conspiracy” to thwart man by introducing some new phenomenon to 
undo every phenomenon that he thought would permit a measurement of u. 

It was ultimately recognized, as Poincare pointed out, that a complete conspiracy 
is itself a law of nature! Poincare then proposed that there is such a law of nature, 
that it is not possible to discover an ether wind by any experiment; that is, there 
is no way to determine an absolute velocity. 


15-4 Transformation of time 

In checking out whether the contraction idea is in harmony with the facts in 
other experiments, it turns out that everything is correct provided that the times 
are also modified, in the manner expressed in the fourth equation of the set (15.3). 
That is because the time / 3 , calculated for the trip from B to C and back, is not the 
same when calculated by a man performing the experiment in a moving space 
ship as when calculated by a stationary observer who is watching the space ship. 
To the man in the ship the time is simply 2 L/c, but to the other observer it is 
(2L/c)/y/\ — u 2 /c 2 (Eq. 15.5). In other words, when the outsider sees the man 
in the space ship lighting a cigar, all the actions appear to be slower than normal, 
while to the man inside, everything moves at a normal rate. So not only must the 
lengths shorten, but also the time-measuring instruments (“clocks”) must appar- 
ently slow down. That is, when the clock in the space ship records 1 second 
elapsed, as seen by the man in the ship, it shows 1/Vl — u 2 /c 2 second to the 
man outside. 

This slowing of the clocks in a moving system is a very peculiar phenomenon, 
and is worth an explanation. In order to understand this, we have to watch the 
machinery of the clock and see what happens when it is moving. Since that is 
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Fig. 15-3. (a) A “light clock" at rest 

in the S' system, (b) The same clock, 
moving through the S system, (c) Illustra- 
tion of the diagonal path taken by the 
light beam in a moving "light clock.” 


rather difficult, we shall take a very simple kind of clock. The one we choose is 
rather a silly kind of clock, but it will work in principle: it is a rod (meter stick) 
with a mirror at each end, and when we start a light signal between the mirrors, 
the light keeps going up and down, making a click every time it comes down, like 
a standard ticking clock. We build two such clocks, with exactly the same lengths, 
and synchronize them by starting them together ; then they agree always thereafter, 
because they are the same in length, and light always travels with speed c . We 
give one of these clocks to the man to take along in his space ship, and he mounts 
the rod perpendicular to the direction of motion of the ship; then the length of 
the rod will not change. How do we know that perpendicular lengths do not 
change? The men can agree to make marks on each other’s y-meter stick as they 
pass each other. By symmetry, the two marks must come at the same y- and 
^'-coordinates, since otherwise, when they get together to compare results, one 
mark will be above or below the other, and so we could tell who was really moving. 

Now let us see what happens to the moving clock. Before the man took it 
aboard, he agreed that it was a nice, standard clock, and when he goes along in 
the space ship he will not see anything peculiar. If he did, he would know he was 
moving — if anything at all changed because of the motion, he could tell he was 
moving. But the principle of relativity says this is impossible in a uniformly moving 
system, so nothing has changed. On the other hand, when the external observer 
looks at the clock going by, he sees that the light, in going from mirror to mirror, 
is “really” taking a zigzag path, since the rod is moving sidewise all the while. 
We have already analyzed such a zigzag motion in connection with the Michelson- 
Morley experiment. If in a given time the rod moves forward a distance propor- 
tional to u in Fig. 15-3, the distance the light travels in the same time is propor- 
tional to c , and the vertical distance is therefore proportional to %/ c 2 — u 2 . 

That is, it takes a longer time for light to go from end to end in the moving 
clock than in the stationary clock. Therefore the apparent time between clicks is 
longer for the moving clock, in the same proportion as shown in the hypotenuse 
of the triangle (that is the source of the square root expressions in our equations). 
From the figure it is also apparent that the greater u is, the more slowly the moving 
clock appears to run. Not only does this particular kind of clock run more slowly, 
but if the theory of relativity is correct, any other clock, operating on any principle 
whatsoever, would also appear to run slower, and in the same proportion — we 
can say this without further analysis. Why is this so? 

To answer the above question, suppose we had two other clocks made exactly 
alike with wheels and gears, or perhaps based on radioactive decay, or something 
else. Then we adjust these clocks so they both run in precise synchronism with 
our first clocks. When light goes up and back in the first clocks and announces 
its arrival with a click, the new models also complete some sort of cycle, which 
they simultaneously announce by some doubly coincident flash, or bong, or other 
signal. One of these clocks is taken into the space ship, along with the first kind. 
Perhaps this clock will not run slower, but will continue to keep the same time as 
its stationary counterpart, and thus disagree with the other moving clock. Ah no, 
if that should happen, the man in the ship could use this mismatch between his 
two clocks to determine the speed of his ship, which we have been supposing 
is impossible. We need not know anything about the machinery of the new clock 
that might cause the effect — we simply know that whatever the reason, it will 
appear to run slow, just like the first one. 

Now if all moving clocks run slower, if no way of measuring time gives any- 
thing but a slower rate, we shall just have to say, in a certain sense, that time 
itself appears to be slower in a space ship. All the phenomena there — the man’s 
pulse rate, his thought processes, the time he takes to light a cigar, how long it 
takes to grow up and get old — all these things must be slowed down in the same 
proportion, because he cannot tell he is moving. The biologists and medical men 
sometimes say it is not quite certain that the time it takes for a cancer to develop 
will be longer in a space ship, but from the viewpoint of a modern physicist it is 
nearly certain ; otherwise one could use the rate of cancer development to determine 
the speed of the ship! 
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T A very interesting example of the slowing of time with motion is furnished 
by mu-mesons (muons), which are particles that disintegrate spontaneously 
after an average lifetime of 2.2 X 10~° sec. They come to the earth in cosmic 
rays, and can also be produced artificially in the laboratory. Some of them 
disintegrate in midair, but the remainder disintegrate only after they encounter 
a piece of material and stop. It is clear that in its short lifetime a muon cannot 
travel, even at the speed of light, much more than 600 meters. But although the 
muons are created at the top of the atmosphere, some 10 kilometers up, yet they are 
! actually found in a laboratory down here, in cosmic rays. How can that be? 

: The answer is that different muons move at various speeds, some of which are 

very close to the speed of light. While from their own point of view they live only 
j' about 2 jusec, from our point of view they live considerably longer — enough longer 
that they may reach the earth. The factor by which the time is increased has already 
; been given as 1 /V 1 — u 2 /c 2 . The average life has been measured quite accurately 

for muons of different velocities, and the values agree closely with the formula. 

We do not know why the meson disintegrates or what its machinery is, but 
we do know its behavior satisfies the principle of relativity. That is the utility of 
the principle of relativity — -it permits us to make predictions, even about things 
that otherwise we do not know much about. For example, before we have any 
idea at all about what makes the meson disintegrate, we can still predict that when 
it is moving at nine-tenths of th e-sqeed oflighU the apparent length of time that 
it lasts is ( 22 JX 10~ G )/\/l — 9 2 /10 2 sec; a nd our prediction works — that is 
the gobd~tHing about it. 

15-5 The Lorentz contraction 

Now let us return to the Lorentz transformation (15.3) and try to get a better 
understanding of the relationship between the (x, y , z, t) and the (x', y', z ', t') 
coordinate systems, which we shall call the S and S' systems, or Joe and Moe 
systems, respectively. We have already noted that the first equation is based on 
the Lorentz suggestion of contraction along the ^-direction; how can we prove that 
a contraction takes place? In the Michelson-Morley experiment, we now appre- 
ciate that the transverse arm BC cannot change length, by the principle of relativity; 

1 yet the null result of the experiment demands that the times must be equal. So, in 

order for the experiment to give a null result, the longitudinal arm BE must 
appear shorter, by the square root \/l — u 2 /c 2 . What does this contraction mean, 
in terms of measurements made by Joe and Moe? Suppose that Moe, moving 
with the S' system in the jc-direction, is measuring the ^-coordinate of some point 
with a meter stick. He lays the stick down x' times, so he thinks the distance is 
x' meters. From the viewpoint of Joe in the S system, however, Moe is using a 
foreshortened ruler, so the “real” distance measured is x'y/l — u 2 /c % meters. 
Then if the S' system has travelled a distance ut away from the S system, the S 
observer would say that the same point, measured in his coordinates, is at a 
distance x = x'\/l — u 2 /c 2 + ut, or 

, x — ut 
x = r = > 

VT — u 2 /c 2 

which is the first equation of the Lorentz transformation. 


15-6 Simultaneity 

In an analogous way, because of the difference in time scales, the denominator 
expression is introduced into the fourth equation of the Lorentz transformation. 
The most interesting term in that equation is the ux/c 2 in the numerator, because 
that is quite new and unexpected. Now what does that mean? If we look at the 
situation carefully we see that events that occur at two separated places at the same 
time, as seen by Moe in S', do not happen at the same time as viewed by Joe in S. 
If one event occurs at point Xi at time / 0 and the other event at x 2 and t 0 (the same 
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time), we find that the two corresponding times t[ and differ by an amount 


*2 - - 


u(x 1 — x 2 )/c 2 
Vl — u 2 /c 2 


This circumstance is called “failure of simultaneity at a distance,” and to make 
the idea a little clearer let us consider the following experiment. 

Suppose that a man moving in a space ship (system S') has placed a clock at 
each end of the ship and is interested in making sure that the two clocks are in 
synchronism. How can the clocks be synchronized? There are many ways. 
One way, involving very little calculation, would be first to locate exactly the 
midpoint between the clocks. Then from this station we send out a light signal 
which will go both ways at the same speed and will arrive at both clocks, clearly, 
at the same time. This simultaneous arrival of the signals can be used to syn- 
chronize the clocks. Let us then suppose that the man in S' synchronizes his clocks 
by this particular method. Let us see whether an observer in system S' would agree 
that the two clocks are synchronous. The man in S' has a right to believe they are, 
because he does not know that he is moving. But the man in S reasons that since 
the ship is moving forward, the clock in the front end was running away from the 
light signal, hence the light had to go more than halfway in order to catch up ; the 
rear clock, however, was advancing to meet the light signal, so this distance was 
shorter. Therefore the signal reached the rear clock first, although the man in S' 
thought that the signals arrived simultaneously. We thus see that when a man in a 
space ship thinks the times at two locations are simultaneous, equal values of t ' 
in his coordinate system must correspond to different values of t in the other 
coordinate system! 



15-7 Four-vectors 

Let us see what else we can discover in the Loren tz transformation. It is 
interesting to note that the transformation between the x’s and *’s is analogous in 
form to the transformation of the x's and / s that we studied in Chapter 11 for a 
rotation of coordinates. We then had 

x' — x cos 9 + y sin 9 , 

(15.8) 

y' = y cos 9 — x sin 9 , 

in which the new x' mixes the old x and y, and the new y also mixes the old x 
and y; similarly, in the Lorentz transformation we find a new x' which is a mixture 
of x and t , and a new t' which is a mixture of t and x. So the Lorentz transforma- 
tion is analogous to a rotation, only it is a “rotation” in space and time , which 
appears to be a strange concept. A check of the analogy to rotation can be made 
by calculating the quantity 


+ y 2 + z* 


c 2 t' 2 = x 2 + y 2 + z 2 


In this equation the first three terms on each side represent, in three-dimensional 
geometry, the square of the distance between a point and the origin (surface of a 
sphere) which remains unchanged (invariant) regardless of rotation of the co- 
ordinate axes. Similarly, Eq. (15.9) shows that there is a certain combination 
which includes time, that is invariant to a Lorentz transformation. Thus, the 
analogy to a rotation is complete, and is of such a kind that vectors, i.e., quantities 
involving “components” which transform the same way as the coordinates and 
time, are also useful in connection with relativity. 

Thus we contemplate an extension of the idea of vectors, which we have so 
far considered to have only space components, to include a time component. 
That is, we expect that there will be vectors with four components, three of which 
are like the components of an ordinary vector, and with these will be associated 
a fourth component, which is the analog of the time part. 



T This concept will be analyzed further in the next chapters, where we shall 
find that if the ideas of the preceding paragraph are applied to momentum, the 
transformation gives three space parts that are like ordinary momentum com- 
ponents, and a fourth component, the time part, which is the energy. 


15-8 Relativistic dynamics 

We are now ready to investigate, more generally, what form the laws of 
mechanics take under the Lorentz transformation. [We have thus far explained 
how length and time change, but not how we get the modified formula for m 
(Eq. 15.1). We shall do this in the next chapter.] To see the consequences of 
Einstein’s modification of m for Newtonian mechanics, we start with the Newtonian 
law that force is the rate of change of momentum, or 

F = d(my)/dt. 


Momentum is still given by mv , but when we use the new m this becomes 


p — ms 


m 0 y 

Vi -V/c 2 


(15.10) 


This is Einstein’s modification of Newton’s laws. Under this modification, if 
action and reaction are still equal (which they may not be in detail, but are in the 
long run), there will be conservation of momentum in the same way as before, 
but the quantity that is being conserved is not the old my with its constant mass, 
but instead is the quantity shown in (15.10), which has the modified mass. When 
this change is made in the formula for momentum, conservation of momentum 
still works. 

Now let us see how momentum varies with speed. In Newtonian mechanics 
it is proportional to the speed and, according to (15.10), over a considerable range 
of speed, but small compared with c, it is nearly the same in relativistic mechanics, 
because the square-root expression differs only slightly from 1. But when v is 
almost equal to c, the square-root expression approaches zero, and the momentum 
therefore goes toward infinity. 

What happens if a constant force acts on a body for a long time? In Newtonian 
mechanics the body keeps picking up speed until it goes faster than light. But this 
is impossible in relativistic mechanics. In relativity, the body keeps picking up, 
not speed, but momentum, which can continually increase because the mass is 
increasing. After a while there is practically no acceleration in the sense of a change 
of velocity, but the momentum continues to increase. Of course, whenever a force 
produces very little change in the velocity of a body, we say that the body has a 
great deal of inertia, and that is exactly what our formula for relativistic mass 
says (see Eq. 15.10) — it says that the inertia is very great when v is nearly as great 
as c. As an example of this effect, to deflect the high-speed electrons in the syn- 
chrotron that is used here at Caltech, we need a magnetic field that is 2000 times 
stronger than would be expected on the basis of Newton’s laws. In other words, 
the mass of the electrons in the synchrotron is 2000 times as great as their normal 
mass, and is as great as that of a proton! That m should be 2000 times m Q means 
that 1 — v 2 /c 2 must be 1/4,000,000, and that means that v 2 /c 2 differs from 1 
by one part in 4,000,000, or that v differs from c by one part in 8,000,000, so the 
electrons are getting pretty close to the speed of light. If the electrons and light 
were both to start from the synchrotron (estimated as 700 feet away) and rush 
out to Bridge Lab, which would arrive first? The light, of course, because light 
always travels faster.* How much earlier? That is too hard to tell — instead, we 
tell by what distance the light is ahead : it is about 1/1000 of an inch, or J the thick- 
ness of a piece of paper! When the electrons are going that fast their masses are 
enormous, but their speed cannot exceed the speed of light. 


* The electrons would actually win the race versus visible light because of the index of 
refraction of air. A gamma ray would make out better. 
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Now let us look at some further consequences of relativistic change of mass. 
Consider the motion of the molecules in a small tank of gas. When the gas is 
heated, the speed of the molecules is increased, and therefore the mass is also 
increased and the gas is heavier. An approximate formula to express the increase 
of mass, for the case when the velocity is small, can be found by expanding 
wioA/1 “ v 2 /c 2 = w 0 ( 1 — v 2 /c 2 )~ 112 in a power series, using the binomial 
theorem. We get 

m 0 ( 1 — v 2 /c 2 )~ 112 = m Q ( 1 + iv 2 /c 2 + f v 4 /c* + ••*)• 

We see clearly from the formula that the series converges rapidly when v is small, 
and the terms after the first two or three are negligible. So we can write 


m ^ m 0 + h 




(15.11) 


in which the second term on the right expresses the increase of mass due to mo- 
lecular velocity. When the temperature increases the v 2 increases proportionately, 
so we can say that the increase in mass is proportional to the increase in tempera- 
ture. But since \m Q v 2 is the kinetic energy in the old-fashioned Newtonian sense, 
we can also say that the increase in mass of all this body of gas is equal to the 
increase in kinetic energy divided by c 2 , or Am = A(K.E .)/c 2 . 


15-9 Equivalence of mass ahd energy 

The above observation led Einstein to the suggestion that the mass of a body 
can be expressed more simply than by the formula (15.1), if we say that the mass 
is equal to the total energy content divided by c 2 . If Eq. (15.1 1) is multiplied by 


the result is 


me 2 = m 0 c 2 + %m 0 v 2 + 


(15.12) 


Here, the term on the left expresses the total energy of a body, and we recognize 
the last term as the ordinary kinetic energy. Einstein interpreted the large constant 
term, m 0 c 2 , to be part of the total energy of the body, an intrinsic energy known 
as the “rest energy.” 

Let us follow out the consequences of assuming, with Einstein, that the 
energy of a body always equals me 2 . As an interesting result, we shall find the 
formula (15.1) for the variation of mass with speed, which we have merely assumed 
up to now. We start with the body at rest, when its energy is m 0 c 2 . Then we 
apply a force to the body, which starts it moving and gives it kinetic energy; 
therefore, since the energy has increased, the mass has increased — this is implicit 
in the original assumption. So long as the force continues, the energy and the mass 
both continue to increase. We have already seen (Chapter 13) that the rate of change 
of energy with time equals the force times the velocity, or 


-= F * v. 


(15.13) 


We also have (Chapter 9, Eq. 9.1) that F = d(mv)/dt . When these relations are 
put together with the definition of E, Eq. (15.13) becomes 


d(mc 2 ) _ d(m\) 

dt V dt 


(15.14) 


We wish to solve this equation for m. To do this we first use the mathematical 
trick of multiplying both sides by 2m , which changes the equation to 


. dm . d(mv) 

c(2m )i = 2w V 


(15.15) 


We need to get rid of the derivatives, which can be accomplished by integrating 



both sides. The quantity (2m) dm/dt can be recognized as the time derivative of 
m 2 , and (2mv) * d(m\)/dt is the time derivative of ( mv ) 2 . So, Eq. (15.15) is the 


same as 

2 d(m 2 ) _ d(m 2 v 2 ) 

c ~ar ~ dt 


(15.16) 


If the derivatives of two quantities are equal, the quantities themselves differ at 
most by a constant, say C. This permits us to write 

m 2 c 2 = m 2 v 2 + c (15.17) 


We need to define the constant C more explicitly. Since Eq. (15.17) must be true 
for all velocities, we can choose a special case where v = 0, and say that in this 
case the mass is m 0 . Substituting these values into Eq. (15.17) gives 


m 2 c 2 — 0 + C. 

We can now use this value of C in Eq. (15.17), which becomes 



This is the formula (15.1), and is exactly what is necessary for the agreement be- 
tween mass and energy in Eq. (15.12). 

Ordinarily these energy changes represent extremely slight changes in mass, 
because most of the time we cannot generate much energy from a given amount 
of material ; but in an atomic bomb of explosive energy equivalent to 20 kilotons 
of TNT, for example, it can be shown that the dirt after the explosion is lighter by 
1 gram than the initial mass of the reacting material, because of the energy that was 
released, i.e., the released energy had a mass of 1 gram, according to the relationship 
A E = A (me 2 ). This theory of equivalence of mass and energy has been beautifully 
verified by experiments in which matter is annihilated — converted totally to energy: 
An electron and a positron come together at rest, each with a rest mass m 0 . When 
they come together they disintegrate and two gamma rays emerge, each with the 
measured energy of m 0 c 2 . This experiment furnishes a direct determination of 
the energy associated with the existence of the rest mass of a particle. 
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16 


Relativistic Energy and Momentum 


16-1 Relativity and the philosophers 

In this chapter we shall continue to discuss the principle of relativity of 
Einstein and Poincare, as it affects our ideas of physics and other branches of 
human thought. 

Poincare made the following statement of the principle of relativity: “Accord- 
ing to the principle of relativity, the laws of physical phenomena must be the same 
for a fixed observer as for an observer who has a uniform motion of translation 
relative to him, so that we have not, nor can we possibly have, any means of 
discerning whether or not we are carried along in such a motion.” 

When this idea descended upon the world, it caused a great stir among philos- 
ophers, particularly the “cocktail-party philosophers,” who say, “Oh, it is very 
simple: Einstein’s theory says all is relative!” In fact, a surprisingly large number 
of philosophers, not only those found at cocktail parties (but rather than embarrass 
them, we shall just call them “cocktail-party philosophers”), will say, “That all 
is relative is a consequence of Einstein, and it has profound influences on our 
ideas.” In addition, they say “It has been demonstrated in physics that phenomena 
depend upon your frame of reference.” We hear that a great deal, but it is difficult 
to find out what it means. Probably the frames of reference that were originally 
referred to were the coordinate systems which we use in the analysis of the theory 
of relativity. So the fact that “things depend upon your frame of reference” is 
supposed to have had a profound effect on modern thought. One might well 
wonder why, because, after all, that things depend upon one’s point of view is so 
simple an idea that it certainly cannot have been necessary to go to all the trouble 
of the physical relativity theory in order to discover it. That what one sees depends 
upon his frame of reference is certainly known to anybody who walks around, 
because he sees an approaching pedestrian first from the front and then from the 
back; there is nothing deeper in most of the philosophy which is said to have come 
from the theory of relativity than the remark that “A person looks different from 
the front than from the back.” The old story about the elephant that several blind 
men describe in different ways is another example, perhaps, of the theory of rela- 
tivity from the philosopher’s point of view. 

But certainly there must be deeper things in the theory of relativity than just 
this simple remark that “A person looks different from the front than from the 
back.” Of course relativity is deeper than this, because we can make definite 
predictions with it. It certainly would be rather remarkable if we could predict the 
behavior of nature from such a simple observation alone. 

There is another school of philosophers who feel very uncomfortable about 
the theory of relativity, which asserts that we cannot determine our absolute 
velocity without looking at something outside, and who would say, “It is obvious 
that one cannot measure his velocity without looking outside. It is self-evident that 
it is meaningless to talk about the velocity of a thing without looking outside; the 
physicists are rather stupid for having thought otherwise, but it has just dawned 
on them that this is the case. If only we philosophers had realized what the prob- 
lems were that the physicists had, we could have decided immediately by brain- 
work that it is impossible to tell how fast one is moving without looking outside, 
and we could have made an enormous contribution to physics.” These philosophers 
are always with us, struggling in the periphery to try to tell us something, but they 
never really understand the subtleties and depths of the problem. 


16-1 Relativity and the philosophers 
16-2 The twin paradox 
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Our inability to detect absolute motion is a result of experiment and not a 
result of plain thought, as we can easily illustrate. In the first place, Newton 
believed that it was true that one could not tell how fast he is going if he is moving 
with uniform velocity in a straight line. In fact, Newton first stated the principle 
of relativity, and one quotation made in the last chapter was a statement of New- 
ton’s, Why then did the philosophers not make all this fuss about “all is relative,” 
or whatever, in Newton’s time? Because it was not until Maxwell’s theory of 
electrodynamics was developed that there were physical laws that suggested that 
one could measure his velocity without looking outside ; soon it was found expert - 
mentally that one could not . 

Now, is it absolutely, definitely, philosophically necessary that one should 
not be able to tell how fast he is moving without looking outside? One of the 
consequences of relativity was the development of a philosophy which said, 
“You can only define what you can measure! Since it is self-evident that one can- 
not measure a velocity without seeing what he is measuring it relative to, therefore 
it is clear that there is no meaning to absolute velocity. The physicists should have 
realized that they can talk only about what they can measure.” But that is the whole 
problem: whether or not one can define absolute velocity is the same as the problem 
of whether or not one can detect in an experiment , without looking outside, whether 
he is moving. In other words, whether or not a thing is measurable is not something 
to be decided a priori by thought alone, but something that can be decided only 
by experiment. Given the fact that the velocity of light is 186,000 mi/sec, one will 
find few philosophers who will calmly state that it is self-evident that if light goes 
186,000 mi/sec inside a car, and the car is going 100,000 mi/sec, that the light 
also goes 186,000 mi/sec past an observer on the ground. That is a shocking 
fact to them; the very ones who claim it is obvious find, when you give them a 
specific fact, that it is not obvious. 

Finally, there is even a philosophy which says that one cannot detect any 
motion except by looking outside. It is simply not true in physics. True, one can- 
not perceive a uniform motion in a straight line , but if the whole room were ro- 
tating we would certainly know it, for everybody would be thrown to the wall — 
there would be all kinds of “centrifugal” effects. That the earth is turning on its 
axis can be determined without looking at the stars, by means of the so-called 
Foucault pendulum, for example. Therefore it is not true that “all is relative”; 
it is only uniform velocity that cannot be detected without looking outside. Uniform 
rotation about a fixed axis can be. When this is told to a philosopher, he is very 
upset that he did not really understand it, because to him it seems impossible that 
one should be able to determine rotation about an axis without looking outside. 
If the philosopher is good enough, after some time he may come back and say, 
“I understand. We really do not have such a thing as absolute rotation; we are 
really rotating relative to the stars , you see. And so some influence exerted by the 
stars on the object must cause the centrifugal force.” 

Now, for all we know, that is true; we have no way, at the present time, of 
telling whether there would have been centrifugal force if there were no stars and 
nebulae around. We have not been able to do the experiment of removing all the 
nebulae and then measuring our rotation, so we simply do not know. We must 
admit that the philosopher may be right. He comes back, therefore, in delight and 
says, “It is absolutely necessary that the world ultimately turn out to be this way: 
absolute rotation means nothing; it is only relative to the nebulae.” Then we say 
to him, “Wow, my friend, is it or is it not obvious that uniform velocity in a straight 
line, relative to the nebulae should produce no effects inside a car?” Now that the 
motion is no longer absolute, but is a motion relative to the nebulae , it becomes a 
mysterious question, and a question that can be answered only by experiment. 

What, then, are the philosophic influences of the theory of relativity? If we 
limit ourselves to influences in the sense of what kind of new ideas and suggestions 
are made to the physicist by the principle of relativity, we could describe some of 
them as follows. The first discovery is, essentially, that even those ideas which 
have been held for a very long time and which have been very accurately verified 
might be wrong. It was a shocking discovery, of course, that Newton’s laws are 
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wrong, after all the years in which they seemed to be accurate. Of course it is 
clear, not that the experiments were wrong, but that they were done over only a 
limited range of velocities, so small that the relativistic effects would not have 
been evident. But nevertheless, we now have a much more humble point of view 
of our physical laws — everything can be wrong! 

Secondly, if we have a set of “strange” ideas, such as that time goes slower 
when one moves, and so forth, whether we like them or do not like them is an 
irrelevant question. The only relevant question is whether the ideas are consistent 
with what is found experimentally. In other words, the “strange ideas” need 
only agree with experiment , and the only reason that we have to discuss the be- 
havior of clocks and so forth is to demonstrate that although the notion of the 
time dilation is strange, it is consistent with the way we measure time. 

Finally, there is a third suggestion which is a little more technical but which 
has turned out to be of enormous utility in our study of other physical laws, and 
that is to look at the symmetry of the laws or, more specifically, to look for the 
ways in which the laws can be transformed and leave their form the same. When 
we discussed the theory of vectors, we noted that the fundamental laws of motion 
are not changed when we rotate the coordinate system, and now we learn that they 
are not changed when we change the space and time variables in a particular way, 
given by the Lorentz transformation. So this idea of studying the patterns or 
operations under which the fundamental laws are not changed has proved to be a 
very useful one. 

16-2 The twin paradox 

To continue our discussion of the Lorentz transformation and relativistic 
effects, we consider a famous so-called “paradox” of Peter and Paul, who are 
supposed to be twins, born at the same time. When they are old enough to drive a 
space ship, Paul flies away at very high speed. Because Peter, who is left on the 
ground, sees Paul going so fast, all of Paul’s clocks appear to go slower, his heart 
beats go slower, his thoughts go slower, everything goes slower, from Peter’s 
point of view. Of course, Paul notices nothing unusual, but if he travels around 
and about for a while and then comes back, he will be younger than Peter, the man 
on the ground! That is actually right; it is one of the consequences of the theory 
of relativity which has been clearly demonstrated. Just as the mu-mesons last 
longer when they are moving, so also will Paul last longer when he is moving. This 
is called a “paradox” only by the people who believe that the principle of relativity 
means that all motion is relative; they say, “Heh, heh, heh, from the point of view 
of Paul, can’t we say that Peter was moving and should therefore appear to age 
more slowly? By symmetry, the only possible result is that both should be the 
same age when they meet.” But in order for them to come back together and make 
the comparison, Paul must either stop at the end of the trip and make a comparison 
of clocks or, more simply, he has to come back, and the one who comes back 
must be the man who was moving, and he knows this, because he had to turn 
around. When he turned around, all kinds of unusual things happened in his 
space ship — the rockets went off, things jammed up against one wall, and so on — 
while Peter felt nothing. 

So the way to state the rule is to say that the man who has felt the accelerations , 
who has seen things fall against the walls, and so on, is the one who would be the 
younger; that is the difference between them in an “absolute” sense, and it is 
certainly correct. When we discussed the fact that moving mu-mesons live longer, 
we used as an example their straight-line motion in the atmosphere. But we can 
also make mu-mesons in a laboratory and cause them to go in a curve with a 
magnet, and even under this accelerated motion, they last exactly as much longer 
as they do when they are moving in a straight line. Although no one has arranged 
an experiment explicitly so that we can get rid of the paradox, one could compare 
a mu-meson which is left standing with one that had gone around a complete 
circle, and it would surely be found that the one that went around the circle lasted 
longer. Although we have not actually carried out an experiment using a complete 
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circle, it is really not necessary, of course, because everything fits together all right. 
This may not satisfy those who insist that every single fact be demonstrated directly, 
but we confidently predict the result of the experiment in which Paul goes in a 
complete circle. 


16-3 Transformation of velocities 

The main difference between the relativity of Einstein and the relativity of 
Newton is that the laws of transformation connecting the coordinates and times 
between relatively moving systems are different. The correct transformation law, 
that of Lorentz, is 

, x — ut 

x = . = > 

V 1 — u 2 /c 2 

/ = y, 

z' = z, (16.1) 

f _ * ~ ux/c 2 
\/l “ u 2 /c 2 

These equations correspond to the relatively simple case in which the relative 
motion of the two observers is along their common jc-axes. Of course other direc- 
tions of motion are possible, but the most general Lorentz transformation is 
rather complicated, with all four quantities mixed up together. We shall continue 
to use this simpler form, since it contains all the essential features of relativity. 

Let us now discuss more of the consequences of this transformation. First, 
it is interesting to solve these equations in reverse. That is, here is a set of linear 
equations, four equations with four unknowns, and they can be solved in reverse, 
for x, y , z, t in terms of x\ y', z', The result is very interesting, since it tells us 
how a system of coordinates “at rest” looks from the point of view of one that is 
“moving.” Of course, since the motions are relative and of uniform velocity, the 
man who is “moving” can say, if he wishes, that it is really the other fellow who is 
moving and he himself who is at rest. And since he is moving in the opposite 
direction, he should get the same transformation, but with the opposite sign of 
velocity. That is precisely what we find by manipulation, so that is consistent. 
If it did not come out that way, we would have real cause to worry! 

x f + ut r 

x = . - — > 

V 1 — u 2 /c 2 

y = ?> 

z = z', (16.2) 

I __ + ux f fc 2 

\/ 1 — m" 2 /c 2 

Next we discuss the interesting problem of the addition of velocities in rela- 
tivity. We recall that one of the original puzzles was that light travels at 186,000 
mi/sec in all systems, even when they are in relative motion. This is a special case 
of the more general problem exemplified by the following. Suppose that an object 
inside a space ship is going at 100,000 mi /sec and the space ship itself is going at 
100,000 mi/sec; how fast is the object inside the space ship moving from the point 
of view of an observer outside? We might want to say 200,000 mi/sec, which is 
faster than the speed of light. This is very unnerving, because it is not supposed 
to be going faster than the speed of light! The general problem is as follows. 

Let us suppose that the object inside the ship, from the point of view of the 
man inside, is moving with velocity v, and that the space ship itself has a velocity 
u with respect to the ground. We want to know with what velocity v x this object 
is moving from the point of view of the man on the ground. This is, of course, still 
but a special case in which the motion is in the ^-direction. There will also be a 
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transformation for velocities in the j-direction, or for any angle; these can be 
worked out as needed. Inside the space ship the velocity is v X ', which means that 
the displacement x is equal to the velocity times the time: 

*' = v X ’f. (16.3) 


Now we have only to calculate what the position and time are from the point of 
view of the outside observer for an object which has the relation (16.2) between 
x' and t f . So we simply substitute (16.3) into (16.2), and obtain 


v X ’t ; + uf 
Vl - u*/c* 


06.4) 


But here we find x expressed in terms of In order to get the velocity as seen by 
the man on the outside, we must divide his distance by his time , not by the other 
man's time! So we must also calculate the time as seen from the outside, which is 


f = t f + u(v x >t f )/c 2 ^ 
\/l — U z /c 2 

Now we must find the ratio of x to t , which is 

_ X _ u + v x - 
v * - 7 " l 4- Htv/c 2 ’ 


(16.5) 


(16.6) 


the square roots having cancelled. This is the law that we seek: the resultant ve- 
locity, the “summing” of two velocities, is not just the algebraic sum of two veloc- 
ities (we know that it cannot be or we get in trouble), but is “corrected” by 
1 + uv/c 2 . 

Now let us see what happens. Suppose that you are moving inside the space 
ship at half the speed of light, and that the space ship itself is going at half the speed 
of light. Thus u is \c and v is ^c, but in the denominator uv is one-fourth, so that 

_ jc + he _ 4 c 
- T+T - 5 

So, in relativity, “half” and “half” does not make “one,” it makes only “4/5.” 
Of course low velocities can be added quite easily in the familiar way, because so 
long as the velocities are small compared with the speed of light we can forget 
about the (1 + uv/c 2 ) factor; but things are quite different and quite interesting 
at high velocity. 

Let us take a limiting case. Just for fun, suppose that inside the space ship the 
man was observing light itself. In other words, v = c, and yet the space ship is 
moving. How will it look to the man on the ground? The answer will be 

u + c u + c 

V = T ~ i 71> = C j = C. 

1 + uc/c 2 u + c 


Therefore, if something is moving at the speed of light inside the ship, it will appear 
to be moving at the speed of light from the point of view of the man on the ground 
too! This is good, for it is, in fact, what the Einstein theory of relativity was 
designed to do in the first place — so it had better work! 

Of course, there are cases in which the motion is not in the direction of the 
uniform translation. For example, there may be an object inside the ship which 
is just moving “upward” with the velocity v y * with respect to the ship, and the ship 
is moving “horizontally.” Now, we simply go through the same thing, only using 
y* s instead of x’s, with the result 


so that if v x > = 0, 


y = / = Vy’S, 


(16.7) 


Vy — j = iy Vl — u 2 /c 2 . 
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Fig 16-1. Trajectories described by 
a light ray and particle inside a moving 
clock. 



Fig. 16-2. Two views of an elastic 
collision between equal objects moving 
at the same speed in opposite directions. 


Thus a sidewise velocity is no longer ty , but v y >\/Y~^uFJc^. We found this result 
by substituting and combining the transformation equations, but we can also see 
the result directly from the principle of relativity for the following reason (it is 
always good to look again to see whether we can see the reason). We have already 
(Fig. 15-3) discussed how a possible clock might work when it is moving; the 
light appears to travel at an angle at the speed c in the fixed system, while it simply 
goes vertically with the same speed in the moving system. We found that the 
vertical component of the velocity in the fixed system is less than that of light by the 
factor \/\ — u 2 /c% (see Eq. 15-3). But now suppose that we let a material 
particle go back and forth in this same “clock,” but at some integral fraction 
l/n of the speed of light (Fig. 16-1). Then when the particle has gone back and 
forth once, the light will have gone exactly n times. That is, each “click” of the 
“particle” clock will coincide with each nth “click” of the light clock. This fact 
must still be true when the whole system is moving , because the physical phenomenon 
of coincidence will be a coincidence in any frame. Therefore, since the speed c y is 
less than the speed of light, the speed v y of the particle must be slower than the 
corresponding speed by the same square-root ratio! That is why the square root 
appears in any vertical velocity. 


16-4 Relativistic mass 

We learned in the last chapter that the mass of an object increases with 
velocity, but no demonstration of this was given, in the sense that we made no 
arguments analogous to those about the way clocks have to behave. However, 
we can show that, as a consequence of relativity plus a few other reasonable assump- 
tions, the mass must vary in this way. (We have to say “a few other assumptions” 
because we cannot prove anything unless we have some laws which we assume to 
be true, if we expect to make meaningful deductions.) To avoid the need to study 
the transformation laws of force, we shall analyze a collision , where we need know 
nothing about the laws of force, except that we shall assume the conservation of 
momentum and energy. Also, we shall assume that the momentum of a particle 
which is moving is a vector and is always directed in the direction of the velocity. 
However, we shall not assume that the momentum is a constant times the velocity, 
as Newton did, but only that it is some function of velocity. We thus write the 
momentum vector as a certain coefficient times the vector velocity: 

p = m t y. (16.8) 

We put a subscript v on the coefficient to remind us that it is a function of velocity, 
and we shall agree to call this coefficient m v the “mass.” Of course, when the 
velocity is small, it is the same mass that we would measure in the slow-moving 
experiments that we are used to. Now we shall try to demonstrate that the formula 
for m v must be m Q /\/ 1 — v 2 /c‘f by arguing from the principle of relativity that 
the laws of physics must be the same in every coordinate system. 

Suppose that we have two particles, like two protons, that are absolutely 
equal, and they are moving toward each other with exactly equal velocities. Their 
total momentum is zero. Now what can happen? After the collision, their direc- 
tions of motion must be exactly opposite to each other, because if they are not 
exactly opposite, there will be a nonzero total vector momentum, and momentum 
would not have been conserved. Also they must have the same speeds, since they 
are exactly similar objects ; in fact, they must have the same speed they started with, 
since we suppose that the energy is conserved in these collisions. So the diagram of 
an elastic collision, a reversible collision, will look like Fig. 16-2(a): all the arrows 
are the same length, all the speeds are equal. We shall suppose that such collisions 
can always be arranged, that any angle 6 can occur, and that any speed could be 
used in such a collision. Next, we notice that this same collision can be viewed 
differently by turning the axes, and just for convenience we shall turn the axes, 
so that the horizontal splits it evenly, as in Fig. 16-2(b). It is the same collision 
redrawn, only with the axes turned. 
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Now here is the real trick: let us look at this collision from the point of view 
of someone riding along in a car that is moving with a speed equal to the horizontal 
component of the velocity of one particle. Then how does the collision look? 
It looks as though particle 1 is just going straight up, because it has lost its hori- 
zontal component, and it comes straight down again, also because it does not have 
that component. That is, the collision appears as shown in Fig. 16-3 (a). Particle 2, 
however, was going the other way, and as we ride past it appears to fly by at some 
terrific speed and at a smaller angle, but we can appreciate that the angles before 
and after the collision are the same. Let us denote by u the horizontal component 
of the velocity of particle 2, and by w the vertical velocity of particle 1. 

Now the question is, what is the vertical velocity u tan a? If we knew that, we 
could get the correct expression for the momentum, using the law of conservation 
of momentum in the vertical direction. Clearly, the horizontal component of the 
momentum is conserved: it is the same before and after the collision for both 
particles, and is zero for particle 1. So we need use the conservation law only for 
the upward velocity u tan a. But we can get the upward velocity, simply by looking 
at the same collision going the other way! If we look at the collision of Fig. 
16-3(a) from a car to the left moving with speed w, we see the same collision, except 
“turned over,” as shown in Fig. 16— 3(b). Now particle 2 is the one that goes up 
and down with speed w, and particle 1 has picked up the horizontal speed u. Of 
course, now we know what the velocity u tan a is: it is w\/ 1 — u 2 /c 2 (see Eq. 16.7). 
We know that the change in the vertical momentum of the vertically moving par- 
ticle is 

A p = 2 m w w 



Fig. 1 6-3. Two more views of the 
collision, from moving cars. 


(2, because it moves up and back down). The obliquely moving particle has a 
certain velocity v whose components we have found to be u and w\J 1 — u 2 /c 2 > 
and whose mass is m v . The change in vertical momentum of this particle is therefore 
Ap f = 2m v w\/i — u^fc 2 because, in accordance with our assumed law (16.8), 
the momentum component is always the mass corresponding to the magnitude of 
the velocity times the component of the velocity in the direction of interest. Thus 
in order for the total momentum to be zero the vertical momenta must cancel and 
the ratio of the mass moving with speed v and the mass moving with speed >v 
must therefore be 

^ = Vi - k 2 /c 2 . (16.9) 

m v 


Let us take the limiting case that w is infinitesimal. If w is very tiny indeed, it 
is clear that v and u are practically equal. In this case, m w — > m 0 and m v — > m u . 
The grand result is 


m u 


m 0 

Vl — u 2 /c 2 


(16.10) 


It is an interesting exercise now to check whether or not Eq. (16.9) is indeed true for 
arbitrary values of w, assuming that Eq. (16.10) is the right formula for the mass. 
Note that the velocity v needed in Eq. (16.9) can be calculated from the right-angle 
triangle: 

v 2 = u 2 + w 2 ( 1 — w 2 /c 2 ). 


\ W ^ _ 

m 0 

(a) M 


- 2 - BEFORE 

m q 

AFTER 



Fig. 16-4. Two views of an inelastic 
collision between equally massive objects. 


It will be found to check out automatically, although we used it only in the limit 
of small w. 

Now, let us accept that momentum is conserved and that the mass depends 
upon the velocity according to (16.10) and go on to find what else we can conclude. 
Let us consider what is commonly called an inelastic collision. For simplicity, 
we shall suppose that two objects of the same kind, moving oppositely with equal 
speeds w, hit each other and stick together, to become some new, stationary object, 
as shown in Fig. 16-4(a). The mass m of each corresponds to w, which, as we 
know, is m Q /\/l — w 2 /c 2 . If we assume the conservation of momentum and 
the principle of relativity, we can demonstrate an interesting fact about the mass 
of the new object which has been formed. We imagine an infinitesimal velocity 
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u at right angles to w (we can do the same with finite values of u , but it is easier to 
understand with an infinitesimal velocity), then look at this same collision as we 
ride by in an elevator at the velocity — u. What we see is shown in Fig. 16-4(b). 
The composite object has an unknown mass M. Now object 1 moves with an 
upward component of velocity u and a horizontal component which is practically 
equal to h>, and so also does object 2. After the collision we have the mass M 
moving upward with velocity w, considered very small compared with the speed of 
light, and also small compared with w. Momentum must be conserved, so let us 
estimate the momentum in the upward direction before and after the collision. 
Before the collision we have p ~ 2 m w u, and after the collision the momentum is 
evidently p f = M u u , but M u is essentially the same as M 0 because u is so small. 
These momenta must be equal because of the conservation of momentum, and there- 
fore 

M 0 = 2m w . (16.11) 

The mass of the object which is formed when two equal objects collide must be twice 
the mass of the objects which come together. You might say, “Yes, of course, 
that is the conservation of mass.” But not “Yes, of course,” so easily, because 
these masses have been enhanced over the masses that they would be if they were 
standing still, yet they still contribute, to the total Af, not the mass they have when 
standing still, but more . Astonishing as that may seem, in order for the conser- 
vation of momentum to work when two objects come together, the mass that 
they form must be greater than the rest masses of the objects, even though the 
objects are at rest after the collision! 

16-5 Relativistic energy 

In the last chapter we demonstrated that as a result of the dependence of the 
mass on velocity and Newton’s laws, the changes in the kinetic energy of an object 
resulting from the total work done by the forces on it always comes out to be 

A7 1 = (m u - m 0 )c 2 = — ° - m 0 c 2 . (16.12) 

VI — u 2 /c 2 

We even went further, and guessed that the total energy is the total mass times c 2 . 
Now we continue this discussion. 

Suppose that our two equally massive objects that collide can still be “seen” 
inside Af. For instance, a proton and a neutron are “stuck together,” but are still 
moving about inside of Af. Then, although we might at first expect the mass Af 
to be 2m o, we have found that it is not 2m 0 , but 2 m w . Since 2 m w is what is put in, 
but 2m o are the rest masses of the things inside, the excess mass of the composite 
object is equal to the kinetic energy brought in. This means, of course, that 
energy has inertia . In the last chapter we discussed the heating of a gas, and showed 
that because the gas molecules are moving and moving things are heavier, when 
we put energy into the gas its molecules move faster and so the gas gets heavier. 
But in fact the argument is completely general, and our discussion of the inelastic 
collision shows that the mass is there whether or not it is kinetic energy. In other 
words, if two particles come together and produce potential or any other form of 
energy; if the pieces are slowed down by climbing hills, doing work against internal 
forces, or whatever; then it is still true that the mass is the total energy that has 
been put in. So we see that the conservation of mass which we have deduced above 
is equivalent to the conservation of energy, and therefore there is no place in the 
theory of relativity for strictly inelastic collisions, as there was in Newtonian 
mechanics. According to Newtonian mechanics it is all right for two things to 
collide and so form an object of mass 2m 0 which is in no way distinct from the one 
that would result from putting them together slowly. Of course we know from the 
law of conservation of energy that there is more kinetic energy inside, but that 
does not affect the mass, according to Newton’s laws. But now we see that this is 
impossible; because of the kinetic energy involved in the collision, the resulting 
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object will be heavier; therefore, it will be a different object. When we put 
the objects together gently they make something whose mass is 2 m 0 ; when 
we put them together forcefully, they make something whose mass is greater. 
When the mass is different, we can tell that it is different. So, necessarily, the 
conservation of energy must go along with the conservation of momentum in the 
theory of relativity. 

This has interesting consequences. For example, suppose that we have an 
object whose mass M is measured, and suppose something happens so that it flies 
into two equal pieces moving with speed w f so that they each have a mass m w . 
Now suppose that these pieces encounter enough material to slow them up 
until they stop; then they will have mass m 0 . How much energy will they 
have given to the material when they have stopped? Each will give an amount 
(m w — m 0 )c 2 , by the theorem that we proved before. This much energy is left 
in the material in some form, as heat, potential energy, or whatever. Now 2 m w = 
M, so the liberated energy is E = (M — 2 m 0 )c 2 . This equation was used to 
estimate how much energy would be liberated under fission in the atomic bomb, 
for example. (Although the fragments are not exactly equal, they are nearly equal.) 
The mass of the uranium atom was known — it had been measured ahead of time — 
and the atoms into which it split, iodine, xenon, and so on, all were of known mass. 
By masses, we do not mean the masses while the atoms are moving, we mean the 
masses when the atoms are at rest . In other words, both M and m Q are known. 
So by subtracting the two numbers one can calculate how much energy will be 
released if M can be made to split in “half.” For this reason poor old Einstein 
was called the “father” of the atomic bomb in all the newspapers. Of course, 
all that meant was that he could tell us ahead of time how much energy would be 
released if we told him what process would occur. The energy that should be 
liberated when an atom of uranium undergoes fission was estimated about six 
months before the first direct test, and as soon as the energy was in fact liberated, 
someone measured it directly (and if Einstein’s formula had not worked, they would 
have measured it anyway), and the moment they measured it they no longer needed 
the formula. Of course, we should not belittle Einstein, but rather should criticize 
the newspapers and many popular descriptions of what causes what in the history 
of physics and technology. The problem of how to get the thing to occur in an 
effective and rapid manner is a completely different matter. 

The result is just as significant in chemistry. For instance, if we were to weigh 
the carbon dioxide molecule and compare its mass with that of the carbon and 
the oxygen, we could find out how much energy would be liberated when carbon 
and oxygen form carbon dioxide. The only trouble here is that the differences in 
masses are so small that it is technically very difficult to do. 

Now let us turn to the question of whether we should add m 0 c 2 to the kinetic 
energy and say from now on that the total energy of an object is me 2 . First, if 
we can still see the component pieces of rest mass m 0 inside M, then we could 
say that some of the mass M of the compound object is the mechanical rest mass 
of the parts, part of it is kinetic energy of the parts, and part of it is potential 
energy of the parts. But we have discovered, in nature, particles of various kinds 
which undergo reactions just like the one we have treated above, in which with all 
the study in the world, we cannot see the parts inside. For instance, when a A-meson 
disintegrates into two pions it does so according to the law (16.11), but the idea 
that a A' is made out of 2 7r’s is a useless idea, because it also disintegrates into 3 7r’s! 

Therefore we have a new idea: we do not have to know what things are made 
of inside; we cannot and need not identify, inside a particle, which of the energy 
is rest energy of the parts into which it is going to disintegrate. It is not convenient 
and often not possible to separate the total me 2 energy of an object into rest energy 
of the inside pieces, kinetic energy of the pieces, and potential energy of the pieces; 
instead, we simply speak of the total energy of the particle. We “shift the origin” 
of energy by adding a constant m 0 c 2 to everything, and say that the total energy of a 
particle is the mass in motion times c 2 , and when the object is standing still, the 
energy is the mass at rest times c 2 . 
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Finally, we find that the velocity v, momentum P, and total energy E are re- 
lated in a rather simple way. That the mass in motion at speed v is the mass m 0 
at rest divided by \/l — v 2 /c 2 , surprisingly enough, is rarely used. Instead, the 
following relations are easily proved, and turn out to be very useful: 

E 2 - P 2 c 2 = mgc 4 (16.13) 

and 

Pc = Ev/c. (16.14) 
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17 


Space-Time 


17-1 The geometry of space-time 


The theory of relativity shows us that the relationships of positions and times 
as measured in one coordinate system and another are not what we would have 
expected on the basis of our intuitive ideas. It is very important that we thoroughly 
understand the relations of space and time implied by the Lorentz transformation, 
and therefore we shall consider this matter more deeply in this chapter. 

The Lorentz transformation between the positions and times (x, y, z, t) as 
measured by an observer “standing still,” and the corresponding coordinates and 
time (x', y r , z r , t' ) measured inside a “moving” space ship, moving with velocity 
u are 


x - ut 

Vl - « 2 /c 2 ’ 

/ = y, 

z' = z, 

, _ t — ux/c 2 
- Vl - «2/c 2 ' 


(17.1) 


Let us compare these equations with Eq. (11.5), which also relates measurements 
in two systems, one of which in this instance is rotated relative to the other: 


x f = x cos 6 + y sin 6 , 

/ = y cos 6 — x sin 8, (17.2) 

z r — z. 
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In this particular case, Moe and Joe are measuring with axes having an angle 6 
between the x'- and x-axes. In each case, we note that the “primed” quantities are 
“mixtures” of the “unprimed” ones: the new x f is a mixture of x and y, and the 
new y f is also a mixture of x and y . 

An analogy is useful: When we look at an object, there is an obvious thing we 
might call the “apparent width,” and another we might call the “depth.” But the 
two ideas, width and depth, are not fundamental properties of the object, because 
if we step aside and look at the same thing from a different angle, we get a different 
width and a different depth, and we may develop some formulas for computing the 
new ones from the old ones and the angles involved. Equations (17.2) are these 
formulas. One might say that a given depth is a kind of “mixture” of all depth 
and all width. If it were impossible ever to move, and we always saw a given 
object from the same position, then this whole business would be irrelevant — we 
would always see the “true” width and the “true” depth, and they would appear to 
have quite different qualities, because one appears as a subtended optical angle 
and the other involves some focusing of the eyes or even intuition ; they would seem 
to be very different things and would never get mixed up. It is because we can walk 
around that we realize that depth and width are, somehow or other, just two differ- 
ent aspects of the same thing. 

Can we not look at the Lorentz transformations in the same way ? Here also we 
have a mixture — of positions and the time. A difference between a space measure- 
ment and a time measurement produces a new space measurement. In other words, 
in the space measurements of one man there is mixed in a little bit of the time, as 
seen by the other. Our analogy permits us to generate this idea: The “reality” of 
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Fig. 17-1. Three particle paths in 
space-time: (a) a particle at rest at 
x — x 0 ; (b) a particle which starts at 
x = x 0 and moves with constant speed; 
(c) a particle which starts at high speed 
but slows down. 



(o) NOT CORRECT (b) CORRECT 


Fig. 17-2. Two views of a disinte- 
grating particle. 


an object that we are looking at is somehow greater (speaking crudely and intui- 
tively) than its “width” and its “depth” because they depend upon how we look 
at it; when we move to a new position, our brain immediately recalculates the 
width and the depth. But our brain does not immediately recalculate coordinates 
and time when we move at high speed, because we have had no effective experience 
of going nearly as fast as light to appreciate the fact that time and space are also 
of the same nature. It is as though we were always stuck in the position of having 
to look at just the width of something, not being able to move our heads appreci- 
ably one way or the other; if we could, we understand now, we would see some of 
the other man’s time — we would see “behind,” so to speak, a little bit. 

Thus we shall try to think of objects in a new kind of world, of space and time 
mixed together, in the same sense that the objects in our ordinary space-world 
are real, and can be looked at from different directions. We shall then consider 
that objects occupying space and lasting for a certain length of time occupy a kind 
of a “blob” in a new kind of world, and that we look at this “blob” from different 
points of view when we are moving at different velocities. This new world, this 
geometrical entity in which the “blobs” exist by occupying position and taking up a 
certain amount of time, is called space-time . A given point (x, y , z, /) in space-time 
is called an event. Imagine, for example, that we plot the x-positions horizontally, 
y and z in two other directions, both mutually at “right angles” and at “right 
angles” to the paper (!), and time, vertically. Now, how does a moving particle, 
say, look on such a diagram? If the particle is standing still, then it has a certain 
x , and as time goes on, it has the same x , the same x , the same x; so its “path” is 
a line that runs parallel to the f-axis (Fig. 17-1 a). On the other hand, if it drifts 
outward, then as the time goes on x increases (Fig. 17-1 b). So a particle, for ex- 
ample, which starts to drift out and then slows up should have a motion something 
like that shown in Fig. 1 7—1 (c). A patricle, in other words, which is permanent 
and does not disintegrate is represented by a line in space-time. A particle which 
disintegrates would be represented by a forked line, because it would turn into 
two other things which would start from that point. 

What about light? Light travels at the speed c, and that would be represented 
by a line having a certain fixed slope (Fig. 17-1 d). 

Now according to our new idea, if a given event occurs to a particle, say if it 
suddenly disintegrates at a certain space-time point into two new ones which follow 
some new tracks, and this interesting event occurred at a certain value of x and a 
certain value of t , then we would expect that, if this makes any sense, we just have 
to take a new pair of axes and turn them, and that will give us the new t and the 
new x in our new system, as shown in Fig. 17-2(a). But this is wrong, because 
Eq. (17.1) is not exactly the same mathematical transformation as Eq. (17.2). 
Note, for example, the difference in sign between the two, and the fact that one is 
written in terms of cos d and sin 6 , while the other is written with algebraic quanti- 
ties. (Of course, it is not impossible that the algebraic quantities could be written as 
cosine and sine, but actually they cannot.) But still, the two expressions are very 
similar. As we shall see, it is not really possible to think of space-time as a real, 
ordinary geometry because of that difference in sign. In fact, although we shall 
not emphasize this point, it turns out that a man who is moving has to use a set of 
axes which are inclined equally to the light ray, using a special kind of projection 
parallel to the x f - and axes, for his x f and as shown in Fig. 17-2(b). We shall 
not deal with the geometry, since it does not help much; it is easier to work with 
the equations. 

17-2 Space-time intervals 

Although the geometry of space-time is not Euclidean in the ordinary sense, 
there is a geometry which is very similar, but peculiar in certain respects. If this 
idea of geometry is right, there ought to be some functions of coordinates and time 
which are independent of the coordinate system. For example, under ordinary 
rotations, if we take two points, one at the origin, for simplicity, and the other one 
somewhere else, both systems would have the same origin, and the distance from 
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here to the other point is the same in both. That is one property that is inde- 
pendent of the particular way of measuring it. The square of the distance is 
x 2 + y 2 + z 2 . Now what about space-time? It is not hard to demonstrate that 
we have here, also, something which stays the same, namely, the combination 
c 2 t 2 — x 2 — y 2 — z 2 is the same before and after the transformation: 

c 2 /' 2 — x' 2 — y' 2 — z' 2 = c 2 t 2 — x 2 — y 2 — z 2 . (17.3) 


This quantity is therefore something which, like the distance, is “real” in some 
sense; it is called the interval between the two space-time points, one of which is, 
in this case, at the origin. (Actually, of course, it is the interval squared, just as 
x 2 + y 2 + z 2 is the distance squared.) We give it a different name because it is 
in a different geometry, but the interesting thing is only that some signs are reversed 
and there is a c in it. 

Let us get rid of the c; that is an absurdity if we are going to have a wonderful 
space with Jt’s and y 9 s that can be interchanged. One of the confusions that could 
be caused by someone with no experience would be to measure widths, .say, by the 
angle subtended at the eye, and measure depth in a different way, like the strain on 
the muscles needed to focus them, so that the depths would be measured in feet 
and the widths in meters. Then one would get an enormously complicated mess of 
equations in making transformations such as (17.2), and would not be able to see 
the clarity and simplicity of the thing for a very simple technical reason, that the 
same thing is being measured in two different units. Now in Eqs. (17.1) and (17.3) 
nature is telling us that time and space are equivalent; time becomes space; they 
should be measured in the same units . What distance is a “second”? It is easy to 
figure out from (17.3) what it is. It is 3 X 10 8 meters, the distance that light would 
go in one second. In other words, if we were to measure all distances and times 
in the same units, seconds, then our unit of distance would be 3 X 10 8 meters, 
and the equations would be simpler. Or another way that we could make the units 
equal is to measure time in meters. What is a meter of time? A meter of time 
is the time it takes for light to go one meter, and is therefore 1/3 X 10~ 8 sec, or 
3.3 billionths of a second! We would like, in other words, to put all our equations 
in a system of units in which c = 1. If time and space are measured in the same 
units, as suggested, then the equations are obviously much simplified. They are 


x = 


x — ut 


Vl - W 2 

/ = y* 


z' = z, 


f - 


t — ux 


Vl — u 2 

t' 2 - *' 2 - y' 2 - z' 2 « t 2 - x 2 - y 2 - z 2 . 


(17.4) 


(17.5) 


If we are ever unsure or “frightened” that after we have this system with c — 1 
we shall never be able to get our equations right again, the answer is quite the 
opposite. It is much easier to remember them without the c’s in them, and it is 
always easy to put the c’s back, by looking after the dimensions. For instance, in 
Vl “ W 2 , we know that we cannot subtract a velocity squared, which has units, 
from the pure number 1, so we know that we must divide u 2 by c 2 in order to make 
that unitless, and that is the way it goes. 

The difference between space-time and ordinary space, and the character of 
an interval as related to the distance, is very interesting. According to formula 
(17.5), if we consider a point which in a given coordinate system had zero time, and 
only space, then the interval squared would be negative and we would have an 
imaginary interval, the square root of a negative number. Intervals can be either 
real or imaginary in the theory. The square of an interval may be either positive 
or negative, unlike distance, which has a positive square. When an interval is 
imaginary, we say that the two points have a space-like interval between them 
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(instead of imaginary), because the interval is more like space than like time. 
On the other hand, if two objects are at the same place in a given coordinate system, 
but differ only in time, then the square of the time is positive and the distances are 
zero and the interval squared is positive; this is called a time-like interval. In our 
diagram of space-time, therefore, we would have a representation something like 
this: at 45° there are two lines (actually, in four dimensions these will be “cones,” 
called light cones) and points on these lines are all at zero interval from the origin. 
Where light goes from a given point is always separated from it by a zero interval, 
as we see from Eq. (17.5). Incidentally, we have just proved that if light travels 
with speed c in one system, it travels with speed c in another, for if the interval is 
the same in both systems, i.e., zero in one and zero in the other, then to state that 
the propagation speed of light is invariant is the same as saying that the interval 
is zero. 


17-3 Past, present, and future 



Fig. 17-3. 
surrounding a 


The space-time 
point at the origin. 


region 


The space-time region surrounding a given space-time point can be separated 
into three regions, as shown in Fig. 17-3. In one region we have space-like inter- 
vals, and in two regions, time-like intervals. Physically, these three regions into 
which space-time around a given point is divided have an interesting physical 
relationship to that point: a physical object or a signal can get from a point in 
region 2 to the event 0 by moving along at a speed less than the speed of light. 
Therefore events in this region can affect the point 0, can have an influence on it 
from the past. In fact, of course, an object at P on the negative t-axis is precisely 
in the “past” with respect to 0; it is the same space-point as 0, only earlier. What 
happened there then, affects 0 now. (Unfortunately, that is the way life is.) An- 
other object at Q can get to 0 by moving with a certain speed less than c, so if this 
object were in a space ship and moving, it would be, again, the past of the same 
space-point. That is, in another coordinate system, the axis of time might go 
through both 0 and Q. So all points of region 2 are in the “past” of 0 , and any- 
thing that happens in this region can affect 0. Therefore region 2 is sometimes 
called the affective past , or affecting past; it is the locus of all events which can 
affect point 0 in any way. 

Region 3, on the other hand, is a region which we can affect from 0, we can 
“hit” things by shooting “bullets” out at speeds less than c. So this is the world 
whose future can be affected by us, and we may call that the affective future . Now 
the interesting thing about all the rest of space-time, i.e., region 1, is that we can 
neither affect it now from 0, nor can it affect us now at 0, because nothing can go 
faster than the speed of light. Of course, what happens at R can affect us later ; 
that is, if the sun is exploding “right now,” it takes eight minutes before we know 
about it, and it cannot possibly affect us before then. 

What we mean by “right now” is a mysterious thing which we cannot define 
and we cannot affect, but it can affect us later, or we could have affected it if we 
had done something far enough in the past. When we look at the star Alpha 
Centauri, we see it as it was four years ago; we might wonder what it is like “now.” 
“Now” means at the same time from our special coordinate system. We can only 
see Alpha Centauri by the light that has come from our past, up to four years ago, 
but we do not know what it is doing “now”; it will take four years before what it 
is doing “now” can affect us. Alpha Centauri “now” is an idea or concept of our 
mind; it is not something that is really definable physically at the moment, because 
we have to wait to observe it; we cannot even define it right “now.” Furthermore, 
the “now” depends on the coordinate system. If, for example, Alpha Centauri 
were moving, an observer there would not agree with us because he would put 
his axes at an angle, and his “now” would be a different time. We have already 
talked about the fact that simultaneity is not a unique thing. 

There are fortune tellers, or people who tell us they can know the future, and 
there are many wonderful stories about the man who suddenly discovers that he 
has knowledge about the affective future. Well, there are lots of paradoxes pro- 
duced by that because if we know something is going to happen, then we can make 
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sure we will avoid it by doing the right thing at the right time, and so on. But 
actually there is no fortune teller who can even tell us the presentl There is no one 
who can tell us what is really happening right now, at any reasonable distance, 
because that is unobservable. We might ask ourselves this question, which we 
leave to the student to try to answer: Would any paradox be produced if it were 
suddenly to become possible to know things that are in the space-like intervals of 
region 1? 


17-4 More about four-vectors 

Let us now return to our consideration of the analogy of the Lorentz trans- 
formation and rotations of the space axes. We have learned the utility of collecting 
together other quantities which have the same transformation properties as the 
coordinates, to form what we call vectors , directed lines. In the case of ordinary 
rotations, there are many quantities that transform the same way as x;, y 9 and z 
under rotation: for example, the velocity has three components, an x;, y 9 and 
z-component; when seen in a different coordinate system, none of the components 
is the same, instead they are all transformed to new values. But, somehow or other, 
the velocity “itself” has a greater reality than do any of its particular components, 
and we represent it by a directed line. 

We therefore ask: Is it or is it not true that there are quantities which transform, 
or which are related, in a moving system and in a nonmoving system, in the same 
way as x;, y 9 z, and /? From our experience with vectors, we know that three of 
the quantities, like x, y, z, would constitute the three components of an ordinary 
space-vector, but the fourth quantity would look like an ordinary scalar under 
space rotation, because it does not change so long as we do not go into a moving 
coordinate system. Is it possible, then, to associate with some of our known 
“three-vectors” a fourth object, that we could call the “time component,” in such a 
manner that the four objects together would “rotate” the same way as position 
and time in space-time? We shall now show that there is, indeed, at least one such 
thing (there are many of them, in fact): the three components of momentum , and the 
energy as the time component , transform together to make what we call a “four- 
vector.” In demonstrating this, since it is quite inconvenient to have to write c’s 
everywhere, we shall use the same trick concerning units of the energy, the mass, 
and the momentum, that we used in Eq. (17.4). Energy and mass, for example' 
differ only by a factor c 2 which is merely a question of units, so we can say energy 
is the mass. Instead of having to write the c 2 , we put E = m, and then, of course, 
if there were any trouble we would put in the right amounts of c so that the units 
would straighten out in the last equation, but not in the intermediate ones. 

Thus our equations for energy and momentum are 

E = m = m 0 /y/ 1 — v 2 , 

(17.6) 

p = mv = wtov/V 1 — v 2 . 

Also in these units, we have 

E 2 - p 2 = mi (17.7) 

For example, if we measure energy in electron volts, what does a mass of 1 electron 
volt mean? It means the mass whose rest energy is 1 electron volt, that is, m 0 c 2 

is one electron volt. For example, the rest mass of an electron is 0.51 1 X 10 6 ev. 

Now what would the momentum and energy look like in a new coordinate 
system? To find out, we shall have to transform Eq. (17.6), which we can do 
because we know how the velocity transforms. Suppose that, as we measure it, an 
object has a velocity v 9 but we look upon the same object from the point of view 
of a space ship which itself is moving with a velocity w, and in that system we use a 
prime to designate the corresponding thing. In order to simplify things at first, 
we shall take the case that the velocity v is in the direction of «. (Later, we can do the 
more general case.) What is v r , the velocity as seen from the space ship ? It is the 
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composite velocity, the “difference” between v and u. 
out before. 


By the law which we worked 
(17.8) 


Now let us calculate the new energy E\ the energy as the fellow in the space ship 
would see it. He would use the same rest mass, of course, but he would use v f for 
the velocity. What we have to do is square i>\ subtract it from one, take the square 
root, and take the reciprocal: 


Therefore 


v 


/2 


1 - V 1 


,/2 


v 2 — 2 uv + u 2 


1 - 

2 uv + u 2 v 2 1 

1 - 

2 uv + u 2 v 2 — v 2 + 2 uv 


1 — 2 uv + u 2 v 2 

1 - 

2 2 i 2 2 

v — U + U V 

1 

— 2 uv + u 2 v 2 

(1 - 

- y 2 )(l — « 2 ) 


(1 — uv) 2 


1 1 — uv 

Vl — v' 2 \/l — V 2 Vl “ u 2 


(17.9) 


The energy E ' is then simply m 0 times the above expression. But we want to 
express the energy in terms of the unprimed energy and momentum, and we note 
that 

F , = m 0 - m 0 uv = (m 0 /\/l - v 2 ) - (m 0 v/VY~^v 2 ) u 
Vl — v 2 y/\ — u 2 Vl — u 2 


or 

E' = - E -~. u . Px = , (17.10) 

Vi - « 2 

which we recognize as being exactly of the same form as 


f = 


t — ux 
VT-u 2 ‘ 


Next we must find the new momentum p' x . This is just the energy E times v', and 
is also simply expressed in terms of E and p : 


Px 


Thus 


EV = 


m 0 (l — uv) 

Vl - V 2 Vl - U 2 


v — u 
(1 - uv) 


Px = 


Px - uE ^ 
V 1 — u 2 


m 0 v — rripu 
V 1 — V 2 Vl — u 2 


(17.11) 


which we recognize as being of precisely the same form as 


, x — ut 

x “ W^ 2 ' 

Thus the transformations for the new energy and momentum in terms of 
the old energy and momentum are exactly the same as the transformations for 
in terms of t and x , and x f in terms of a: and /: all we have to do is, every time we 
see t in (17.4) substitute E , and every time we see x substitute p x , and then the 
equations (17.4) will become the same as Eqs. (17.10) and (17.11). This would 
imply, if everything works right, an additional rule that p f y = p y and that p f z — p z . 
To prove this would require our going back and studying the case of motion up 
and down. Actually, we did study the case of motion up and down in the last 
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chapter. We analyzed a complicated collision and we noticed that, in fact, the 
transverse momentum is not changed when viewed from a moving system; so we 
have already verified that p y — p y and p z = p z . The complete transformation, 

p x - uE 

Vl - W 2 5 




Px = 


P'z = />* 

E , = E - up x 
Vl — M 2 


(17.12) 


In these transformations, therefore, we have discovered four quantities which 
transform like x, y, z, and t , and which we call the four-vector momentum . Since 
the momentum is a four-vector, it can be represented on a space-time diagram of a 
moving particle as an “arrow” tangent to the path, as shown in Fig. 17-4. This 
arrow has a time component equal to the energy, and its space components repre- 
sent its three-vector momentum; this arrow is more “real” than either the energy or 
the momentum, because those just depend on how we look at the diagram. 


17-5 Four-vector algebra 


The notation for four-vectors is different than it is for three-vectors. In the 
case of three-vectors, if we were to talk about the ordinary three-vector momentum 
we would write it p. If we wanted to be more specific, we could say it has three 
components which are, for the axes in question, p x , p y , and p z , or we could simply 
refer to a general component as pi , and say that i could either be x, y , or z, and that 
these are the three components; that is, imagine that / is any one of three directions, 
x , y , or z. The notation that we use for four-vectors is analogous to this: we write 
Pn for the four-vector, and p stands for the four possible directions t , x , y, or z. 

We could, of course, use any notation we want; do not laugh at notations; 
invent them, they are powerful. In fact, mathematics is, to a large extent, invention 
of better notations. The whole idea of a four-vector, in fact, is an improvement in 
notation so that the transformations can be remembered easily. A M , then, is a 
general four-vector, but for the special case of momentum, the p t is identified as 
the energy, p x is the momentum in the x-direction, p y is that in the ^-direction, and 
p z is that in the z-direction. To add four-vectors, we add the corresponding com- 
ponents. 

If there is an equation among four-vectors, then the equation is true for 
each component . For instance, if the law of conservation of three-vector momentum 
is to be true in particle collisions, i.e., if the sum of the momenta for a large number 
of interacting or colliding particles is to be a constant, that must mean that the 
sums of all momenta in the x-direction, in thej-direction, and in the z-direction, for 
all the particles, must each be constant. This law alone would be impossible in 
relativity because it is incomplete ; it is like talking about only two of the components 
of a three-vector. It is incomplete because if we rotate the axes, we mix the various 
components, so we must include all three components in our law. Thus, in rela- 
tivity, we must complete the law of conservation of momentum by extending it to 
include the time component. This is absolutely necessary to go with the other 
three, or there cannot be relativistic invariance. The conservation of energy is the 
fourth equation which goes with the conservation of momentum to make a valid 
four-vector relationship in the geometry of space and time. Thus the law of con- 
servation of energy and momentum in four-dimensional notation is 


2 a = X) /»#. 

particles particles 

in out 

or, in a slightly different notation 



i j 


(17.13) 


(17.14) 


Fig. 17-4. 
of a particle. 



The four-vector momentum 
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where i = 1,2,... refers to the particles going into the collision, j = 1,2,... 
refers to the particles coming out of the collision, and p = x, y, z, or t. You say, 
“In which axes?” It makes no difference. The law is true for each component, 
using any axes. 

In vector analysis we discussed one other thing, the dot product of two vectors. 
Let us now consider the corresponding thing in space-time. In ordinary rotation 
we discovered there was an unchanged quantity x 2 + y 2 + z 2 . In four dimen- 
sions, we find that the corresponding quantity is t 2 — x 2 — y 2 — z 2 (Eq. 17.3). 
How can we write that? One way would be to write some kind of four-dimensional 
thing with a square dot between, like A? <S> one of the notations which is 
actually used is 

Yi ' = Af - Al - A* - Ai (17.15) 

The prime on £ means that the first term, the “time” term, is positive, but the 
other three terms have minus signs. This quantity, then, will be the same in any 
coordinate system, and we may call it the square of the length of the four-vector. 
For instance, what is the square of the length of the four-vector momentum of a 
single particle? This will be equal to p 2 - p\ - p\ - p 2 or, in other words, 
E 2 - p 2 , because we know that p t is E. What is E 2 - p 2 l It must be something 
which is the same in every coordinate system. In particular, it must be the same 
for a coordinate system which is moving right along with the particle, in which 
the particle is standing still. If the particle is standing still, it would have no 
momentum. So in that coordinate system, it is purely its energy, which is the same 
as its rest mass. Thus E 2 - p 2 = ml, So we see that the square of the length of 
this vector, the four-vector momentum, is equal to m\. 

From the square of a vector, we can go on to invent the “dot product,” or the 
product which is a scalar: if is one four-vector and b M is another four-vector, then 
the scalar product is 

Qfibp @tbt &xbx a z b z . (17,16) 

It is the same in all coordinate systems. 

Finally, we shall mention certain things whose rest mass m 0 is zero. A photon 
of light, for example. A photon is like a particle, in that it carries an energy and 
a momentum. The energy of a photon is a certain constant, called Planck’s con- 
stant, times the frequency of the photon: E = hv. Such a photon also carries a 
momentum, and the momentum of a photon (or of any other particle, in fact) is h 
divided by the wavelength :p = h/\. But, for a photon, there is a definite relation- 
ship between the frequency and the wavelength: v = c/X. (The number of waves 
per second, times the wavelength of each, is the distance that the light goes in 
one second, which, of course, is c.) Thus we see immediately that the energy of a 
photon must be the momentum times c, or if c = 1, the energy and momentum 
are equal That is to say, the rest mass is zero. Let us look at that again; that is 
quite curious. If it is a particle of zero rest mass, what happens when it stops? 
It nev er stops ! It always goes at the speed c. The usual formula for energy is 
WoA/1 — v 2 . Now can we say that m 0 = 0 and v = 1, so the energy is 0? We 
cannot say that it is zero; the photon really can (and does) have energy even though 
it has no rest mass, but this it possesses by perpetually going at the speed of light! 

We also know that the momentum of any particle is equal to its total energy 
times its velocity: if c = 1, p = vE or, in ordinary units, p — vE/c 2 . For any 
particle moving at the speed of light,/? = Ei£ c = 1. The formulas for the energy 
of a photon as seen from a moving system are, of course, given by Eq. (17.12), but 
for the momentum we must substitute the energy times c (or times 1 in this case). 
The different energies after transformation means that there are different frequen- 
cies. This is called the Doppler effect, and one can calculate it easily from Eq. 
(17.12), using also E = p and E = hv. 

As Minkowski said, “Space of itself, and time of itself will sink into mere 
shadows, and only a kind of union between them shall survive.” 
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IS 


Rotation in Two Dimension s 


1&-1 The center of mass 

In the previous chapters we have been studying the mechanics of points, or 
small particles whose internal structure does not concern us. For the next few 
chapters we shall study the application of Newton’s laws to more complicated 
things. When the world becomes more complicated, it also becomes more inter- 
esting, and we shall find that the phenomena associated with the mechanics of a 
more complex object than just a point are really quite striking. Of course these 
phenomena involve nothing but combinations of Newton’s laws, but it is some- 
times hard to believe that only F = ma is at work. 

The more complicated objects we deal with can be of several kinds, water 
flowing, galaxies whirling, and so on. The simplest “complicated object to analyze, 
at the start, is what we call a rigid body, a solid object that is turning as it moves 
about. However, even such a simple object may have a most complex motion, and 
we shall therefore first consider the simplest aspects of such motion, in which an 
extended body rotates about a fixed axis. A given point on such a body then moves 
in a plane perpendicular to this axis. Such rotation of a body about a fixed axis is 
called plane rotation or rotation in two dimensions. We shall later generalize the 
results to three dimensions, but in doing so we shall find that, unlike the case of 
ordinary particle mechanics, rotations are subtle and hard to understand unless 
we first get a solid grounding in two dimensions. 

The first interesting theorem concerning the motion of complicated objects 
can be observed at work if we throw an object made of a lot of blocks and spokes, 
held together by strings, into the air. Of course we know it goes in a parabola, 
because we studied that for a particle. But now our object is not a particle; it 
wobbles and it jiggles, and so on. It does go in a parabola though; one can see 
that. What goes in a parabola? Certainly not the point on the corner of the block, 
because that is jiggling about; neither is it the end of the wooden stick, or the middle 
of the wooden stick, or the middle of the block. But something goes in a parabola, 
there is an effective “center” which moves in a parabola. So our first theorem 
about complicated objects is to demonstrate that there is a mean position which is 
mathematically definable, but not necessarily a point of the material itself, which 
goes in a parabola. That is called the theorem of the center of the mass, and the 
proof of it is as follows. 

We may consider any object as being made of lots of little particles, the atoms, 
with various forces among them. Let i represent an index which defines one of the 
particles. (There are millions of them, so i goes to 10 23 , or something.) Then the 
force on the zth particle is, of course, the mass times the acceleration of that 
particle: 

F t = m i{d 2 n/dt 2 ). (18.1) 


18-1 The center of mass 

18-2 Rotation of a rigid body 

18-3 Angular momentum 

18-4 Conservation of angular 
momentum 


In the next few chapters our moving objects will be ones in which all the 
parts are moving at speeds very much slower than the speed of light, and we shall 
use the nonrelativistic approximation for all quantities. In these circumstances 
the mass is constant, so that 

F»- = d 2 (miTi)/dt 2 (18.2) 


If we now add the force on all the particles, that is, if we take the sum of all the 
F/s for all the different indexes, we get the total force, F. On the other side of the 
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equation, we get the same thing as though we added before the differentiation: 

V Fi = F - ALt'WJ*) B (1.8.3) 

i dt 2 

Therefore the total force is the second derivative of the masses times their positions, 
added together. 

Now the total force on all the particles is the same as the external force. Why? 
Although there are all kinds of forces on the particles because of the strings, the 
wigglings, the pullings and pushings, and the atomic forces, and who knows what, 
and we have to add all these together, we are rescued by Newton’s Third Law. 
Between any two particles the action and reaction are equal, so that when we add 
all the equations together, if any two particles have forces between them it cancels 
out in the sum; therefore the net result is only those forces which arise from other 
particles which are not included in whatever object we decide to sum over. So if 
Eq. (18.3) is the sum over a certain number of the particles, which together are 
called “the object,” then the external force on the total object is equal to the sum 
of all the forces on all its constituent particles. 

Now it would be nice if we could write Eq. (18.3) as the total mass times some 
acceleration. We can. Let us say M is the sum of all the masses, i.e., the total mass. 
Then if we define a certain vector R to be 

R = ^2 mpi/M, 08.4) 

Eq. (18.3) will be simply 

F = d 2 (MR)/dt 2 = M(d 2 R/dt 2 ), (18.5) 

since M is a constant. Thus we find that the external force is the total mass times 
the acceleration of an imaginary point whose location is R. This point is called 
the center of mass of the body. It is a point somewhere in the “middle” of the 
object, a kind of average r in which the different r/s have weights or importances 
proportional to the masses. 

We shall discuss this important theorem in more detail in a later chapter, and 
we shall therefore limit our remarks to two points: First, if the external forces are 
zero, if the object were floating in empty space, it might whirl, and jiggle, and twist, 
and do all kinds of things. But the center of mass, this artificially invented, cal- 
culated position, somewhere in the middle, will move with a constant velocity. 
In particular, if it is initially at rest, it will stay at rest. So if we have some kind 
of a box, perhaps a space ship, with people in it, and we calculate the location of 
the center of mass and find it is standing still, then the center of mass will continue 
to stand still if no external forces are acting on the box. Of course, the space ship 
may move a little in space, but that is because the people are walking back and 
forth inside; when one walks toward the front, the ship goes toward the back so as 
to keep the average position of all the masses in exactly the same place. 

Is rocket propulsion therefore absolutely impossible because one cannot move 
the center of mass? No; but of course we find that to propel an interesting part 
of the rocket, an uninteresting part must be thrown away. In other words, if we 
start with a rocket at zero velocity and we spit some gas out the back end, then this 
little blob of gas goes one way as the rocket ship goes the other, but the center of 
mass is still exactly where it was before. So we simply move the part that we are 
interested in against the part we are not interested in. 

The second point concerning the center of mass, which is the reason we 
introduced it into our discussion at this time, is that it may be treated separately 
from the “internal” motions of an object, and may therefore be ignored in our 
discussion of rotation. 

18-2 Rotation of a rigid body 

Now let us discuss rotations. Of course an ordinary object does not simply 
rotate, it wobbles, shakes, and bends, so to simplify matters we shall discuss the 
motion of a nonexistent ideal object which we call a rigid body . This means an 


18-2 



object in which the forces between the atoms are so strong, and of such character, 
that the little forces that are needed to move it do not bend it. Its shape stays 
essentially the same as it moves about. If we wish to study the motion of such a 
body, and agree to ignore the motion of its center of mass, there is only one thing 
left for it to dp, and that is to turn. We have to describe that. How? Suppose 
there is some line in the body which stays put (perhaps it includes the center of 
mass and perhaps not), and the body is rotating about this particular line as an 
axis. How do we define the rotation? That is easy enough, for if we mark a point 
somewhere on the object, anywhere except on the axis, we can always tell exactly 
where the object is, if we only know where this point has gone to. The only thing 
needed to describe the position of that point is an angle. So rotation consists of a 
study of the variations of the angle with time. 

In order to study rotation, we observe the angle through which a body has 
turned. Of course, we are not referring to any particular angle inside the object 
itself ; it is not that we draw some angle on the object. We are talking about the 
angular change of the position of the whole thing, from one time to another. 

First, let us study the kinematics of rotations. The angle will change with time, 
and just as we talked about position and velocity in one dimension, we may talk 
about angular position and angular velocity in plane rotation. In fact, there is a 
very interesting relationship between rotation in two dimensions and one-dimen- 
sional displacement, in which almost every quantity has its analog. First, we have 
the angle 0 which defines how far the body has gone around ; this replaces the 
distance y, which defines how far it has gone along. In the same manner, we have a 
velocity of turning, w = dd/dt, which tells us how much the angle changes in a 
second, just as u = ds/dt describes how fast a thing moves, or how far it moves 
in a second. If the angle is measured in radians, then the angular velocity w will 
be so and so many radians per second. The greater the angular velocity, the faster 
the object is turning, the faster the angle changes. We can go on: we can differ- 
entiate the angular velocity with respect to time, and we can call a = dco/dt = 
d 2 d/dt 2 the angular acceleration. That would be the analog of the ordinary accel- 
eration. 

Now of course we shall have to relate the dynamics of rotation to the laws of 
dynamics of the particles of which the object is made, so we must find out how a 
particular particle moves when the angular velocity is such and such. To do this, 
let us take a certain particle which is located at a distance r from the axis and say 
it is in a certain location P(x, y) at a given instant, in the usual manner (Fig. 18-1). 
If at a moment A t later the angle of the whole object has turned through A0, then 
this particle is carried with it. It is at the same radius away from O as it was before, 
but is carried to Q. The first thing we would like to know is how much the distance 
x changes and how much the distance y changes. If OP is called r, then the length 
PQ is r A 0, because of the way angles are defined. The change in x, then, is simply 
the projection of r A 0 in the x-direction : 



Fig. 18-1. Kinematics of two-dimen- 
sional rotation. 


A* = —PQ sin 0 = —r A0 • (y/r) — —y A0. (18.6) 

Similarly, 

Ay = +xA 0. (18.7) 

If the object is turning with a given angular velocity w, we find, by dividing both 
sides of (18.6) and (18.7) by At, that the velocity of the particle is 

v x - -wy and v v = +wx. (18.8) 

Of course if we want to find the magnitude of the velocity, we just write 

v = Vp* _|_ „2 = V u 2 y 2 + (a 2 x‘ 2 = uy/x 2 + y 1 = c or. (ig 9 ) 

It should not be mysterious that the value of the magnitude of this velocity is 
wr; in fact, it should be self-evident, because the distance that it moves is r Ad and 
the distance it moves per second is r Ad/ At, or rco. 
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Let us now move on to consider the dynamics of rotation. Here a new 
concept, force , must be introduced. Let us inquire whether we can invent something 
which we shall call the torque (L. torquere , to twist) which bears the same relation- 
ship to rotation as force does to linear movement. A force is the thing that is 
needed to make linear motion, and the thing that makes something rotate is a 
“rotary force” or a “twisting force,” i.e., a torque. Qualitatively, a torque is a 
“twist”; what is a torque quantitatively? We shall get to the theory of torques 
quantitatively by studying the work done in turning an object, for one very nice 
way of defining a force is to say how much work it does when it acts through a 
given displacement. We are going to try to maintain the analogy between linear 
and angular quantities by equating the work that We do when we turn something a 
little bit when there are forces acting on it, to the torque times the angle it turns 
through. In other words, the definition of the torque is going to be so arranged 
that the theorem of work has an absolute analog: force times distance is work, and 
torque times angle is going to be work. That tells us what torque is. Consider, 
for instance, a rigid body of some kind with various forces acting on it, and an 
axis about which the body rotates. Let us at first concentrate on one force and 
suppose that this force is applied at a certain point (x, y). How much work would 
be done if we were to turn the object through a very small angle? That is easy. 
The work done is 

AW = F x Ax + F y Ay . (18.10) 

We need only to substitute Eqs. (18.6) and (18.7) for Ax and Ay to obtain 

AW = {xF y - yF x )Ad . (18.11) 

That is, the amount of work that we have done is, in fact, equal to the angle through 
which we have turned the object, multiplied by a strange-looking combination of 
the force and the distance. This “strange combination” is what we call the torque. 
So, defining the change in work as the torque times the angle, we now have the 
formula for torque in terms of the forces. (Obviously, torque is not a completely 
new idea independent of Newtonian mechanics — torque must have a definite 
definition in terms of the force.) 

When there are several forces acting, the work that is done is, of course, the 
sum of the works done by all the forces, so that A IF will be a whole lot of terms, 
all added together, for all the forces, each of which is proportional , however , to Ad . 
We can take the A 6 outside and therefore can say that the change in the work is 
equal to the sum of all the torques due to all the different forces that are acting, 
times AS. This sum we might call the total torque, r. Thus torques add by the 
ordinary laws of algebra, but we shall later see that this is only because we are 
working in a plane. It is like one-dimensional kinematics, where the forces simply 
add algebraically, but only because they are all in the same direction. It is more 
complicated in three dimensions. Thus, for two-dimensional rotation, 

Ti = XiFyi - y { F xi (18.12) 

and 

r = X>- (18.13) 

It must be emphasized that the torque is about a given axis. If a different ( axis is 
chosen, so that all the x* and yi are changed, the value of the torque is (usually) 
changed too. 

Now we pause briefly to note that our foregoing introduction of torque, 
through the idea of work, gives us a most important result for an object in equilib- 
rium: if all the forces on an object are in balance both for translation and rotation, 
not only is the net force zero, but the total of all the torques is also zero, because 
if an object is in equilibrium, no work is done by the forces for a small displacement . 
Therefore, since AW = r Ad = 0, the sum of all the torques must be zero. So 
there are two conditions for equilibrium: that the sum of the forces is zero, and 
that the sum of the torques is zero. Prove that it suffices to be sure that the sum 
of torques about any one axis (in two dimensions) is zero. 
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Now let us consider a single force, and try to figure out, geometrically, what 
this strange thing xF y — yF x amounts to. In Fig. 18-2 we see a force F acting at 
a point r. When the object has rotated through a small angle A0, the work done, 
of course, is the component of force in the direction of the displacement times the 
displacement. In other words, it is only the tangential component of the force 
that counts, and this must be multiplied by the distance r AS . Therefore we see 
that the torque is also equal to the tangential component of force (perpendicular 
to the radius) times the radius. That makes sense in terms of our ordinary idea 
of the torque, because if the force were completely radial, it would not put any 
“twist” on the body; it is evident that the twisting effect should involve only the 
part of the force which is not pulling out from the center, and that means the 
tangential component. Furthermore, it is clear that a given force is more effective 
on a long arm than near the axis. In fact, if we take the case where we push right 
on the axis, we are not twisting at all! So it makes sense that the amount of twist, 
or torque, is proportional both to the radial distance and to the tangential com- 
ponent of the force. 

There is still a third formula for the torque which is very interesting. We have 
just seen that the torque is the force times the radius times the sine of the angle a, 
in Fig. 18-2. But if we extend the line of action of the force and draw the line OS , 
the perpendicular distance to the line of action of the force (the lever arm of the 
force) we notice that this lever arm is shorter than r in just the same proportion 
as the tangential part of the force is less than the total force. Therefore the formula 
for the torque can also be written as the magnitude of the force times the length 
of the lever arm. 

The torque is also often called the moment of the force. The origin of this 
term is obscure, but it may be related to the fact that “moment” is derived from 
the Latin movimentum , and that the capability of a force to move an object (using 
the force on a lever or crowbar) increases with the length of the lever arm. In 
mathematics ‘moment” means weighted by how far away it is from an axis. 



Fig. 18—2. The torque produced by 
a force. 


18-3 Angular momentum 

Although we have so far considered only the special case of a rigid body, the 
properties of torques and their mathematical relationships are interesting also 
even when an object is not rigid. In fact, we can prove a very remarkable theorem: 
just as external force is the rate of change of a quantity /?, which we call the total 
momentum of a collection of particles, so the external torque is the rate of change 
of a quantity L which we call the angular momentum of the group of particles. 

To prove this, we shall suppose that there is a system of particles on which 
there are some forces acting and find out what happens to the system as a result of 
the torques due to these forces. First, of course, we should consider just one 
particle. In Fig. 18-3 is one particle of mass m, and an axis 0 \ the particle is not 
necessarily rotating in a circle about O, it may be moving in an ellipse, like a planet 
going around the sun, or in some other curve. It is moving somehow, and there 
are forces on it, and it accelerates according to the usual formula that the x-com- 
ponent of force is the mass times the x-component of acceleration, etc. But let us 
see what the torque does. The torque equals xF y — yF x , and the force in the 
x- or ^-direction is the mass times the acceleration in the x- or y-direction : 



Fig. 18-3. 
an axis O. 


A particle moves about 


t xF y yF x 

= xm(d 2 y/dt 2 ) — ym(d 2 x/dt 2 ). (18.14) 


Now, although this does not appear to be the derivative of any simple quantity, it 
is in fact the derivative of the quantity xm(dy/dt) — ym(dx/dt ): 



(18.15) 
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So it is true that the torque is the rate of change of something with time! So we 
pay attention to the “something,” we give it a name; we call it L, the angular 
momentum: 

L = xm(dy/dt) — ym(dx/dt) 

= xp y — yp x . (18.16) 

Although our present discussion is nonrelativistic, the second form for L 
given above is relativisticalty correct. So we have found that there is also a rota- 
tional analog for the momentum, and that this analog, the angular momentum, is 
given by an expression in terms of the components of linear momentum that is 
just like the formula for torque in terms of the force components! Thus, if we want 
to know the angular momentum of a particle about an axis, we take only the 
component of the momentum that is tangential, and multiply it by the radius. In 
other words, what counts for angular momentum is not how fast it is going away 
from the origin, but how much it is going around the origin. Only the tangential 
part of the momentum counts for angular momentum. Furthermore, the farther 
out the line of the momentum extends, the greater the angular momentum. And 
also, because the geometrical facts are the same whether the quantity is labeled 
p or F, it is true that there is a lever arm (not the same as the lever arm of the force 
on the particle!) which is obtained by extending the line of the momentum and 
finding the perpendicular distance to the axis. Thus the angular momentum is the 
magnitude of the momentum times the momentum lever arm. So we have three 
formulas for angular momentum, just as we have three formulas for the torque: 

L = xp y — yp x 

= ^tang 

= p- lever arm. (18.17) 

Like torque, angular momentum depends upon the position of the axis about 
which it is to be calculated. 

Before proceeding to a treatment of more than one particle, let us apply the 
above results to a planet going around the sun. In which direction is the force? 
The force is toward the sun. What, then, is the torque on the object? Of course, 
this depends upon where we take the axis, but we get a very simple result if we 
take it at the sun itself, for the torque is the force times the lever arm, or the com- 
ponent of force perpendicular to r, times r. But there is no tangential force, so 
there is no torque about an axis at the sun! Therefore, the angular momentum of 
the planet going around the sun must remain constant. Let us see what that means. 
The tangential component of velocity, times the mass, times the radius, will be 
constant, because that is the angular momentum, and the rate of change of the 
angular momentum is the torque, and, in this problem, the torque is zero. Of 
course since the mass is also a constant, this means that the tangential velocity 
times the radius is a constant. But this is something we already knew for the motion 
of a planet. Suppose we consider a small amount of time At. How far will the 
planet move when it moves from P to Q (Fig. 18-3)? How much area will it sweep 
through? Disregarding the very tiny area QQ'P compared with the much larger 
area OPQ , it is simply half the base PQ times the height, OR. In other words, the 
area that is swept through in unit time will be equal to the velocity times the lever 
arm of the velocity (times one-half). Thus the rate of change of area is proportional 
to the angular momentum, which is constant. So Kepler’s law about equal areas 
in equal times is a word description of the statement of the law of conservation of 
angular momentum, when there is no torque produced by the force. 

18-4 Conservation of angular momentum 

Now we shall go on to consider what happens when there is a large number 
of particles, when an object is made of many pieces with many forces acting between 
them and on them from the outside. Of course, we already know that, about any 
given fixed axis, the torque on the /th particle (which is the force on the ith particle 
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times the lever arm of that force) is equal to the rate of change of the angular 
momentum of that particle, and that the angular momentum of the ith particle 
is its momentum times its momentum lever arm. Now suppose we add the torques 
n for all the particles and call it the total torque t. Then this will be the rate of 
change of the sum of the angular momenta of all the particles and that defines 
a new quantity which we call the total angular momentum L. Just as the total 
momentum of an object is the sum of the momenta of all the parts, so the angular 
momentum is the sum of the angular momenta of all the parts. Then the rate of 
change of the total L is the total torque: 



Now it might seem that the total torque is a complicated thing. There are all 
those internal forces and all the outside forces to be considered. But, if we take 
Newton’s law of action and reaction to say, not simply that the action and reaction 
are equal, but also that they are directed exactly oppositely along the same line 
(Newton may or may not actually have said this, but he tacitly assumed it), then 
the two torques on the reacting objects, due to their mutual interaction, will be 
equal and opposite because the lever arms for any axis are equal. Therefore the 
internal torques balance out pair by pair, and so we have the remarkable theorem 
that the rate of change of the total angular momentum about any axis is equal to 
the external torque about that axis ! 

r = = r ext = dL/dt. (18.19) 

Thus we have a very powerful theorem concerning the motion of large collections 
of particles, which permits us to study the over-all motion without having to look 
at the detailed machinery inside. This theorem is true for any collection of objects, 
whether they form a rigid body or not. 

One extremely important case of the above theorem is the law of conservation 
of angular momentum: if no external torques act upon a system of particles, the 
angular momentum remains constant. 

A special case of great importance is that of a rigid body, that is, an object of a 
definite shape that is just turning around. Consider an object that is fixed in its 
geometrical dimensions, and which is rotating about a fixed axis. Various parts of 
the object bear the same relationship to one another at all times. Now let us try 
to find the total angular momentum of this object. If the mass of one of its particles 
is m iy and its position or location is at (x it yi), then the problem is to find the 
angular momentum of that particle, because the total angular momentum is the 
sum of the angular momenta of all such particles in the body. For an object going 
around in a circle, the angular momentum, of course, is the mass times the velocity 
times the distance from the axis, and the velocity is equal to the angular velocity 
times the distance from the axis: 

Li = m{ 0 iri = w 4 r?w, (18.20) 

or, summing over all the particles /, we get 

L = /«, 

where 

i = ^2 

i 

This is the analog of the law that the momentum is mass times velocity. 
Velocity is replaced by angular velocity, and we see that the mass is replaced by 
a new thing which we call the moment of inertia /, which is analogous to the mass. 
Equations (18.21) and (18.22) say that a body has inertia for turning which depends, 
not just on the masses, but on how far away they are from the axis. So, if we have 
two objects of the same mass, when we put the masses farther away from the axis, 
the inertia for turning will be higher. This is easily demonstrated by the apparatus 
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(18.21) 

(18.22) 




Fig. 18-4. The “inertia for turning" 
depends upon the lever arm of the masses. 


shown in Fig. 18-4, where a weight M is kept from falling very fast because it has 
to turn the large weighted rod. At first, the masses m are close to the axis, and 
M speeds up at a certain rate. But when we change the moment of inertia by 
putting the two masses m much farther away from the axis, then we see that M 
accelerates much less rapidly than it did before, because the body has much more 
inertia against turning. The moment of inertia is the inertia against turning, and 
is the sum of the contributions of all the masses, times their distances squared , 
from the axis. 

There is one important difference between mass and moment of inertia which 
is very dramatic. The mass of an object never changes, but its moment of inertia 
can be changed. If we stand on a frictionless rotatable stand with our arms out- 
stretched, and hold some weights in our hands as we rotate slowly, we may change 
our moment of inertia by drawing our arms in, but our mass does not change. 
When we do this, all kinds of wonderful things happen, because of the law of the 
conservation of angular momentum: If the external torque is zero, then the angular 
momentum, the moment of inertia times omega, remains constant. Initially, we 
were rotating with a large moment of inertia 7 X at a low angular velocity and 
the angular momentum was Then we changed our moment of inertia by 

pulling our arms in, say to a smaller value 7 2 . Then the product 7co, which has to 
stay the same because the total angular momentum has to stay the same, was 
7 2 co 2 . So 7iC0i = 7 2 a> 2 * That is, if we reduce the moment of inertia, we have to 
increase the angular velocity. 
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19 


Center of Mass; Moment of Inertia 


19-1 Properties of the center of mass 


In the previous chapter we found that if a great many forces are acting on a 
complicated mass of particles, whether the particles comprise a rigid or a nonrigid 
body, or a cloud of stars, or anything else, and we find the sum of all the forces 
(that is, of course, the external forces, because the internal forces balance out), then 
if we consider the body as a whole, and say it has a total mass M, there is a certain 
point “inside” the body, called the center of mass , such that the net resulting 
external force produces an acceleration of this point, just as though the whole 
mass were concentrated there. Let us now discuss the center of mass in a little 
more detail. 

The location of the center of mass (abbreviated CM) is given by the equation 


Rom = 


2 miti 

2 


(19.1) 


19-1 Properties of the center of mass 
19-2 Locating the center of mass 
19-3 Finding the moment of inertia 
19-4 Rotational kinetic energy 


This is, of course, a vector equation which is really three equations, one for each of 
the three directions. We shall consider only the x-direction, because if we can 
understand that one, we can understand the other two. What does — 
YjniXilYJfti mean? Suppose for a moment that the object is divided into little 
pieces, all of which have the same mass m; then the total mass is simply the number 
N of pieces times the mass of one piece, say one gram, or any unit. Then this 
equation simply says that we add all the jt’s, and then divide by the number of 
things that we have added: X CM = m^Xi/mN = J^Xi/N. In other words, 
A'cm is the average of all the x% if the masses are equal. But suppose one of them 
were twice as heavy as the others. Then in the sum, that x would come in twice. 
This is easy to understand, for we can think of this double mass as being split 
into two equal ones, just like the others; then in taking the average, of course, we 
have to count that x twice because there are two masses there. Thus X is the 
average position, in the ^-direction, of all the masses, every mass being counted a 
number of times proportional to the mass, as though it were divided into “little 
grams.” From this it is easy to prove that X must be somewhere between the 
largest and the smallest x, and, therefore lies inside the envelope including the 
entire body. It does not have to be in the material of the body, for the body could 
be a circle, like a hoop, and the center of mass is in the center of the hoop, not in 
the hoop itself. 

Of course, if an object is symmetrical in some way, for instance, a rectangle, 
so that it has a plane of symmetry, the center of mass lies somewhere on the plane 
of symmetry. In the case of a rectangle there are two planes, and that locates it 
uniquely. But if it is just any symmetrical object, then the center of gravity lies 
somewhere on the axis of symmetry, because in those circumstances there are as 
many positive as negative x’s. 

Another interesting proposition is the following very curious one. Suppose 
that we imagine an object to be made of two pieces, A and B (Fig. 19-1). Then 
the center of mass of the whole object can be calculated as follows. First, find the 
center of mass of piece A , and then of piece B. Also, find the total mass of each 
piece, Ma and Mb. Then consider a new problem, in which a point mass Ma is 
at the center of mass of object A, and another point mass M B is at the center of 
mass of object B. The center of mass of these two point masses is then the center 
of mass of the whole object. In other words, if the centers of mass of various parts 



Fig. 1 9-1 . The CM of a compound 
body lies on the line joining the CM’s of 
the two composite parts. 
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of an object have been worked out, we do not have to start all over again to find 
the center of mass of the whole object; we just have to put the pieces together, 
treating each one as a point mass situated at the center of mass of that piece. 
Let us see why that is. Suppose that we wanted to calculate the center of mass of 
a complete object, some of whose particles are considered to be members of 
object A and some members of object B. The total sum YjniXi can then be split 
into two pieces — the sum J^AWiXi for the A object only, and the sum £ 
for object B only. Now if we were computing the center of mass of object A alone, 
we would have exactly the first of these sums, and we know that this by itself is 
Ma Xa , the total mass of all the particles in A times the position of the center of 
mass of A , because that is the theorem of the center of mass, applied to object A . 
In the same manner, just by looking at object B , we get M B X B , and of course, 
adding the two yields MX: 

MX C u = rriiXi + ^ m&i 

A B 

= M a X a + M b X b . (19.2) 

Now since M is evidently the sum of M A and M Bi we see that Eq. (19.2) can be 
interpreted as a special example of the center of mass formula for two point objects, 
one of mass M A located at X A and the other of mass M B located at X B . 

The theorem concerning the motion of the center of mass is very interesting, 
and has played an important part in the development of our understanding of 
physics. Suppose we assume that Newton’s law is right for the small component 
parts of a much larger object. Then this theorem shows that Newton’s law is also 
correct for the larger object, even if we do not study the details of the object, but 
only the total force acting on it and its mass. In other words, Newton’s law has 
the peculiar property that if it is right on a certain small scale, then it will be right 
on a larger scale. If we do not consider a baseball as a tremendously complex 
thing, made of myriads of interacting particles, but study only the motion of the 
center of mass and the external forces on the ball, we find F = m a, where F is the 
external force on the baseball, m is its mass, and a is the acceleration of its center 
of mass. So F = ma is a law which reproduces itself on a larger scale. (There 
ought to be a good word, out of the Greek, perhaps, to describe a law which 
reproduces the same law on a larger scale.) 

Of course, one might suspect that the first laws that would be discovered by 
human beings would be those that would reproduce themselves on a larger scale. 
Why? Because the actual scale of the fundamental gears and wheels of the universe 
are of atomic dimensions, which are so much finer than our observations that we 
are nowhere near that scale in our ordinary observations. So the first things that we 
would discover must be true for objects of no special size relative to an atomic 
scale. If the laws for small particles did not reproduce themselves on a larger scale, 
we would not discover those laws very easily. What about the reverse problem? 
Must the laws on a small scale be the same as those on a larger scale? Of course 
it is not necessarily so in nature, that at an atomic level the laws have to be the same 
as on a large scale. Suppose that the true laws of motion of atoms were given by 
some strange equation which does not have the property that when we go to a 
larger scale we reproduce the same law, but instead has the property that if we 
go to a larger scale, we can approximate it by a certain expression such that, if we 
extend that expression up and up, it keeps reproducing itself on a larger and larger 
scale. That is possible, and in fact that is the way it works. Newton’s laws are the 
“tail end” of the atomic laws, extrapolated to a very large size. The actual laws of 
motion of particles on a fine scale are very peculiar, but if we take large numbers of 
them and compound them, they approximate, but only approximate, Newton’s 
laws. Newton’s laws then permit us to go on to a higher and higher scale, and 
it still seems to be the same law. In fact, it becomes more and more accurate as 
the scale gets larger and larger. This self-reproducing factor of Newton’s laws is 
thus really not a fundamental feature of nature, but is an important historical 
feature. We would never discover the fundamental laws of the atomic particles at 
first observation because the first observations are much too crude. In fact, it turns 
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out that the fundamental atomic laws, which we call quantum mechanics, are quite 
different from Newton’s laws, and are difficult to understand because all our direct 
experiences are with large-scale objects and the small-scale atoms behave like 
nothing we see on a large scale. So we cannot say, “An atom is just like a planet 
going around the sun,” or anything like that. It is like nothing we are familiar with 
because there is nothing like it . As we apply quantum mechanics to larger and larger 
things, the laws about the behavior of many atoms together do not reproduce 
themselves, but produce new laws , which are Newton’s laws, which then continue 
to reproduce themselves from, say, micro-microgram size, which still is billions 
and billions of atoms, on up to the size of the earth, and above. 

Let us now return to the center of mass. The center of mass is sometimes 
called the center of gravity, for the reason that, in many cases, gravity may be 
considered uniform. Let us suppose that we have small enough dimensions that 
the gravitational force is not only proportional to the mass, but is everywhere 
parallel to some fixed line. Then consider an object in which there are gravitational 
forces on each of its constituent masses. Let mi be the mass of one part. Then the 
gravitational force on that part is mi times g. Now the question is, where can we 
apply a single force to balance the gravitational force on the whole thing, so that the 
entire object, if it is a rigid body, will not turn? The answer is that this force must 
go through the center of mass, and we show this in the following way. In order that 
the body will not turn, the torque produced by all the forces must add up to zero, 
because if there is a torque, there is a change of angular momentum, and thus a 
rotation. So we must calculate the total of all the torques on all the particles, and 
see how much torque there is about any given axis ; it should be zero if this axis is 
at the center of mass. Now, measuring x horizontally and y vertically, we know 
that the torques are the forces in the j-direction, times the lever arm x (that is to 
say, the force times the lever arm around which we want to measure the torque). 
Now the total torque is the sum 

r = L m iS x i = 8 L miXi ’ (19.3) 

so if the total torque is to be zero, the sum YjniXi must be zero. But £ ntiXi = MX , 
the total mass times the distance of the center of mass from the axis. Thus the 
x-distance of the center of mass from the axis is zero. 

Of course, we have checked the result only for the ^-distance, but if we use 
the true center of mass the object will balance in any position, because if we turned 
it 90 degrees, we would have y * s instead of jc’s. In other words, when an object 
is supported at its center of mass, there is no torque on it because of a parallel 
gravitational field. In case the object is so large that the nonparallelism of the 
gravitational forces is significant, then the center where one must apply the balanc- 
ing force is not simple to describe, and it departs slightly from the center of mass. 
That is why one must distinguish between the center of mass and the center of 
gravity. The fact that an object supported exactly at the center of mass will balance 
in all positions has another interesting consequence. If, instead of gravitation, 
we have a pseudoforce due to acceleration, we may use exactly the same mathe- 
matical procedure to find the position to support it so that there are no torques 
produced by the inertial force of acceleration. Suppose that the object is held in 
some manner inside a box, and that the box, and everything contained in it, is 
accelerating. We know that, from the point of view of someone at rest relative to 
this accelerating box, there will be an effective force due to inertia. That is, to 
make the object go along with the box, we have to push on it to accelerate it, and 
this force is “balanced” by the “force of inertia,” which is a pseudoforce equal to 
the mass times the acceleration of the box. To the man in the box, this is the same 
situation as if the object were in a uniform gravitational field whose “g” value is 
equal to the acceleration a . Thus the inertial force due to accelerating an object has 
no torque about the center of mass. 

This fact has a very interesting consequence. In an inertial frame that is not 
accelerating, the torque is always equal to the rate of change of the angular mo- 
mentum. However, about an axis through the center of mass of an object which 
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is accelerating, it is still true that the torque is equal to the rate of change of the 
angular momentum. Even if the center of mass is accelerating, we may still choose 
one special axis, namely, one passing through the center of mass, such that it will 
still be true that the torque is equal to the rate of change of angular momentum 
around that axis. Thus the theorem that torque equals the rate of change of angular 
momentum is true in two general cases: (1) a fixed axis in inertial space, (2) an axis 
through the center of mass, even though the object may be accelerating. 


19-2 Locating the center of mass 



Fig. 1 9-2. A right triangle and a 
right circular cone generated by ro- 
tating the triangle. 


The mathematical techniques for the calculation of centers of mass are in the 
province of a mathematics course, and such problems provide good exercise in 
integral calculus. After one has learned calculus, however, and wants to know 
how to locate centers of mass, it is nice to know certain tricks which can be used 
to do so. One such trick makes use of what is called the theorem of Pappus. It 
works like this: if we take any closed area in a plane and generate a solid by moving 
it through space such that each point is always moved perpendicular to the plane 
of the area, the resulting solid has a total volume equal to the area of the cross 
section times the distance that the center of mass moved! Certainly this is true if 
we move the area in a straight line perpendicular to itself, but if we move it in a 
circle or in some other curve, then it generates a rather peculiar volume. For a 
curved path, the outside goes around farther, and the inside goes around less, and 
these effects balance out. So if we want to locate the center of mass of a plane 
sheet of uniform density, we can remember that the volume generated by spinning 
it about an axis is the distance that the center of mass goes around, times the area 
of the sheet. 

For example, if we wish to find the center of mass of a right triangle of base 
D and height H (Fig. 19-2), we might solve the problem in the following way. 
Imagine an axis along H , and rotate the triangle about that axis through a full 
360 degrees. This generates a cone. The distance that the ^-coordinate of the 
center of mass has moved is 2ttx. The area which is being moved is the area of 
the triangle, \HD. So the x-distance of the center of mass times the area of the 
triangle is the volume swept out, which is of course ttD 2 H/3. Thus (2irx)(^HD) = 
l/3wD 2 H, or x = D/ 3. In a similar manner, by rotating about the other axis, or by 
symmetry, we find y = H/ 3. In fact, the center of mass of any uniform triangular 
area is where the three medians, the lines from the vertices through the centers of 
the opposite sides, all meet. That point is 1/3 of the way along each median. Clue: 
Slice the triangle up into a lot of little pieces, each parallel to a base. Note that the 
median line bisects every piece, and therefore the center of mass must lie on this line. 

Now let us try a more complicated figure. Suppose that it is desired to find 
the position of the center of mass of a uniform semicircular disc — a disc sliced in 
half. Where is the center of mass? For a full disc, it is at the center, of course, but 
a half-disc is more difficult. Let r be the radius and x be the distance of the center 
of mass from the straight edge of the disc. Spin it around this edge as axis to 
generate a sphere. Then the center of mass has gone around 2ttx, the area is 
irr 2 /2 (because it is only half a circle). The volume generated is, of course, 47rr 3 /3, 
from which we find that 

(2Trx)(^7rr 2 ) = 47rr 3 /3, 
or 

x — 4 t/3t. 


There is another theorem of Pappus which is a special case of the above one, 
and therefore equally true. Suppose that, instead of the solid semicircular disc, 
we have a semicircular piece of wire with uniform mass density along the wire, 
and we want to find its center of mass. In this case there is no mass in the interior, 
only on the wire. Then it turns out that the area which is swept by a plane curved 
line, when it moves as before, is the distance that the center of mass moves times 
the length of the line. (The line can be thought of as a very narrow area, and the 
previous theorem can be applied to it.) 
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19-3 Finding the moment of inertia 

Now let us discuss the problem of finding the moments of inertia of various 
objects. The formula for the moment of inertia about the z-axis of an object is 

I = ^2 mfx 2 + y 2 ) 
or 

/ = J(x 2 + y 2 ) dm = f (x 2 + y 2 )p dv. (19.4) 


That is, we must sum the masses, each one multiplied by the square of its distance 
(x 2 + yf) from the axis. Note that it is not the three-dimensional distance, only 
the two-dimensional distance squared, even for a three-dimensional object. For 
the most part, we shall restrict ourselves to two-dimensional objects, but the 
formula for rotation about the z-axis is just the same in three dimensions. 

As a simple example, consider a rod rotating about a perpendicular axis 
through one end (Fig. 19-3). Now we must sum all the masses times the x-distances 
squared (the y* s being all zero in this case). What we mean by “the sum,” of 
course, is the integral of x 2 times the little elements of mass. If we divide the rod 
into small elements of length dx , the corresponding elements of mass are propor- 
tional to dx , and if dx were the length of the whole rod the mass would be M. 
Therefore 

dm — M dx/L 

and so 


/ = 



M dx 
~~L~ 



ML 2 
3 


(19.5) 




dx 


Fig. 19-3. A straight rod of length 
L rotating about an axis through one end. 


The dimensions of moment of inertia are always mass times length squared, so 
all we really had to work out was the factor 1/3. 

Now what is I if the rotation axis is at the center of the rod? We could just 
do the integral over again, letting x range from — \L to +^L. But let us notice a 
few things about the moment of inertia. We can imagine the rod as two rods, 
each of mass M/2 and length L/2; the moments of inertia of the two small rods 
are equal, and are both given by the formula (19.5). Therefore the moment of 
inertia is 


2(M/2)(L/2f = ML 2 
1 " 3 12 


(19.6) 


Thus it is much easier to turn a rod about its center, than to swing it around an end. 

Of course, we could go on to compute the moments of inertia of various other 
bodies of interest. However, while such computations provide a certain amount 
of important exercise in the calculus, they are not basically of interest to us as 
such. There is, however, an interesting theorem which is very useful. Suppose 
we have an object, and we want to find its moment of inertia around some axis. 
That means we want the inertia needed to carry it by rotation about that axis. 
Now if we support the object on pivots at the center of mass, so that the object 
does not turn as it rotates about the axis (because there is no torque on it from 
inertial effects, and therefore it will not turn when we start moving it), then the 
forces needed to swing it around are the same as though all the mass were concen- 
trated at the center of mass, and the moment of inertia would be simply / 1 = 
MBfm where R c m is the distance from the axis to the center of mass. But of 
course that is not the right formula for the moment of inertia of an object which 
is really being rotated as it revolves, because not only is the center of it moving in 
a circle, which would contribute an amount f to the moment of inertia, but also 
we must turn it about its center of mass. So it is not unreasonable that we must 
add to 1 1 the moment of inertia I c about the center of mass. So it is a good guess 
that the total moment of inertia about any axis will be 


I ~ h + MRcm- 


(19.7) 
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This theorem is called the parallel-axis theorem , and may be easily proved. 
The moment of inertia about any axis is the mass times the sum of the x/s and 
the j/s, each squared: / = L (xf + We shall concentrate on the jc’s, but 

of course the /s work the same way. Now x is the distance of a particular point 
mass from the origin, but let us consider how it would look if we measured x' from 
the CM, instead of x from the origin. To get ready for this analysis, we write 

Xi = x\ + A’cm- 

Then we just square this to find 

x i = x 'i + 2Xc wx'i + Xcm- 

So, when this is multiplied by m { and summed over all i, what happens? Taking 
the constants outside the summation sign, we get 

!x = XI miX> ? + 2X cu X m i x 'i + x cu X] m i- 

The third sum is easy; it is just MI c 2 m . In the second sum there are two pieces, one 
of them is £ which is the total mass times the x'-coordinate of the center of 
mass. But this contributes nothing, because x' is measured from the center of mass, 
and in these axes the average position of all the particles, weighted by the masses, 
is zero. The first sum, of course, is the x part of I c . Thus we arrive at Eq. (19.7), 
just as we guessed. 

Let us check (19.7) for one example. Let us just see whether it works for the 
rod. For an axis through one end, the moment of inertia should be mL 2 / 3, for 
we calculated that. The center of mass of a rod, of course, is in the center of the 
rod, at a distance L/2, Therefore we should find that ML 2 / 3 = ML 2 /\2 + 
M(L/2) 2 . Since one-quarter plus one-twelfth is one-third, we have made no 
fundamental error. 

Incidentally, we did not really need to use an integral to find the moment of 
inertia (19.5). If we simply assume that it is ML 2 times 7 , an unknown coefficient, 
and then use the argument about the two halves to get for (19.6), then from our 
argument about transferring the axes we could prove that 7 = J7 + J, so 7 
must be 1/3. There is always another way to do it! 

In applying the parallel-axis theorem, it is of course important to remember 
that the axis for I c must be parallel to the axis about which the moment of inertia 
is wanted. 

One further property of the moment of inertia is worth mentioning because 
it is often helpful in finding the moment of inertia of certain kinds of objects. 
This property is that if one has a plane figure and a set of coordinate axes with 
origin in the plane and z-axis perpendicular to the plane, then the moment of 
inertia of this figure about the z-axis is equal to the sum of the moments of inertia 
about the x- and j-axes. This is easily proved by noting that 

h = X] Witfi + Z i) = X 

(since z t - = 0). Similarly, 

h = X] mi ( X i + Z i) = XI 

but 

= X I + y 2 *) = X] + X mi y 2 i 

— Ix + Iy 

As an example, the moment of inertia of a uniform rectangular plate of mass 
M, width w, and length L, about an axis perpendicular to the plate and through its 
center is simply 

I = M(w 2 + L 2 )/12, 

because its moment of inertia about an axis in its plane and parallel to its length 
is A/h> 2 /12, i.e., just as for a rod of length w, and the moment of inertia about the 
other axis in its plane is ML 2 / 12, just as for a rod of length L. 
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To summarize, the moment of inertia of an object about a given axis, which 
we shall call the z-axis, has the following properties: 

(1) The moment of inertia is 

h = X) + J'i) = /(* 2 + /) dm ' 

i 

(2) If the object is made of a number of parts, each of whose moment of inertia 
is known, the total moment of inertia is the sum of the moments of inertia 
of the pieces. 

(3) The moment of inertia about any given axis is equal to the moment of 
inertia about a parallel axis through the CM plus the total mass times the 
square of the distance from the axis to the CM. 

(4) If the object is a plane figure, the moment of inertia about an axis perpendicu- 
lar to the plane is equal to the sum of the moments of inertia about any two 
mutually perpendicular axes lying in the plane and intersecting at the 
perpendicular axis. 

The moments of inertia of a number of elementary shapes having uniform mass 
densities are given in Table 19-1, and the moments of inertia of some other objects, 
which may be deduced from Table 19-1, using the above properties, are given in 
Table 19-2. 

Table 19-1 


Object 

z-axis 

u 

Thin rod, length L 

JL rod at center 

ML 2 / 12 

Thin concentric 
circular ring, radii 

X ring at center 

M(rl + ii)/2 

r\ and /*2 

Sphere, radius r 

through center 

2Mr 2 / 5 


Table 19-2 


Object 

z-axis 

h 

Rect. sheet, sides a, b 

|| b at center 

Ma 2 / 12 

Rect. sheet, sides a , b 

X sheet at 
center 

M(a 2 + b 2 )/ 12 

Thin annular ring, 
radii ri, r 2 

any diameter 

M(r\ + r|)/4 

Rect. parallelepiped, 
sides a, b , c 

c, through 
center 

M{a 2 + b 2 )/ 12 

Rt. circ. cyl., radius 
r, length L 

|| L , through 
center 

Mr 2 / 2 

Rt. circ. cyl., radius 
/*, length L 

X L, through 
center 

M(r 2 /4 + L 2 / 12) 


19-4 Rotational kinetic energy 

Now let us go on to discuss dynamics further. In the analogy between linear 
motion and angular motion that we discussed in Chapter 18, we used the work 
theorem, but we did not talk about kinetic energy. What is the kinetic energy of a 
rigid body, rotating about a certain axis with an angular velocity co? We can im- 
mediately guess the correct answer by using our analogies. The moment of inertia 
corresponds to the mass, angular velocity corresponds to velocity, and so the 
kinetic energy ought to be ^/co 2 , and indeed it is, as will now be demonstrated. 
Suppose the object is rotating about some axis so that each point has a velocity 
whose magnitude is cor*, where ri is the radius from the particular point to the axis. 
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Then if m, is the mass of that point, the total kinetic energy of the whole thing is 
just the sum of the kinetic energies of all of the little pieces: 

T = \ nn(riu) 2 . 

Now w 2 is a constant, the same for all points. Thus 

T = W 2 Yj imr 2 = \Iu 2 . (19.8) 

At the end of Chapter 18 we pointed out that there are some interesting 
phenomena associated with an object which is not rigid, but which changes from 
one rigid condition with a definite moment of inertia, to another rigid condition. 
Namely, in our example of the turntable, we had a certain moment of inertia 7 X 
with our arms stretched out, and a certain angular velocity « x . When we pulled 
our arms in, we had a different moment of inertia, 7 2 , and a different angular veloc- 
ity. w 2 , but again we were “rigid.” The angular momentum remained constant, 
since there was no torque about the vertical axis of the turntable. This means that 
I i w i = A 2 co 2 . Now what about the energy? That is an interesting question. 
With our arms pulled in, we turn faster, but our moment of inertia is less, and it 
looks as though the energies might be equal. But they are not, because what does 
balance is 7w, not Ius 2 . So if we compare the kinetic energy before and after, the 
kinetic energy before is 2 / x w i = \Lu> x , where L = 7 X « X = / 2 w 2 is the angular 
momentum. Afterward, by the same argument, we have T = \Lu> 2 , and since 
co 2 > co x the kinetic energy of rotation is greater than it was before. So we had a 
certain energy when our arms were out, and when we pulled them in, we were turn- 
ing faster and had more kinetic energy. What happened to the theorem of the 
conservation of energy? Somebody must have done some work. We did work! 
When did we do any work? When we move a weight horizontally, we do not do 
any work. If we hold a thing out and pull it in, we do not do any work. But that 
is when we are not rotating! When we are rotating, there is centrifugal force on 
the weights. They are trying to fly out, so when we are going around we have to 
pull the weights in against the centrifugal force. So, the work we do against the 
centrifugal force ought to agree with the difference in rotational energy, and of 
course it does. That is where the extra kinetic energy comes from. 

There is still another interesting feature which we can treat only descriptively, 
as a matter of general interest. This feature is a little more advanced, but is worth 
pointing out because it is quite curious and produces many interesting effects. 

Consider that turntable experiment again. Consider the body and the arms 
separately, from the point of view of the man who is rotating. After the weights 
are pulled in, the whole object is spinning faster, but observe, the central part of 
the body is not changed, yet it is turning faster after the event than before. So, if 
we were to draw a circle around the inner body, and consider only objects inside the 
circle, their angular momentum would change-, they are going faster. Therefore 
there must be a torque exerted on the body while we pull in our arms. No torque 
can be exerted by the centrifugal force, because that is radial. So that means 
that among the forces that are developed in a rotating system, centrifugal force is 
not the entire story, there is another force. This other force is called Coriolis force, 
and it has the very strange property that when we move something in a rotating 
system, it seems to be pushed sidewise. Like the centrifugal force, it is an apparent 
force. But if we live in a system that is rotating, and move something radially, we 
find that we must also push it sidewise to move it radially. This sidewise push which 
we have to exert is what turned our body around. 

Now let us develop a formula to show how this Coriolis force really works. 
Suppose Moe is sitting on a carousel that appears to him to be stationary. But from 
the point of view of Joe, who is standing on the ground and who knows the right 
laws of mechanics, the carousel is going around. Suppose that we have drawn a 
radial line on the carousel, and that Moe is moving some mass radially along this 
line. We would like to demonstrate that a sidewise force is required to do that. 
We can do this by paying attention to the angular momentum of the mass. It is 
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always going around with the same angular velocity to, so that the angular mo- 
mentum is 

L = mvt^r — moor * r — moor 2 . 


So when the mass is close to the center, it has relatively little angular momentum, 
but if we move it to a new position farther out, if we increase r, m has more angular 
momentum, so a torque must be exerted in order to move it along the radius. 
(To walk along the radius in a carousel, one has to lean over and push sidewise. 
Try it sometime.) The torque that is required is the rate of change of L with time 
as m moves along the radius. If m moves only along the radius, omega stays con- 
stant, so that the torque is 


„ dL d(moor 2 ) 

T = Fcr = li = dt 


Imoor 


dr 

dt 


where F c is the Coriolis force. What we really want to know is what sidewise force 
has to be exerted by Moe in order to move m out at speed v r = dr/dt. This is 
F c = r/r — 2moov r . 

Now that we have a formula for the Coriolis force, let us look at the situation 
a little more carefully, to see whether we can understand the origin of this force 
from a more elementary point of view. We note that the Coriolis force is the same 
at every radius, and is evidently present even at the origin! But it is especially 
easy to understand it at the origin, just by looking at what happens from the in- 
ertial system of Joe, who is standing on the ground. Figure 19-4 shows three 
successive views of m just as it passes the origin at t — 0. Because of the rotation 
of the carousel, we see that m does not move in a straight line, but in a curved path 
tangent to a diameter of the carousel where r = 0. In order for m to go in a curve, 
there must be a force to accelerate it in absolute space. This is the Coriolis force. 

This is not the only case in which the Coriolis force occurs. We can also 
show that if an object is moving with constant speed around the circumference of 
a circle, there is also a Coriolis force. Why? Moe sees a velocity v M around the 
circle. On the other hand, Joe sees m going around the circle with the velocity 
vj = vm H - o)r, because m is also carried by the carousel. Therefore we know what 
the force really is, namely, the total centripetal force due to the velocity vj , or 
mvj/r; that is the actual force. Now from Moe’s point of view, this centripetal 
force has three pieces. We may write it all out as follows: 



Fig. 1 9-4. Three successive views of 
a point moving radially on a rotating 
turntable. 


F r = - 


mv 2 mvjf * 2 

= 2mv M oo — moo r. 


Now, F r is the force that Moe would see. Let us try to understand it. Would Moe 
appreciate the first term? “Yes,” he would say, “even if I were not turning, there 
would be a centripetal force if I were to run around a circle with velocity v M ” 
This is simply the centripetal force that Moe would expect, having<nothing to do 
with rotation. In addition, Moe is quite aware that there is another centripetal 
force that would act even on objects which are standing still On his carousel. This 
is the third term. But there is another term in addition to these, namely the second 
term, which is again 2moov. The Coriolis force F c was tangential when the velocity 
was radial, and now it is radial when the velocity is tangential. In fact, one ex- 
pression has a minus sign relative to the other. The force is always in the same 
direction, relative to the velocity, no matter in which direction the velocity is. 
The force is at right angles to the velocity, and of magnitude 2 moov. 
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20 


Rotation in space 


20-1 Torques in three dimensions 

In this chapter we shall discuss one of the most remarkable and amusing 
consequences of mechanics, the behavior of a rotating wheel. In order to do this 
we must first extend the mathematical formulation of rotational motion, the 
principles of angular momentum, torque, and so on, to three-dimensional space. 
We shall not use these equations in all their generality and study all their conse- 
quences, because this would take many years, and we must soon turn to other 
subjects. In an introductory course we can present only the fundamental laws and 
apply them to a very few situations of special interest. 

First, we notice that if we have a rotation in three dimensions, whether of a 
rigid body or any other system, what we deduced for two dimensions is still right. 
That is, it is still true that xF y — yF x is the torque “in the xy- plane,” or the 
torque “around the z-axis.” It also turns out that this torque is still equal to the rate 
of change of xp y - yp x , for if we go back over the derivation of Eq. ( 18 . 15 ) from 
Newton’s laws we see that we did not have to assume that the motion was in a 
plane; when we differentiate xp y - yp x , we get xF y — yF x , so this theorem is 
still right. The quantity xp y — yp x , then, we call the angular momentum belonging 
to the xjy-plane, or the angular momentum about the z-axis. This being true, we 
can use any other pair of axes and get another equation. For instance, we can 
use the .yz-plane, and it is clear from symmetry that if we just substitute y for x 
and z for y, we would find yF z — zF y for the torque and yp 2 — zp y would be the 
angular momentum associated with the jz-plane. Of course we could have another 
plane, the zx-plane, and for this we would find zF x — xF z = d/dt (zp x — xp z ). 

That these three equations can be deduced for the motion of a single particle 
is quite clear. Furthermore, if we added such things as xp y — yp x together for 
many particles and called it the total angular momentum, we would have three 
kinds for the three planes xy, yz , and zx, and if we did the same with the forces, 
we would talk about the torque in the planes xy, yz, and zx also. Thus we would 
have laws that the external torque associated with any plane is equal to the rate 
of change of the angular momentum associated with that plane. This is just a 
generalization of what we wrote in two dimensions. 

But now one may say, “Ah, but there are more planes; after all, can we not 
take some other plane at some angle, and calculate the torque on that plane from 
the forces? Since we would have to write another set of equations for every such 
plane, we would have a lot of equations!” Interestingly enough, it turns out that 
if we were to work out the combination x'/y — y f F x > for another plane, measuring 
the x f , F y >, etc., in that plane, the result can be written as some combination of the 
three expressions for the xy-, yz- and zx-planes. There is nothing new. In other 
words, if we know what the three torques in the xy-, yz-, and zx-planes are, then 
the torque in any other plane, and correspondingly the angular momentum also, 
can be written as some combination of these: six percent of one and ninety-two 
percent of another, and so on. This property we shall now analyze. 

Suppose that in the xyz-axes, Joe has worked out all his torques and his angu- 
lar momenta in his planes. But Moe has axes x f , y' , z' in some other direction. To 
make it a little easier, we shall suppose that only the x- and y- axes have been turned. 
Moe’s x’ and y f are new, but his z' happens to be the same. That is, he has new 
planes, let us say, for yz and zx. He therefore has new torques and angular momenta 
which he would work out. For example, his torque in the x'/ -plane would be 
equal to x'Ty — y*F x > and so forth. What we must now do is to find the relation- 
ship between the new torques and the old torques, so we will be able to make a 
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connection from one set of axes to the other. Someone may say, “That looks just 
like what we did with vectors.” And indeed, that is exactly what we are intending 

to do. Then he may say, “Well, isn’t torque just a vector?” It does turn out to 

be a vector, but we do not know that right away without making an analysis. So 
in the following steps we shall make the analysis. We shall not discuss every step 
in detail, since we only want to illustrate how it works. The torques calculated 
by Joe are 

T xy — xF y yF x<) 

Tyz = yF z - zFy , (20,1) 

t~zx zF x xF z . 


We digress at this point to note that in such cases as this one may get the wrong sign 
for some quantity if the coordinates are not handled in the right way. Why not write 
Ty Z = zFy - yF z ? The problem arises from the fact that a coordinate system may be 
either right-handed or “left-handed.” Having chosen (arbitrarily) a sign for, say 
T xyi then the correct expressions for the other two quantities may always be found by 
interchanging the letters xyz in either order 


* 



y 


or 




/\ 


Moe now calculates the torques in his system: 


T x'y ' 

= X'Fyr 

- y'F x ~ 

Ty*z’ 

II 

~ Z'Fy. 

T Z ' X ' 

£ 

N 

II 

- xTv 


Now we suppose that one coordinate system is rotated by a fixed angle 0, such 
that the z- and z'-axes are the same. (This angle 0 has nothing to do with rotating 
objects or what is going on inside the coordinate system. It is merely the relation- 
ship between the axes used by one man and the axes used by the other, and is 
supposedly constant.) Thus the coordinates of the two systems are related by 

x' = * cos 6 + y sin 9, 

y' = y cos 0 — x sin 0, (20.3) 

z' = z. 

Likewise, because force is a vector it transforms into the new system in the same 
way as do x, y, and z, since a thing is a vector if and only if the various components 
transform in the same way as *, y, and z: 

F X ' = F x cos 9 + F y sin 0, 

F v' = F y cos 0 - F x sin 0, (20.4) 

F t - = F t . 

Now we can find out how the torque transforms by merely substituting for 
/, and z' the expressions (20.3), and for F x -, F y >, F x . those given by (20.4), all 
into (20.2). So, we have a rather long string of terms for r xV and (rather surpris- 
ingly at first) it turns out that it comes right down to xF y - yF x , which we recog- 
nize to be the torque in the xy-plane: 

r* v = (* cos 9 + y sin 9){F y cos 9 - F z sin 9) 

“O' cos 9 - * sin 0)(F X cos 0 + F y sin 9) 

— xF y (c os 2 9 + sin 2 0) — yF x (sin 2 0 + cos 2 6) 

+xF x {— sin 0cos 9 + sin 9 cos 9) 

+y/v(sin 9 cos 9 — sin 9 cos 9) 

= xF v - yF x = T , 
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That result is clear, for if we only turn our axes in the plane , the twist around z 
in that plane is no different than it was before, because it is the same plane! What 
will be more interesting is the expression for T y > z ', because that is a new plane. 
We now do exactly the same thing with the y'z'- plane, and it comes out as follows: 

Ty' z ’ = (y cos 8 — x sin 6)F Z 

—z(F y cos 8 ~ F x sin 8) 

= ( yF z — zFy) cos 8 + ( zF x — xF z ) sin 8 
= r yz cos 8 + t zx sin 8. (20.6) 

Finally, we do it for zY: 

T Z ' X ’ — z(F x cos 8 + Fy sin 6) 

— (jc cos 6 + y sin 8)F Z 
= (zF x — xF z ) cos 8 — (yF z — zF y ) sin 8 
= t zx cos 8 — Tyz sin 8 . (20.7) 

We wanted to get a rule for finding torques in new axes in terms of torques 
in old axes, and now we have the rule. How can we ever remember that rule? 
If we look carefully at (20.5), (20.6), and (20.7), we see that there is a close relation- 
ship between these equations and the equations for jc, y, and z. If, somehow, we 
could call r xy the z-component of something, let us call it the z-component of r, 
then it would be all right; we would understand (20.5) as a vector transformation, 
since the z-component would be unchanged, as it should be. Likewise, if we 
associate with the jz-plane the jc-component of our newly invected vector, and 
with the zx-plane, the ^-component, then these transformation expressions would 
read 

T Z ' = t z , 

7V = t x cos 8 + t v sin 8 , (20.8) 

?y = t v cos 8 — r x sin 8 , 

which is just the rule for vectors! 

Therefore we have proved that we may identify the combination of xF y — yF x 
with what we ordinarily call the z-component of a certain artificially invented 
vector. Although a torque is a twist on a plane, and it has no a priori vector char- 
acter, mathematically it does behave like a vector. This vector is at right angles to 
the plane of the twist, and its length is proportional to the strength of the twist. The 
three components of such a quantity will transform like a real vector. 

So we represent torques by vectors; with each plane on which the torque is 
supposed to be acting, we associate a line at right angles, by a rule. But “at right 
angles” leaves the sign unspecified. To get the sign right, we must adopt a rule 
which will tell us that if the torque were in a certain sense on the xy- plane, then 
the axis that we want to associate with it is in the “up” z-direction. That is, some- 
body has to define “right” and “left” for us. Supposing that the coordinate system 
is x, y, z in a right-hand system, then the rule will be the following: if we think of 
the twist as if we were turning a screw having a right-hand thread, then the direction 
of the vector that we will associate with that twist is in the direction that the screw 
would advance. 

Why is torque a vector? It is a miracle of good luck that we can associate a 
single axis with a plane, and therefore that we can associate a vector with the 
torque ; it is a special property of three-dimensional space. In two dimensions, the 
torque is an ordinary scalar, and there need be no direction associated with it. 
In three dimensions, it is a vector. If we had four dimensions, we would be in great 
difficulty, because (if we had time, for example, as the fourth dimension) we would 
not only have planes like xy, yz, and zx, we would also have tx~, ty-, and /z-planes. 
There would be six of them, and one cannot represent six quantities as one vector 
in four dimensions. 

We will be living in three dimensions for a long time, so it is well to notice 
that the foregoing mathematical treatment did not depend upon the fact that x 
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was position and F was force; it only depended on the transformation laws for 
vectors. Therefore if, instead of x, we used the ^-component of some other vector, 
it is not going to make any difference. In other words, if we were to calculate 
a x b v — a y b x , where a and b are vectors, and call it the z-component of some new 
quantity c, then these new quantities form a vector c. We need a mathematical 
notation for the relationship of the new vector, with its three components, to the 
vectors a and b. The notation that has been devised for this is c = a X b. We 
have then, in addition to the ordinary scalar product in the theory of vector 
analysis, a new kind of product, called the vector product. Thus, if c = a X b, 
this is the same as writing 

c x — ctybz t 2 x by, 

Cy tz x b x a x b z . (20.9) 

C Z Ct X by Ctyb x . 

If we reverse the order of a and b, calling a, b and b, a, we would have the sign 

of c reversed, because c z would be b x a y — b v a x . Therefore the cross product is 
unlike ordinary multiplication, where ab = ba; for the cross product, b X a = 
— a X b. From this, we can prove at once that if a = b, the cross product is 
zero. Thus, a X a = 0. 

The cross product is very important for representing the features of rotation, 
and it is important that we understand the geometrical relationship of the three 
vectors a, b, and c. Of course the relationship in components is given in Eq. (20.9) 
and from that one can determine what the relationship is in geometry. The answer 
is, first, that the vector c is perpendicular to both a and b. (Try to calculate c ■ a, 
and see if it does not reduce to zero.) Second, the magnitude of c turns out to be 
the magnitude of a times the magnitude of b times the sine of the angle between 
the two. In which direction does c point? Imagine that we turn a into b through 
an angle less than 180°; a screw with a right-hand thread turning in this way will 
advance in the direction of c. The fact that we say a right- hand screw instead of a 
left- hand screw is a convention, and is a perpetual reminder that if a and b are 
“honest” vectors in the ordinary sense, the new kind of “vector” which we have 
created by a X b is artificial, or slightly different in its character from a and b, 
because it was made up with a special rule. If a and b are called ordinary vectors, 
we have a special name for them, we call them polar vectors. Examples of such 
vectors are the coordinate r, force F, momentum p, velocity v, electric field E, etc. ; 
these are ordinary polar vectors. Vectors which involve just one cross product in 
their definition are called axial vectors or pseudo vectors. Examples of pseudovectors 
are, of course, torque t and the angular momentum L. It also turns out that the 
angular velocity u is a pseudovector, as is the magnetic field B. 

In order to complete the mathematical properties of vectors, we should know 
all the rules for their multiplication, using dot and cross products. In our applica- 
tions at the moment, we will need very little of this, but for the sake of completeness 
we shall write down all of the rules for vector multiplication so that we can use 
the results later. These are 

(a) a X (b + c) = a X b + a X c, 

(b) (aa) X b = «(a X b), 

(c) a • (b X c) = (a X b) • c, (20.10) 

(d) a X (b X c) = b(a • c) - c(a • b), 

(e) a X a = 0, 

(f) a ■ (a X b) = 0. 

20-2 The rotation equations using cross products 

Now let us ask whether any equations in physics can be written using the 
cross product. The answer, of course, is that a great many equations can be so 
written. For instance, we see immediately that the torque is equal to the position 
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vector cross the force : 


r = r X F. 


( 20 . 11 ) 


This is a vector summary of the three equations t x — yF z — zF y , etc. By the 
same token, the angular momentum vector, if there is only one particle present, 
is the distance from the origin multiplied by the vector momentum: 


L = r X p. 


( 20 . 12 ) 


For three-dimensional space rotation, the dynamical law analogous to the law 
F = dp/dt of Newton, is that the torque vector is the rate of change with time of 
the angular momentum vector: 

r = dL/dt. (20.13) 


If we sum (20.13) over many particles, the external torque on a system is the rate 
of change of the total angular momentum: 


'Text — dXu^Q^/dt, (20.14) 

Another theorem: If the total external torque is zero, then the total vector 
angular momentum of the system is a constant. This is called the law of conserva- 
tion of angular momentum. If there is no torque on a given system, its angular 
momentum cannot change. 

What about angular velocity? Is it a vector? We have already discussed 
turning a solid object about a fixed axis, but for a moment suppose that we are 
turning it simultaneously about two axes. It might be turning about an axis inside 
a box, while the box is turning about some other axis. The net result of such 
combined motions is that the object simply turns about some new axis! The 
wonderful thing about this new axis is that it can be figured out this way. If the 
rate of turning in the xy-plane is written as a vector in the z-direction whose length 
is equal to the rate of rotation in the plane, and if another vector is drawn in the 
^-direction, say, which is the rate of rotation in the zx-plane, then if we add these 
together as a vector, the magnitude of the result tells us how fast the object is 
turning, and the direction tells us in what plane, by the rule of the parallelogram. 
That is to say, simply, angular velocity is a vector, where we draw the magnitudes 
of the rotations in the three planes as projections at right angles to those planes.* 

As a simple application of the use of the angular velocity vector, we may evalu- 
ate the power being expended by the torque acting on a rigid body. The power, of 
course, is the rate of change of work with time; in three dimensions, the power 
turns out to be P = r • to. 

All the formulas that we wrote for plane rotation can be generalized to three 
dimensions. For example, if a rigid body is turning about a certain axis with 
angular velocity <*>, we might ask, “What is the velocity of a point at a certain 
radial position r?” We shall leave it as a problem for the student to show that the 
velocity of a particle in a rigid body is given by y = « X r, where w is the angular 
velocity and r is the position. Also, as another example of cross products, we had a 
formula for Coriolis force, which can also be written using cross products: 
F c = 2m\ X w. That is, if a particle is moving with velocity v in a coordinate 
system which is, in fact, rotating with angular velocity w, and we want to think in 
terms of the rotating coordinate system, then we have to add the pseudoforce F c . 

20-3 The gyroscope 

Let us now return to the law of conservation of angular momentum. This 
law may be demonstrated with a rapidly spinning wheel, or gyroscope, as follows 
(see Fig. 20-1). If we sit on a swivel chair and hold the spinning wheel with its 
axis horizontal, the wheel has an angular momentum about the horizontal axis. 


* That this is true can be derived by compounding the displacements of the particles 
of the body during an infinitesimal time At. It is not self-evident, and is left to those who 
are interested to try to figure it out. 



BEFORE AFTER 


Fig. 20-1. Before: axis is horizontal; 
moment about vertical axis = 0. After: 
axis is vertical; momentum about vertical 
axis is still zero; man and chair spin in 
direction opposite to spin of the wheel. 
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Angular momentum around a vertical axis cannot change because of the (friction- 
less) pivot of the chair, so if we turn the axis of the wheel into the vertical, then the 
wheel would have angular momentum about the vertical axis, because it is now 
spinning about this axis. But the system (wheel, ourself, and chair) cannot have a 
vertical component, so we and the chair have to turn in the direction opposite 
to the spin of the wheel, to balance it. 

First let us analyze in more detail the thing we have just described. What is 
surprising, and what we must understand, is the origin of the forces which turn 
us and the chair around as we turn the axis of the gyroscope toward the vertical. 
Figure 20-2 shows the wheel spinning rapidly about the j-axis. Therefore its 
angular velocity is about that axis and, it turns out, its angular momentum is like- 
wise in that direction. Now suppose that we wish to rotate the wheel about the 
x-axis at a small angular velocity 0; what forces are required? After a short time 
At, the axis has turned to a new position, tilted at an angle Ad with the horizontal. 
Since the major part of the angular momentum is due to the spin on the axis (very 
little is contributed by the slow turning), we see that the angular momentum vector 
has changed. What is the change in angular momentum? The angular momentum 
does not change in magnitude, but it does change in direction by an amount Ad. 
The magnitude of the vector AL is thus AL = L 0 Ad, so that the torque, which is 
the time rate of change of the angular momentum, is r = AL/At = L 0 A 6/ At = 
L 0 Q. Taking the directions of the various quantities into account, we see that 


r = a X L 0 . 


(20.15) 



Thus, if £2 and L 0 are both horizontal, as shown in the figure, t is vertical. To 
produce such a torque, horizontal forces F and -F must be applied at the ends of 
the axle. How are these forces applied? By our hands, as we try to rotate the 
axis of the wheel into the vertical direction. But Newton’s Third Law demands 
that equal and opposite forces (and equal and opposite torques) act on us. This 
causes us to rotate in the opposite sense about the vertical axis z. 

This result can be generalized for a rapidly spinning top. In the familiar case 
of a spinning top, gravity acting on its center of mass furnishes a torque about the 
point of contact with the floor (see Fig. 20-3). This torque is in the horizontal 
direction, and causes the top to precess with its axis moving in a circular cone about 
the vertical. If a is the (vertical) angular velocity of precession, we again find that 


Fig. 20-3. A rapidly spinning top. 
Note that the direction of the torque 
vector is the direction of the precession. 
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Fig. 20-4. The motion of particles in 
the spinning wheel of Fig. 20-2, whose 
axis is turning, is in curved lines. 


r = <fL/dt = a X L 0 . 

Thus, when we apply a torque to a rapidly spinning top, the direction of the 
precessional motion is in the direction of the torque, or at right angles to the 
forces producing the torque. 

We may now claim to understand the precession of gyroscopes, and indeed 
we do, mathematically. However, this is a mathematical thing which, in a sense, 
appears as a “miracle.” It will turn out, as we go to more and more advanced 
physics, that many simple things can be deduced mathematically more rapidly 
than they can be really understood in a fundamental or simple sense. This is a 
strange characteristic, and as we get into more and more advanced work there are 
circumstances in which mathematics will produce results which no one has really 
been able to understand in any direct fashion. An example is the Dirac equation, 
which appears in a very simple and beautiful form, but whose consequences are 
hard to understand. In our particular case, the precession of a top looks like some 
kind of a miracle involving right angles and circles, and twists and right-hand 
screws. What we should try to do is to understand it in a more physical way. 

How can we explain the torque in terms of the real forces and the accelerations? 
We note that when the wheel is precessing, the particles that are going around the 
wheel are not really moving in a plane because the wheel is precessing (see Fig. 
20-4). As we explained previously (Fig. 19-4), the particles which are crossing 
through the precession axis are moving in curved paths, and this requires application 
of a lateral force. This is supplied by our pushing on the axle, which then com- 
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municates the force to the rim through the spokes. “Wait,” someone says, “what 
about the particles that are going back on the other side?” It does not take long 
to decide that there must be a force in the opposite direction on that side. The net 
force that we have to apply is therefore zero. The forces balance out, but one of 
them must be applied at one side of the wheel, and the other must be applied at the 
other side of the wheel. We could apply these forces directly, but because the wheel 
is solid we are allowed to do it by pushing on the axle, since forces can be carried 
up through the spokes. 

What we have so far proved is that if the wheel is precessing, it can balance 
the torque due to gravity or some other applied torque. But all we have shown is 
that this is a solution of an equation. That is, if the torque is given, and if we get 
the spinning started right , then the wheel will precess smoothly and uniformly. 
But we have not proved (and it is not true) that a uniform precession is the most 
general motion a spinning body can undergo as the result of a given torque. The 
general motion involves also a “wobbling” about the mean precession. This 
“wobbling” is called nutation . 

Some people like to say that when one exerts a torque on a gyroscope, it turns 
and it precesses, and that the torque produces the precession. It is very strange that 
when one suddenly lets go of a gyroscope, it does not fall under the action of gravity, 
but moves sidewise instead! Why is it that the downward force of the gravity, which 
we know and feel , makes it go sidewise? All the formulas in the world like (20.15) 
are not going to tell us, because (20.15) is a special equation, valid only after the 
gyroscope is precessing nicely. What really happens, in detail, is the following. 
If we were to hold the axis absolutely fixed, so that it cannot precess in any manner 
(but the top is spinning) then there is no torque acting, not even a torque from 
gravity, because it is balanced by our fingers. But if we suddenly let go, then there 
will instantaneously be a torque from gravity. Anyone in his right mind would 
think that the top would fall, and that is what it starts to do, as can be seen if the 
top is not spinning too fast. 

The gyro actually does fall, as we would expect. But as soon as it falls, it is 
then turning, and if this turning were to continue, a torque would be required. 
In the absence of a torque in this direction, the gyro begins to “fall” in the direction 
opposite that of the missing force. This gives the gyro a component of motion 
around the vertical axis, as it would have in steady precession. But the actual motion 
“overshoots” the steady precessional velocity, and the axis actually rises again to 
the level from which it started. The path followed by the end of the axle is a cycloid 
(the path followed by a pebble that is stuck in the tread of an automobile tire). 
Ordinarily, this motion is too quick for the eye to follow, and it damps out quickly 
because of the friction in the gimbal bearings, leaving only the steady preces- 
sional drift (Fig. 20-5). The slower the wheel spins, the more obvious the nu- 
tation is. 

When the motion settles down, the axis of the gyro is a little bit lower than it 
was at the start. Why? (These are the more complicated details, but we bring them 
in because we do not want the reader to get the idea that the gyroscope is an abso- 
lute miracle. It is a wonderful thing, but it is not a miracle.) If we were holding 
the axis absolutely horizontally, and suddenly let go, then the simple precession 
equation would tell us that it precesses, that it goes around in a horizontal plane. 
But that is impossible! Although we neglected it before, it is true that the wheel has 
some moment of inertia about the precession axis, and if it is moving about that 
axis, even slowly, it has a weak angular momentum about the axis. Where did it 
come from? If the pivots are perfect, there is no torque about the vertical axis. 
How then does it get to precess if there is no change in the angular momentum? 
The answer is that the cycloidal motion of the end of the axis damps down to the 
average, steady motion of the center of the equivalent rolling circle. That is, it set- 
tles down a little bit low. Because it is low, the spin angular momentum now has 
a small vertical component, which is exactly what is needed for the precession. So 
you see it has to go down a little, in order to go around. It has to yield a little bit 
to the gravity; by turning its axis down a little bit, it maintains the rotation about 
the vertical axis. That, then, is the way a gyroscope works. 



Fig. 20-5. Actual motion of tip of 
axis of gyroscope under gravity just 
after releasing axis previously held fixed. 
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Fig. 20-6. The angular momentum of 
a rotating body is not necessarily parallel 
to the angular velocity. 



Fig. 20-7. The angular velocity and 
angular momentum of a rigid body 
(A > B > C). 


20-4 Angular momentum of a solid body 

Before we leave the subject of rotations in three dimensions, we shall discuss, 
at least Qualitatively, a few effects that occur in three-dimensional rotations that are 
not self-evident. The main effect is that, in general, the angular momentum of a 
rigid body is not necessarily in the same direction as the angular velocity. Consider 
a wheel that is fastened onto a shaft in a lopsided fashion, but with the axis through 
the center of gravity, to be sure (Fig. 20-6). When we spin the wheel around the 
axis, anybody knows that there will be shaking at the bearings because of the 
lopsided way we have it mounted. Qualitatively, we know that in the rotating 
system there is centrifugal force acting on the wheel, trying to throw its mass as 
far as possible from the axis. This tends to line up the plane of the wheel so that il 
is perpendicular to the axis. To resist this tendency, a torque is exerted by the 
bearings. If there is a torque exerted by the bearings, there must be a rate of change 
of angular momentum. How can there be a rate of change of angular momentum 
when we are simply turning the wheel about the axis? Suppose we break the 
angular velocity « into components wj and « 2 perpendicular and parallel to the 
plane of the wheel. What is the angular momentum? The moments of inertia 
about these two axes are different, so the angular momentum components, which 
(in these particular, special axes only) are equal to the moments of inertia times the 
corresponding angular velocity components, are in a different ratio than are the 
angular velocity components. Therefore the angular momentum vector is in a 
direction in space not along the axis. When we turn the object, we have to turn the 
angular momentum vector in space, so we must exert torques on the shaft. 

Although it is much too complicated to prove here, there is a very important 
and interesting property of the moment of inertia which is easy to describe and to 
use, and which is the basis of our above analysis. This property is the following: 
Any rigid body, even an irregular one like a potato, possesses three mutually 
perpendicular axes through the CM, such that the moment of inertia about one of 
these axes has the greatest possible value for any axis through the CM, the moment 
of inertia about another of the axes has the minimum possible value, and the 
moment of inertia about the third axis is intermediate between these two (or equal 
to one of them). These axes are called the principal axes of the body, and they have 
the important property that if the body is rotating about one of them, its angular 
momentum is in the same direction as the angular velocity. For a body having 
axes of symmetry, the principal axes are along the symmetry axes. 

If we take the y-, and z-axes along the principal axes, and call the corre- 
sponding principal moments of inertia A, B, and C, we may easily evaluate the 
angular momentum and the kinetic energy of rotation of the body for any angular 
velocity <o. If we resolve u into components u x , u y , and w 2 along the x-, y-, z-axes, 
and use unit vectors i, j, k, also along x, y, z, we may write the angular momentum 
as 

L = Au x i + Bu y j + Cw 2 k. (20.16) 

The kinetic energy of rotation is 

KE = £(Aul + Bui + C<4) (20.17) 

= £L • u. 
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21 


The Harmonic Oscillator 


21-1 Linear differential equations 

In the study of physics, usually the course is divided into a series of subjects, 
such as mechanics, electricity, optics, etc., and one studies one subject after the 
other. For example, this course has so far dealt mostly with mechanics. But a 
strange thing occurs again and again: the equations which appear in different 
fields of physics, and even in other sciences, are often almost exactly the same, so 
that many phenomena have analogs in these different fields. To take the simplest 
example, the propagation of sound waves is in many ways analogous to the propaga- 
tion of light waves. If we study acoustics in great detail we discover that much of 
the work is the same as it would be if we were studying optics in great detail. So 
the study of a phenomenon in one field may permit an extension of our knowledge 
in another field. It is best to realize from the first that such extensions are possible, 
for otherwise one might not understand the reason for spending a great deal of 
time and energy on what appears to be only a small part of mechanics. 

The harmonic oscillator, which we are about to study, has close analogs in 
many other fields; although we start with a mechanical example of a weight on a 
spring, or a pendulum with a small swing, or certain other mechanical devices, we 
are really studying a certain differential equation. This equation appears again 
and again in physics and in other sciences, and in fact it is a part of so many 
phenomena that its close study is well worth our while. Some of the phenomena 
involving this equation are the oscillations of a mass on a spring; the oscillations 
of charge flowing back and forth in an electrical circuit; the vibrations of a t unin g 
fork which is generating sound waves; the analogous vibrations of the electrons 
in an atom, which generate light waves; the equations for the operation of a 
servosystem, such as a thermostat trying to adjust a temperature; complicated 
interactions in chemical reactions; the growth of a colony of bacteria in interaction 
with the food supply and the poisons the bacteria produce; foxes eating rabbits 
eating grass, and so on; all these phenomena follow equations which are very 
similar to one another, and this is the reason why we study the mechanical oscillator 
in such detail. The equations are called linear differential equations with constant 
coefficients. A linear differential equation with constant coefficients is a differential 
equation consisting of a sum of several terms, each term being a derivative of the 
dependent variable with respect to the independent variable, and multiplied by 
some constant. Thus 

a n d n x/dt n + a n _i d n ~ x xldt n ~ x + ■ • • + a x dx/dt + a 0 x = f(f) (21.1) 

is called a linear differential equation of order n with constant coefficients (each 
di is constant). 


21-2 The harmonic oscillator 

Perhaps the simplest mechanical system whose motion follows a linear differ- 
ential equation with constant coefficients is a mass on a spring: first the spring 
stretches to balance the gravity; once it is balanced, we then discuss the vertical 
displacement of the mass from its equilibrium position (Fig. 21-1). We shall call 
this upward displacement x , and we shall also suppose that the spring is perfectly 
linear, in which case the force pulling back when the spring is stretched is pre- 
cisely proportional to the amount of stretch. That is, the force is ~kx (with a 
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Fig. 21-1. A mass on a spring: a 
simple example of a harmonic oscillator. 
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minus sign to remind us that it pulls back). Thus the mass times the acceleration 
must equal — kx: 

md 2 x/dt 2 = — kx. (21.2) 

For simplicity, suppose it happens (or we change our unit of time measurement) 
that the ratio k/m — 1. We shall first study the equation 

d 2 x/dt 2 = -x. (21.3) 

Later we shall come back to Eq. (21.2) with the k and m explicitly present. 

We have already analyzed Eq. (21.3) in detail numerically; when we first 
introduced the subject of mechanics we solved this equation (see Eq. 9.12) to find 
the motion. By numerical integration we found a curve (Fig. 9-4) which showed 
that if m was initially displaced, but at rest, it would come down and go through 
zero; we did not then follow it any farther, but of course we know that it just 
keeps going up and down— it oscillates. When we calculated the motion numer- 
ically, we found that it went through the equilibrium point at t = 1.570. The 
length of the whole cycle is four times this long, or t 0 = 6.28 “sec.” This was 
found numerically, before we knew much calculus. We assume that in the meantime 
the Mathematics Department has brought forth a function which, when differ- 
entiated twice, is equal to itself with a minus sign. (There are, of course, ways of 
getting at this function in a direct fashion, but they are more complicated than 
already knowing what the answer is.) The function is x = cos t. If we differentiate 
this we find dx/dt = —sin / and d 2 x/dt 2 = —cost — —x. The function x — 
cos t starts, at / = 0, with x = 1, and no initial velocity; that was the situation 
with which we started when we did our numerical work. Now that we know that 
x = cos /, we can calculate a precise value for the time at which it should pass 
x — 0. The answer is / = 7 t / 2 , or 1.57108. We were wrong in the last figure 
because of the errors of numerical analysis, but it was very close! 

Now to go further with the original problem, we restore the time units to 
real seconds. What is the solution then? First of all, we might think that we can 
get the constants k and m in by multiplying cos t by something. So let us try the 
equation x = A cost; then we find dx/dt — —Asint, and d 2 x/dt 2 ~ — A 
cos t — — x. Thus we discover to our horror that we did not succeed in solving 
Eq. (21.2), but we got Eq. (21.3) again! That fact illustrates one of the most 
important properties of linear differential equations: if we multiply a solution of 
the equation by any constant , it is again a solution. The mathematical reason for 
this is clear. If x is a solution, and we multiply both sides of the equation, say by A , 
we see that all derivatives are also multiplied by A, and therefore Ax is just as good 
a solution of the original equation as x was. The physics of it is the following. 
If we have a weight on a spring, and pull it down twice as far, the force is twice 
as much, the resulting acceleration is twice as great, the velocity it acquires in a 
given time is twice as great, the distance covered in a given time is twice as great; 
but it has to cover twice as great a distance in order to get back to the origin 
because it is pulled down twice as far. So it takes the same time to get back to the 
origin, irrespective of the initial displacement. In other words, with a linear equa- 
tion, the motion has the same time pattern , no matter how “strong” it is. 

That was the wrong thing to do — it only taught us that we can multiply the 
solution by anything, and it satisfies the same equation, but not a different equation. 
After a little cut and try to get to an equation with a different constant multiplying 
x, we find that we must alter the scale of time. In other words, Eq. (21.2) has a 
solution of the form 

x = cos a 0 t. (21.4) 

(It is important to realize that in the present case, co 0 is not an angular velocity of a 
spinning body, but we run out of letters if we are not allowed to use the same letter 
for more than one thing.) The reason we put a subscript “0” on co is that we are 
going to have more omegas before long; let us remember that co 0 refers to the 
natural motion of this oscillator. Now we try Eq. (21.4) and this time we are more 
successful, because dx/dt — coq sin coq/ and d 2 x / di 2 — — coq cos c oqI = — coqX. 
21-2 



So at last we have solved the equation that we really wanted to solve. The equation 
d 2 x/dt 2 = —uqX is the same as Eq. (21.2) if cog = k/m. 

The next thing we must investigate is the physical significance of w 0 . We 
know that the cosine function repeats itself when the angle it refers to is 2ir. So 
* = cos co 0 / will repeat its motion, it will go through a complete cycle, when the 
“angle” changes by 27r. The quantity u Q t is often called the phase of the motion. 
In order to change u 0 t by 2ir, the time must change by an amount t‘ 0 , called the 
period of one complete oscillation; of course t 0 must be such that a> 0 t 0 = 2ir. 
That is, u 0 t 0 must account for one cycle of the angle, and then everything will 
repeat itself— if we increase t by t 0 , we add 2ir to the phase. Thus 

to = 2ir/o)o = 2 -ksJ m/k. ( 21 . 5 ) 

Thus if we had a heavier mass, it would take longer to oscillate back and forth on 
a spring. That is because it has more inertia, and so, while the forces are the same, 
it takes longer to get the mass moving. Or, if the spring is stronger, it will move 
more quickly, and that is right: the period is less if the spring is stronger. 

Note that the period of oscillation of a mass on a spring does not depend in 
any way on how it has been started, how far down we pull it. The period is deter- 
mined, but the amplitude of the oscillation is not determined by the equation of 
motion (21.2). The amplitude is determined, in fact, by how we let go of it, by 
what we call the initial conditions or starting conditions. 

Actually, we have not quite found the most general possible solution of Eq. 
(21.2). There are other solutions. It should be clear why: because all of the cases 
covered by x = a cos u 0 t start with an initial displacement and no initial velocity. 
But it is possible, for instance, for the mass to start at x = 0, and we may then 
give it an impulsive kick, so that it has some speed at t = 0. Such a motion is not 
represented by a cosine — it is represented by a sine. To put it another way, if 
x = cos O) 0 t is a solution, then is it not obvious that if we were to happen to walk 
into the room at some time (which we would call “t = 0”) and saw the mass as 
it was passing x = 0, it would keep on going just the same? Therefore, x = cos 
a) 0 t cannot be the most general solution; it must be possible to shift the beginning 
of time, so to speak. As an example, we could write the solution this way: .v = 
a cos a> 0 (t — t i), where 1 1 is some constant. This also corresponds to shifting the 
origin of time to some new instant. Furthermore, we may expand 


and write 


cos (co 0 t + A) = cos o>o t cos A — sin o > 0 t sin A, 
x = A cos o) 0 < + B sin o > 0 t. 


where A = a cos A and B = —a sin A. Any one! of these forms is a possible 
way to write the complete, general solution of (21.2): that is, every solution of the 
differential equation d 2 x/dt 2 = — u> 2 x that exists irl the world can be written as 


or 

or 


(a) 

* = acoso) 0 (t — /i), 


(b) 

* = a cos (o) Q t + A), 

(21.6) 

(c) 

x = A cos coq/ + i?sinco 0 /. 



Some of the quantities in (21 .6) have names: wo is called the angular frequency, 
it is the number of radians by which the phase changes in a second. That is deter- 
mined by the differential equation. The other constants are not determined by the 
equation, but by how the motion is started. Of these constants, a measures the 
maximum displacement attained by the mass, and is called the amplitude of oscilla- 
tion. The constant A is sometimes called the phase of the oscillation, but that is a 
confusion, because other people call w 0 ? + A the phase, and say the phase changes 
with time. We might say that A is a phase shift from some defined zero. Let us put 
it differently. Different A’s correspond to motions in different phases. That is 
true, but whether we want to call A the phase, or not, is another question. 
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21-3 Harmonic motion and circular motion 



Fig. 21-2. A particle moving in a 
circular path at constant speed. 



Screen 


Fig. 21-3. Demonstration of the 
equivalence between simple harmonic 
motion and uniform circular motion. 


The fact that cosines are involved in the solution of Eq. (21.2) suggests that 
there might be some relationship to circles. This is artificial, of course, because 
there is no circle actually involved in the linear motion — it just goes up and down. 
We may point out that we have, in fact, already solved that differential equation 
when we were studying the mechanics of circular motion. If a particle moves in a 
circle with a constant speed v, the radius vector from the center of the circle to the 
particle turns through an angle whose size is proportional to the time. If we call 
this angle 8 = vt/R (Fig. 21-2) then dd/dt — co 0 = v/R . We know that there is 
an acceleration a = v 2 /R = coq R toward the center. Now we also know that the 
position x , at a given moment, is tl^e radius of the circle times cos 8 , and that y is 
the radius times sin 8: 

x ~ R cos 8 , y = R sin 8 . 

Now what about the acceleration? What is the jc-component of acceleration, 
d 2 x/dt 2 l We have already worked that out geometrically; it is the magnitude of 
the acceleration times the cosine of the projection angle, with a minus sign because 
it is toward the center. 


a x = — acos0 = — a> 2 i?cos0 = — co 2 x\ (21.7) 

In other words, when a particle is moving in a circle, the horizontal component of 
its motion has an acceleration which is proportional to the horizontal displacement 
from the center. Of course we also have the solution for motion in a circle: 
x — R cos a) 0 t. Equation (21.7) does not depend upon the radius of the circle, 
so for a circle of any radius, one finds the same equation for a given co 0 . Thus, 
for several reasons, we expect that the displacement of a mass on a spring will turn 
out to be proportional to cos oo 0 /, and will, in fact, be exactly the same motion as 
we would see if we looked at the x-component of the position of an object rotating 
in a circle with angular velocity co 0 . Asa check on this, one can devise an experi- 
ment to show that the up-and-down motion of a mass on a spring is the same as 
that of a point going around in a circle. In Fig. 21-3 an arc light projected on a 
screen casts shadows of a crank pin on a shaft and of a vertically oscillating mass, 
side by side. If we let go of the mass at the right time from the right place, and if 
the shaft speed is carefully adjusted so that the frequencies match, each should 
follow the other exactly. One can also check the numerical solution we obtained 
earlier with the cosine function, and see whether that agrees very well. 

Here we may point out that because uniform motion in a circle is so closely 
related mathematically to oscillatory up-and-down motion, we can analyze 
oscillatory motion in a simpler way if we imagine it to be a projection of something 
going in a circle. In other words, although the distance y means nothing in the 
oscillator problem, we may still artificially supplement Eq. (21.2) with another 
equation using y, and put the two together. If we do this, we will be able to analyze 
our one-dimensional oscillator with circular motions, which is a lot easier than 
having to solve a differential equation. The trick in doing this is to use complex 
numbers, a procedure we shall introduce in the next chapter. 

21-4 Initial conditions 

Now let us consider what determines the constants A and B, or a and A. Of 
course these are determined by how we start the motion. If we start the motion 
with just a small displacement, that is one type of oscillation ; if we start with an 
initial displacement and then push up when we let go, we get still a different 
motion. The constants A and B , or a and A, or any other way of putting it, are 
determined, of course, by the way the motion started, not by any other features of 
the situation. These are called the initial conditions. We would like to connect the 
initial conditions with the constants. Although this can be done using any one 
of the forms (21.6), it turns out to be easiest if we use Eq. (21.6c). Suppose that at 
t = 0 we have started with an initial displacement x 0 and a certain velocity v 0 . 
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This is the most general way we can start the motion. (We cannot specify the 
acceleration with which it started, true, because that is determined by the spring, 
once we specify x 0 .) Now let us calculate A and B. We start with the equation 
for x, 

x = A cos a) 0 t + B sin c o 0 t. 

Since we shall later need the velocity also, we differentiate x and obtain 
v = — co sin co of + coo^coscoq^ 


These expressions are valid for all t, but we have special knowledge about x and 
v at t = 0. So if we put t = 0 into these equations, on the left we get x 0 and v 0 , 
because that is what x and v are at t = 0; also, we know that the cosine of zero is 
unity, and the sine of zero is zero. Therefore we get 


and 


*o = A-l + B-0 = A 


v 0 = — o) 0 A • 0 + a )qB * 1 = c o q B. 


So for this particular case we find that 

A = x o, B = v 0 /<a Q . 


From these values of A and B, we can get a and A if we wish. 

That is the end of our solution, but there is one physically interesting thing 
to check, and that is the conservation of energy. Since there are no frictional 
losses, energy ought to be conserved. Let us use the formula 


then 


x = a cos (a) 0 t + A); 
v = — coq a sin ( o) 0 t + A). 


Now let us find out what the kinetic energy T is, and what the potential energy 
U is. The potential energy at any moment is %kx 2 , where * is the displacement and 
k is the constant of the spring. If we substitute for x, using our expression above, 
we get 

U — \kx 2 — %ka 2 cos 2 (c o 0 t + A). 


Of course the potential energy is not constant ; the potential never becomes negative, 
naturally — there is always some energy in the spring, but the amount of energy 
fluctuates with x. The kinetic energy, on the other hand, is mv 2 , and by sub- 
stituting for v we get 

T — \mv 2 — %ma)la 2 sin 2 (o) 0 t + A). 

Now the kinetic energy is zero when x is at the maximum, because then there is no 
velocity; on the other hand, it is maximal when x is passing through zero, because 
then it is moving fastest. This variation of the kinetic energy is just the opposite 
of that of the potential energy. But the total energy ought to be a constant. If 
we note that k — mool , we see that 

T + U = ^ino) 2 a 2 [cos 2 ( coq £ A) -j- sin 2 ( coq / 4 “ A)] = ^mcoQa 2 . 

The energy is dependent on the square of the amplitude; if we have twice the 
amplitude, we get an oscillation which has four times the energy. The average 
potential energy is half the maximum and, therefore, half the total, and the average 
kinetic energy is likewise half the total energy. 


21-5 Forced oscillations 

Next we shall discuss the forced harmonic oscillator , i.e., one in which there is 
an external driving force acting. The equation then is the following: 

md 2 x/dt 2 = — kx + F(t). (21.8) 
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We would like to find out what happens in these circumstances. The external driv- 
ing force can have various kinds of functional dependence on the time; the first 
one that we shall analyze is very simple— we shall suppose that the force is oscillat- 
ing: 

F(t) = F o cos cot. (21.9) 

Notice, however, that this co is not necessarily co 0 : we have co under our control; 
the forcing may be done at different frequencies. So we try to solve Eq. (21.8) 
with the special force (21.9). What is the solution of (21.8)? One special solution, 
(we shall discuss the more general cases later) is 

* = C cos cot, (21.10) 

where the constant is to be determined. In other words, we might suppose that if 
we kept pushing back and forth, the mass would follow back and forth in step 
with the force. We can try it anyway. So we put (21.10) into (21.9), and get 

income cos cot = —mcolC coscot + F 0 cos cot, (21.11) 

We have also put in k — mcol, so that we will understand the equation better at 
the end. Now because the cosine appears everywhere, we can divide it out, and 
that shows that (21.10) is, in fact, a solution, provided we pick C just right. The 
answer is that C must be 

C = F 0 /m(o>l - w 2 ). (21.12) 

That is, m oscillates at the same frequency as the force, but with an amplitude 
which depends on the frequency of the force, and also upon the frequency of the 
natural motion of the oscillator. It means, first, that if co is very small compared 
with co 0 , then the displacement and the force are in the same direction. On the 
other hand, if we shake it back and forth very fast, then (21.12) tells us that C is 
negative if co is above the natural frequency co 0 of the harmonic oscillator. (We 
will call co 0 the natural frequency of the harmonic oscillator, and co the applied 
frequency.) At very high frequency the denominator may become very large, and 
there is then not much amplitude. 

Of course the solution we have found is the solution only if things are started 
just right, for otherwise there is a part which usually dies out after a while. This 
other part is called the transient response to F(t), while (21.10) and (21.12) are 
called the steady-state response. 

According to our formula (21.12), a very remarkable thing should also occur: 
if co is almost exactly the same as co 0 , then C should approach infinity. So if we 
adjust the frequency of the force to be “in time” with the natural frequency, then 
we should get an enormous displacement. This is well known to anybody who has 
pushed a child on a swing. It does not work very well to close our eyes and push at 
a certain speed at random. If we happen to get the right timing, then the swing 
goes very high, but if we have the wrong timing, then sometimes we may be pushing 
when we should be pulling, and so on, and it does not work. 

If we make co exactly equal to co 0 , we find that it should oscillate at an infinite 
amplitude, which is, of course, impossible. The reason it does not is that something 
goes wrong with the equation, there are some other frictional terms, and other 
forces, which are not in (21.8) but which occur in the real world. So the amplitude 
does not reach infinity for some reason ; it may be that the spring breaks! 
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Algebra 


22-1 Addition and multiplication 

In our study of oscillating systems we shall have occasion to use one of the 
most remarkable, almost astounding, formulas in all of mathematics. From the 
physicist’s point of view we could bring forth this formula in two minutes or so, 
and be done with it. But science is as much for intellectual enjoyment as for prac- 
tical utility, so instead of just spending a few minutes on this amazing jewel, we 
shall surround the jewel by its proper setting in the grand design of that branch of 
mathematics which is called elementary algebra. 

Now you may ask, “What is mathematics doing in a physics lecture?” We 
have several possible excuses: first, of course, mathematics is an important tool, 
but that would only excuse us for giving the formula in two minutes. On the other 
hand, in theoretical physics we discover that all our laws can be written in mathe- 
matical form-, and that this has a certain simplicity and beauty about it. So, 
ultimately, in order to understand nature it may be necessary to have a deeper 
understanding of mathematical relationships. But the real reason is that the subject 
is enjoyable, and although we humans cut nature up in different ways, and we have 
different courses in different departments, such compartmentalization is really 
artificial, and we should take our intellectual pleasures where we find them. 

Another reason for looking more carefully at algebra now, even though most 
of us studied algebra in high school, is that that was the first time we studied it; all 
the equations were unfamiliar, and it was hard work, just as physics is now. Every 
so often it is a great pleasure to look back to see what territory has been covered, 
and what the great map or plan of the whole thing is. Perhaps some day somebody 
in the Mathematics Department will present a lecture on mechanics in such a way 
as to show what it was we were trying to learn in the physics course! 

The subject of algebra will not be developed from the point of view of a mathe- 
matician, exactly, because the mathematicians are mainly interested in how various 
mathematical facts are demonstrated, and how many assumptions are absolutely 
required, and what is not required. They are not so interested in the result of what 
they prove. For example, we may find the Pythagorean theorem quite interesting, 
that the sum of the squares of the sides of a right triangle is equal to the square of 
the hypotenuse; that is an interesting fact, a curiously simple thing, which may be 
appreciated without discussing the question of how to prove it, or what axioms are 
required. So, in the same spirit, we shall describe qualitatively, if we may put it 
that way, the system of elementary algebra. We say elementary algebra because 
there is a branch of mathematics called modern algebra in which some of the rules 
such as ab = ba , are abandoned, and it is still called algebra, but we shall not 
discuss that. 

To discuss this subject we start in the middle. We suppose that we already 
know what integers are, what zero is, and what it means to increase a number by 
one unit. You may say, “That is not in the middle!” But it is the middle from a 
mathematical standpoint, because we could go even further back and describe the 
theory of sets in order to derive some of these properties of integers. But we are 
not going in that direction, the direction of mathematical philosophy and mathe- 
matical logic, but rather in the other direction, from the assumption that we know 
what integers are and we know how to count. 

If we start with a certain number a , an integer, and we count successively one 
unit b times, the number we arrive at we call a + b, and that defines addition of 
integers. 
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Once we have defined addition, then we can consider this: if we start with 
nothing and add a to it, b times in succession, we call the result multiplication of 
integers ; we call it b times a . 

Now we can also have a succession of multiplications: if we start with 1 and 
multiply by a , b times in succession, we call that raising to a power: a b . 

Now as a consequence of these definitions it can be easily shown that all of 
the following relationships are true: 


(a) 

a b — 

b + a 

(b) 

a+ (b + c) = (a + b) + 

c 

(c) 

ab — ba 


(d) 

a(b + c) = ab + ac 


(e) 

(ab)c — 

a(bc) 

(0 

0 ab) e = a e b c 

(22.1) 

(g) 

a b a c = a 

( <E>+c) 

(h) 

(a b ) c = a ibc) 


(0 

a + 0 = 

: a 

(j) 

a - 1 = a 


(k) 

a 1 = a 





These 

results are well known and we 

shall not belabor the point, 

we merely list 


them. Of course, 1 and 0 have special properties; for example, a + 0 is a, a times 
1 = a , and a to the first power is a. 

In this discussion we must also assume a few other properties like continuity 
and ordering, which are very hard to define; we will let the rigorous theory do it. 
Furthermore, it is definitely true that we have written down too many “rules”; 
some of them may be deducible from the others, but we shall not worry about 
such matters. 


22-2 The inverse operations 

In addition to the direct operations of addition, multiplication, and raising 
to a power, we have also the inverse operations, which are defined as follows. 
Let us assume that a and c are given, and that we wish to find what values of b 
satisfy such equations as a + b — c, ab — c, b a — c. If a + b = c, b is defined 
as c — a, which is called subtraction . The operation called division is also clear: 
if ab = c, then b = c/a defines division — a solution of the equation ab = c 
“backwards.” Now if we have a power b a = c and we ask ourselves, “What is 
blf it is called the a\h root of c: b = \/c. For instance, if we ask ourselves the 
following question, “What integer, raised to the third power, equals 8?,” then the 
answer is called the cube root of 8 ; it is 2. Because b a and a b are not equal, there 
are two inverse problems associated with powers, and the other inverse problem 
would be, “To what power must we raise 2 to get 8?” This is called taking the 
logarithm . If a b = c, we write b = log a c. The fact that it has a cumbersome nota- 
tion relative to the others does not mean that it is any less elementary, at least 
applied to integers, than the other processes. Although logarithms come late in 
an algebra class, in practice they are, of course, just as simple as roots ; they are 
just a different kind of solution of an algebraic equation. The direct and inverse 
operations are summarized as follows: 


(a) 

addition 
a + b = c 

(a') 

subtraction 
b = c — a 

(b) 

multiplication 
ab = c 

(b') 

division 
b = c/a 

(c) 

power 
b a = c 

(o') 

root 

b = sfc 

(d) 

power 
a b — c 

(dO 

logarithm 
b = log a c 


Now here is the idea. These relationships, or rules, are correct for integers, 
since they follow from the definitions of addition, multiplication, and raising to a 
power. We are going to discuss whether or not we can broaden the class of objects 
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which a , b, and c represent so that they will obey these same rules , although the proc- 
esses for a + b, and so on, will not be definable in terms of the direct action of 
adding 1, for instance, or successive multiplications by integers. 

22-3 Abstraction and generalization 

When we try to solve simple algebraic equations using all these definitions, 
we soon discover some insoluble problems, such as the following. Suppose that 
we try to solve the equation b = 3 — 5. That means, according to our definition 
of subtraction, that we must find a number which, when added to 5, gives 3. And 
of course there is no such number, because we consider only positive integers; 
this is an insoluble problem. However, the plan, the great idea, is this: abstraction 
and generalization . From the whole structure of algebra, rules plus integers, we 
abstract the original definitions of addition and multiplication, but we leave the 
rules (22.1) and (22.2), and assume these to be true in general on a wider class of 
numbers, even though they are originally derived on a smaller class. Thus, rather 
than using integers symbolically to define the rules, we use the rules as the definition 
of the symbols, which then represent a more general kind of number. As an ex- 
ample, by working with the rules alone we can show that 3 — 5 = 0 — 2. In 
fact we can show that one can make all subtractions, provided we define a whole 
set of new numbers: 0 — 1,0 — 2, 0 — 3, 0 — 4, and so on, called the negative 
integers . Then we may use all the other rules, like a(b + c) = ab + ac and so 
forth, to find what the rules are for multiplying negative numbers, and we will 
discover, in fact, that all of the rules can be maintained with negative as well as 
positive integers. 

So we have increased the range of objects over which the rules work, but the 
meaning of the symbols is different. 

One cannot say, for instance, that —2 times 5 really means to add 5 together 
successively —2 times. That means nothing. But nevertheless everything will work 
out all right according to the rules. 

An interesting problem comes up in taking powers. Suppose that we wish to 
discover what a (3_5> means. We know only that 3 — 5 is a solution of the problem, 
(3 — 5) + 5 = 3. Knowing that, we know that a (s ~ 5) a 5 = a 3 . Therefore 
fl (3-5) = a*/a 5 , by the definition of division. With a little more work, this can 
be reduced to 1 / a 2 . So we find that the negative powers are the reciprocals of the 
positive powers, but l /a 2 is a meaningless symbol, because if a is a positive or 
negative integer, the square of it is greater than 1, and we do not yet know what 
we mean by 1 divided by a number greater than 1! 

Onward! The great plan is to continue the process of generalization ; whenever 
we find another problem that we cannot solve we extend our realm of numbers. 
Consider division: we cannot find a number which is an integer, even a negative 
integer, which is equal to the result of dividing 3 by 5. But if we suppose that all 
fractional numbers also satisfy the rules, then we can talk about multiplying and 
adding fractions, and everything works as well as it did before. 

Take another example of powers: what is a 3/5 ? We know only that (3/5)5 = 
3, since that was the definition of 3/5. So weknowalso that (a (3/5) ) 5 = a <3/5)(5) = 
a 3 , because this is one of the rules. Then by the definition of roots we find that 

a ( 3 /5) = ^ 

In this way, then, we can define what we mean by putting fractions in the 
various symbols, by using the rules themselves to help us determine the definition — 
it is not arbitrary. It is a remarkable fact that all the rules still work for positive 
and negative integers, as well as for fractions! 

We go on in the process of generalization. Are there any other equations we 
cannot solve? Yes, there are. For example, it is impossible to solve this equation: 
b = 2 1/2 = \/2. It is impossible to find a number which is rational (a fraction) 
whose square is equal to 2. It is very easy for us in modern days to answer this 
question. We know the decimal system, and so we have no difficulty in appreciating 
the meaning of an unending decimal as a type of approximation to the square root 
of 2. Historically, this idea presented great difficulty to the Greeks. To really 
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define precisely what is meant here requires that we add some substance of con- 
tinuity and ordering, and it is, in fact, quite the most difficult step in the processes 
of generalization just at this point. It was made, formally and rigorously, by 
Dedekind. However, without worrying about the mathematical rigor of the thing, 
it is quite easy to understand that what we mean is that we are going to find a 
whole sequence of approximate fractions, perfect fractions (because any decimal, 
when stopped somewhere, is of course rational), which just keeps on going, getting 
closer and closer to the desired result. That is good enough for what we wish to 
discuss, and it permits us to involve ourselves in irrational numbers, and to cal- 
culate things like the square root of 2 to any accuracy that we desire, with enough 
work. 

22-4 Approximating irrational numbers 

The next problem comes with what happens with the irrational powers. Sup- 
pose that we want to define, for instance, 10 \ n principle, the answer is simple 
enough. If we approximate the square root of 2 to a certain number of decimal 
places, then the power is rational, and we can take the approximate root, using 
the above method, and get an approximation to 10^. Then we may run it up a 
few more decimal places (it is again rational), take the appropriate root, this time 
a much higher root because there is a much bigger denominator in the fraction, 
and get a better approximation. Of course we are going to get some enormously 
high roots involved here, and the work is quite difficult. How can we cope with 
this problem? 

In the computations of square roots, cube roots, and other small roots, there 
is an arithmetical process available by which we can get one decimal place after 
another. But the amount of labor needed to calculate irrational powers and the 
logarithms that go with them (the inverse problem) is so great that there is no simple 
arithmetical process we can use. Therefore tables have been built up which permit 
us to calculate these powers, and these are called the tables of logarithms, or the 
tables of powers, depending on which way the table is set up. It is merely a question 
of saving time; if we must raise some number to an irrational power, we can look 
it up rather than having to compute it. Of course, such a computation is just a 
technical problem, but it is an interesting one, ahd of great historical value. In 
the first place, not only do we have the problerh of solving ^ = 10 V 2 , but we also 
have the problem of solving 10 x — 2, or* = log 1 5 2. This is not a problem where 
we have to define a new kind of number for the result, it is merely a computational 
problem. The answer is simply an irrational nuhiber, an unending decimal, not a 
new kind of a number. 

Let us now discuss the problem of calculating solutions of such equations. 
The general idea is really very simple. If we could calculate 10 \ and 10 4/ 10 , and 
10 1/10 °, and lo 4/1000 and so on, and multiply them all together, we would get 
10 i-4i4. .. or and that is the general idea on which things work. But instead 
of calculating 10 1/10 and so on, we shall calculate 10 1/2 , 10 1/4 , and so on. Before 
we start, we should explain why we make so much work with 10, instead of some 
other number. Of course, we realize that logarithm tables are of great practical 
utility, quite aside from the mathematical problem of taking roots, since with 
any base at all, 

log& (1 ac ) = logb a + log 6 c. (22.3) 

We are all familiar with the fact that one can use this fact in a practical way to 
multiply numbers if we have a table of logarithms. The only question is, with what 
base b shall we compute? It makes no difference what base is used; we can use the 
same principle all the time, and if we are using logarithms to any particular base, 
we can find logarithms to any other base merely by a change in scale, a multiplying 
factor. If we multiply Eq. (22.3) by 61, it is just as true, and if we had a table of 
logs with a base ft, and somebody else multiplied all of our table by 61, there would 
be no essential difference. Suppose that we know the logarithms of all the numbers 
to the base b . In other words, we can solve the equation b a = c for any c because 
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we have a table. The problem is to find the logarithm of the same number c to 
some other base, let us say the base x. We would like to solve x a ' = c. It is easy 
to do, because we can always write x = b\ which defines /, knowing x and b. As a 
matter of fact, t = log ft x. Then if we put that in and solve for a\ we see that 
( b l ) a ' = b ta ' = c . In other words, la! is the logarithm of c in base b. Thus 
a ' = a/u Thus logs to base x are just 1 /t, which is a constant, times the logs to 
the base, b . Therefore any log table is equivalent to any other log table if we 
multiply by a constant, and the constant is 1 /log^ x. This permits us to choose a 
particular base, and for convenience we take the base 10. (The question may arise 
as to whether there is any natural base, any base in which things are somehow 
simpler, and we shall try to find an answer to that later. At the moment we shall 
just use the base 10.) 

Table 22-1 

Successive Square Roots of Ten 


Power s 

1024 s 

10 s 

(10 s 

- i)A 

1 

1024 

10.00000 

9.00 

1/2 

512 

3.16228 

4.32 

1/4 

256 

1.77828 

3.113 

1/8 

128 

1.33352 

2.668 

1/16 

64 

1.15478 

2.476 

1/32 

32 

1.074607 

2.3874 

1/64 

16 

1.036633 

2.3445 

1/128 

8 

1.018152 

2.3234 211 

1/256 

4 

1.0090350 

2.3130 104 

1/512 

2 

1.0045073 ; 

2.3077 53 

1/1024 

1 

1.0022511 

2.3 

051 26 

26 

A/1024 
(A — ► 0) 

A 

1 + .0022486A’*- 

— 2.3C 

)25 


Now let us see how to calculate logarithms. We begin by computing successive 
square roots of 10, by cut and try. The results are shown in Table 22-1. The 
powers of 10 are given in the first column, and the result, 10 5 , is given in the third 
column. Thus 10 1 = 10. The one-halfpower of 10 we can easily workout, because 
that is the square root of 10, and there is a known, simple process for taking square 
roots of any number * Using this process, we find the first square root to be 
3.16228. What good is that? It already tells us something, it tells us how to take 
lO 0,5 , so we now know at least one logarithm, if we happen to need the logarithm 
of 3.16228, we know the answer is close to 0.50000. But we must do a little bit 
better than that; we clearly need more information. So we take the square root 
again, and find 10 1/4 , which is 1.77828. Now we have the logarithm of more 
numbers than we had before, 1.250 is the logarithm of 17.78 and, incidentally, 
if it happens that somebody asks for 10 0,75 , we can get it, because that is 
10(0.5+0.25). j s therefore the product of the second and third numbers. If we 
can get enough numbers in column s to be able to make up almost any number, 
then by multiplying 1 the proper things in column 3, we can get 10 to any power; 
that is the plan. So we evaluate ten successive square roots of 10, and that is the 
main work which is involved in the calculations. 

Why don’t we keep on going for more and more accuracy? Because we begin 
to notice something. When we raise 10 to a very small power, we get 1 plus a small 
amount. The reason for this is clear, because we are going to have to take the 


* There is a definite arithmetic procedure, but the easiest way to find the square root 
of any number A is to choose some a fairly close, find N/a, average a! = \[a + (N/a)l, 
and use this average a' for the next choice for a. The convergence is very rapid — the 
number of significant figures doubles each time. 
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1000th power of 10 1/1 000 to get back to 10, so we had better not start with too 
big a number; it has to be close to I. What we notice is that the small numbers 
that are added to 1 begin to look as though we are merely dividing by 2 each time; 
we see 1815 becomes 903, then 450, 225; so it is clear that, to an excellent approxi- 
mation, if we take another root, we shall get 1.00112 something, and rather than 
actually take all the square roots, we guess at the ultimate limit. When we take a 
small fraction A of 1024 as A approaches zero, what will the answer be? Of course 
it will be some number close to 0.002251 1 A. Not exactly 0.002251 1 A, however— we 
can get a better value by the following trick: we subtract the 1, and then divide by 
the power s . This ought to correct all the excesses to the same value. We see that 
they are very closely equal. At the top of the table they are not equal, but as they 
come down, they get closer and closer to a constant value. What is the value? 
Again we look to see how the series is going, how it has changed with s. It changed 
by 21 1, by 104, by 53, by 26. These changes are obviously half of each other, very 
closely, as we go down. Therefore, if we kept going, the changes would be 13, 7, 3, 
2 and 1, more or less, or a total of 26. Thus we have only 26 more to go, and so 
we find that the true number is 2.3025. (Actually, we shall later see that the exact 
number should be 2.3026, but to keep it realistic, we shall not alter anything in the 
arithmetic.) From this table we can now calculate any power of 10, by compound- 
ing the power out of 1024ths. 

Let us now actually calculate a logarithm, because the process we shall use is 
where logarithm tables actually come from. The procedure is shown in Table 
22-2, and the numerical values are shown in Table 22-1 (columns 2 and 3). 


Table 22-2 

Calculation of a logarithm: logic 2 


2 -h 1.77828 = 1.124682 
1.124682 1.074607 = 1.046598, etc. 


.*.2 = (1.77828)(1. 074607)(1 .036633)(1.090350)(1. 000573) 

= 10 [-L (256 + 32 + 16 + 4 + O. 254 ] = 10 P 08 ' 254 ] 
[.1024 J L 1024 J 

= 100 30103 (— = 0.254) 

\2249 ) 


log 10 2 = 0.30103 


Suppose we want the logarithm of 2. That is, we want to know to what power 
we must raise 10 to get 2. Can we raise 10 to the 1/2 power? No; that is too big. 
In other words, we can see that the answer is going to be bigger than 1/4, and less 
than 1/2. Let us take the factor 10 1/4 out; we divide 2 by 1.778 . . . , and get 
1.124 ... , and so on, and now we know that we have taken away 0.250000 from 
the logarithm. The number 1.124 ... is now the number whose logarithm we need. 
When we are finished we shall add back tht 1/4, or 256/1024. Now we look in the 
table for the next number just below 1.124 ... , and that is 1.074607. We therefore 
divide by 1.074607 and get 1.046598. From that we discover that 2 can be made 
up of a product of numbers that aile in Table 22-1, as follows: 

2 = ( 1 .77828)( 1 .074607)( 1 .036633)( 1 .0090350)( 1 .000573). 

There was one factor (1.Q0O573) left over, naturally, which is beyond the range 
of our table. To get the logarithm of this factor, we use our result (hat 10 A/ 1024 « 

1 + 2.3025 A/1024. We find A — 0.254. Therefore our answer is 10 to the follow- 
ing power: (256 + 32 + 16 + 4 + 0.254)/1024. Adding those (ogether, we 
get 308.254/1024. Dividing, we get 0.30103, so we know that the log 10 2 = 0.30103, 
which happens to be right to 5 figures! 

This is how logarithms were originally computed by Mr. Briggs of Halifax, 
in 1620. He said, “I computed successively 54 square roots of 10.” We know he 
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really computed only the first 27, because the rest of them can be obtained by this 
trick with A. His work involved calculating the square root of 10 twenty-seven 
times, which is not much more than the ten times we did; however, it was more 
work because he calculated to sixteen decimal places, and then reduced his answer 
to fourteen when he published it, so that there were no rounding errors. He made 
tables of logarithms to fourteen decimal places by this method, which is quite 
tedious. But all logarithm tables for three hundred years we xe borrowed from Mr. 
Briggs’ tables by reducing the number of decimal places. Only in modern times, 
with the WPA and computing machines, have new tables been independently 
computed. There are much more efficient methods of computing logarithms today, 
using certain series expansions. 

In the above process, we discovered something rather interesting, and that is 
that for very small powers e we can calculate 10 c easily; we have discovered that 
10 c = 1 + 2.3025e, by sheer numerical analysis. Of course this also means that 
10”/ 2*3025 _ 1 4. n if n i s very small. Now logarithms to any other base are 
merely multiples of logarithms to the base 10. The base 10 was used only because 
we have 10 fingers, and the arithmetic of it is easy, but if we ask for a mathematically 
natural base, one that has nothing to do with the number of fingers on human 
beings, we might try to change our scale of logarithms in some convenient and 
natural manner, and the method which people have chosen is to redefine the log- 
arithms by multiplying all the logarithms to the base 10 by 2.3025 . . . This then 
corresponds to using some other base, and this is called the natural base, or base e. 
Note that log e (1 +«)««, or e n 1 + n as n 0. 

It is easy enough to find out what e is: e = io 1/2 3025 or lo 0 - 434294 --, an 
irrational power. Our table of the successive square roots of 10 can be used to 
compute, not just logarithms, but also 10 to any power, so let us use it to calculate 
this natural base e. For convenience we transform 0.434294 . . . into 444.73/1024. 
Now, 444.73 is 256 + 128 + 32 + 16 + 2 + 0.73. Therefore e, since it is an 
exponent of a sum, will be a product of the numbers 

(1.77828)(1.33352)(1.074607)(1.036633)(1. 01 8 152)(1. 00903 5)(1. 001643) = 2.7184. 

(The only problem is the last one, which is 0.73, and which is not in the table, but 
we know that if A is small enough, the answer is 1 + 2.3025 A.) When we multiply 
all these together, we get 2.7184 (it should be 2.7183, but it is good enough). The 
use of such tables, then, is the way in which irrational powers and the logarithms 
of irrational numbers are all calculated. That takes care of the irrationals. 


22-5 Complex numbers 

Now it turns out that after all that work we still cannot solve every equation! 
For instance, what is the square root of — 1 ? Suppose we have to find x 2 — — 1. 
The square of no rational, of no irrational, of nothing that we have discovered so 
far, is equal to — 1. So we again have to generalize our numbers to a still wider 
class. Let us suppose that a specific solution of x 2 = — 1 is called something, 
we shall call it /; i has the property, by definition, that its square is —1. That is 
about all we are going to say about it; of course, there is more than one root of the 
equation x 2 = — 1. Someone could write /, but another could say, “No, I prefer 
— /. My i is minus your It is just as good a solution, and since the only definition 
that i has is that i 2 = — 1 , it must be true that any equation we can write is equally 
true if the sign of / is changed everywhere. This is called taking the complex 
conjugate . Now we are going to make up numbers by adding successive f s, and 
multiplying fs by numbers, and adding other numbers, and so on, according to 
all of our rules. In this way we find that numbers can an be written in the form 
p + iq, where p and q are what we call real numbers, i.e., the numbers we have 
been defining up until now. The number i is called the unit imaginary number. 
Any real multiple of i is called pure imaginary. The most general number, a, is of 
the form p + iq and is called a complex number . Things do not get any worse if, 
for instance, we multiply two such numbers, let us say (r + is)(p + iq). Then, 
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using the rules, we get 


(r + is)(p + iq) = rp + r(iq) + ( is)p + (is)(iq) 

= rp + i(rq) + i(sp) + (ii)(sq) 

= (rp - sq ) + i(rq + sp), (22.4) 

since ii = i 2 = -1. Therefore all the numbers that now belong in the rules 
(22.1) have this mathematical form. 

Now you say, “This can go on forever! We have defined powers of imaginaries 
and all the rest, and when we are all finished, somebody else will come along with 
another equation which cannot be solved, like x 6 + 3x 2 = -2. Then we have to 
generalize all over again!” But it turns out that with this one more invention, just 
the square root of — 1, every algebraic equation can be solved! This is a fantastic 
fact, which we must leave to the Mathematics Department to prove. The proofs 
are very beautiful and very interesting, but certainly not self-evident. In fact, the 
most obvious supposition is that we are going to have to invent again and again 
and again. But the greatest miracle of all is that we do not. This is the last invention. 
After this invention of complex numbers, we find that the rules still work with 
complex numbers, and we are finished inventing new things. We can find the com- 
plex power of any complex number, we can solve any equation that is written 
algebraically, in terms of a finite number of those symbols. We do not find any 
new numbers. The square root of /, for instance, has a definite result, it is not 
something new; and P is something. We will discuss that now. 

We have already discussed multiplication, and addition is also easy; if we add 
two complex numbers, (p + iq) + (r + is), the answer is (p + r) + i(q + s). 
Now we can add and multiply complex numbers. But the real problem, of course, 
is to compute complex powers of complex numbers. It turns out that the problem 
is actually no more difficult than computing complex powers of real numbers. So 
let us concentrate now on the problem of calculating 10 to a complex power, not 
just an irrational power, but 10 (r+ts) . Of course, we must at all times use our 
rules (22.1) and (22.2). Thus 

10 (r+,s) = 10 r 10”. (22.5) 

But 10 r we already know how to compute, and we can always multiply anything 
by anything else; therefore the problem is to compute only 10”. Let us call it some 
complex number, x + iy. Problem: given j, find x, find y. Now if 

10” = x + iy, 

then the complex conjugate of this equation must also be true, so that 

KT” = x - iy. 

(Thus we see that we can deduce a number of things without actually computing 
anything, by using our rules.) We deduce another interesting thing by multiplying 
these together: 

10”10“” = 10° = 1 = (x + iy)(x - iy) = x 2 + y 2 . (22.6) 

Thus if we find x, we have y also. 

Now the problem is how to compute 10 to an imaginary power. What guide 
is there? We may work over our rules until we can go no further, but here is a 
reasonable guide: if we can compute it for any particular s, we can get it for all the 
rest. If we know 10” for any one s and then we want it for twice that s, we can 
square the number, and so on. But how can we find 10” for even one special value 
of j? To do so we shall make one additional assumption, which is not quite in the 
category of all the other rules, but which leads to reasonable results and permits 
us to make progress: when the power is small, we shall suppose that the “law” 
l° e = 1 + 2.3025c is right, as c gets very small, not only for real e, but for complex 
e as well. Therefore, we begin with the supposition that this law is true in general, 
and that tells us that 10” = 1 + 2.3025 • is, for s — > 0. So we assume that if s is 
very small, say one part in 1024, we have a rather good approximation to 10”. 
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Now we make a table by which we can compute all the imaginary powers of 
10, that is, compute x and y. It is done as follows. The first power we start with is 
the 1/1024 power, which we presume is very nearly 1 + 2. 3025// 1024. Thus we 
start with 

10 I/1024 = 1.00000 + 0.0022486/, (22.7) 

and if we keep multiplying the number by itself, we can get to a higher imaginary 
power. In fact, we may just reverse the procedure we used in making our logarithm 
table, and calculate the square, 4th power, 8 th power, etc., of (22.7), and thus 
build up the values shown in Table 22-3. We notice an interesting thing, that the 
x numbers are positive at first, but then swing negative. We shall look into that a 
little bit more in a moment. But first we may be curious to find for what number 
s the real part of 10™ is zero. The y-value would be /, and so we would have 10™ = 
/, or is — logic /• As an example of how to use this table, just as we calculated 
logic 2 before, let us now use Table 22-3 to find logic /* 

Which of the numbers in Table 22-3 do we have to multiply together to get a 
pure imaginary result? After a little trial and error, we discover that to reduce 
x the most, it is best to multiply “512” by “128.” This gives 0.13056 + 0.99144/. 
Then we discover that we should multiply this by a number whose imaginary part 
is about equal to the size of the real part we are trying to remove. Thus we choose 
“64” whose y-value is 0.14349, since that is closest to 0.13056. This then gives 
—0.01350 + 0.99993/. Now we have overshot, and must divide by 0.99996 + 
0.00900/. How do we do that? By changing the sign of / and multiplying by 
0.99996 — 0.00900/ (which works if x 2 + y 2 — 1). Continuing in this way, we 
find that the entire power to which 10 must be raised to give / is /(512 +128 + 
64 — 4 — 2 + 0.20)/1024, or 698.20//1024. If we raise 10 to that power, we 
can get /. Therefore log ! 0 / = 0.68226/. 

22-6 Imaginary exponents 

To further investigate the subject of taking complex imaginary powers, let 
us look at the powers of 10 taking successive powers, not doubling the power each 
time, in order to follow Table 22-3 further and to see what happens to those minus 
signs. This is shown in Table 22-4, in which we take 10 l/8 , and just keep multiply- 
ing it. We see that x decreases, passes through zero, swings almost to — 1 (if we 
could get in between p = 10 and p — 11 it would obviously swing to — 1 ), and 
swings back. The jy-value is going back and forth too. 

In Fig. 22-1 the dots represent the numbers that appear in Table 22-4, and 
the lines are just drawn to help you visually. So we see that the numbers x and y 
oscillate; 10 ™ repeats itself, it is a periodic thing, and as such, it is easy enough to 
explain, because if a certain power is /, then the fourth power of that would be i 2 
squared . It would be +1 again, and therefore, since 10°- 6Sl is equal to /, by taking 
the fourth power we discover that 10 2,72 * is equal to +1. Therefore, if we wanted 
10 3OO \ for instance, we could write it as 10 2 72t times 10 0,28 \ In other words, it 
has a period, it repeats. Of course, we recognize what the curves look like! They 
look like the sine and cosine, and we shall call them, for a while, the algebraic sine 
and algebraic cosine. However, instead of using the base 10, we shall put them 
into our natural base, which only changes the horizontal scale; so we denote 
2.30255 by t, and write 10™ = e lt , where t is a real number. Now e lt = x + iy , 
and we shall write this as the algebraic cosine of t plus / times the algebraic sine of 
t. Thus 

e %t = cos t + /sinr. ( 22 . 8 ) 

What are the properties of cos t and sin /? First, we know, for instance, that x 2 + 
y 2 must be 1 ; we have proved that before, and it is just as true for base e as for base 
10. Therefore cos 2 1 + sin 2 / = 1. We also know that, for small /, e lt — 1 + it, 
and therefore cos / is nearly 1 , and sin t is nearly /, and so it goes, that all of the 
various properties of these remarkable functions , which come from taking imaginary 
powers, are the same as the sine and cosine of trigonometry. 


Table 22-3 
Successive Squares of 
10 i/i024 = x + 0.0022486/ 


Power is 

1024 5 

10 “ 

i/ 1024 

1 

1.00000 + 0.00225/* 

j/512 

2 

1.00000 + 0.00450/ 

i/256 

4 

0.99996 + 0.00900/ 

i/ 128 

8 

0.99984 + 0.01800/ 

// 64 

16 

0.99936 + 0.03599/ 

i/22 

32 

0.99742 + 0.07193/ 

i/16 

64 

0.98967 + 0.14349/ 

U 8 

128 

0.95885 + 0.28402/ 

// 4 

256 

0.83872 + 0.54467/ 

i/2 

512 

0.40679 + 0.91365/ 

i/1 

1024 

-0.66928 + 0.74332/ 


* Should be 0.0022486/ 


Table 22-4 

Successive Powers of 10 8 


p = power • 8 / 

10' p/8 

0 

1.00000 + 0 . 00000 / 

1 

0.95882 + 0.28402/ 

2 

0.83867 + 0.54465/ 

3 

0.64944 + 0.76042/ 

4 

0.40672 + 0.91356/ 

5 

0.13050 + 0.99146/ 

6 

-0.15647 + 0.98770/ 

7 

-0.43055 + 0.90260/ 

8 

-0.66917 + 0.74315/ 

9 

-0.85268 + 0.52249/ 

10 

-0.96596 + 0.25880/ 

11 

-0.99969 - 0.02620/ 

12 

-0.95104 - 0.30905/ 

14 

-0.62928 - 0.77717/ 

16 

-0.10447 - 0.99453/ 

18 

+0.45454 - 0.89098/ 

20 

+0.86648 - 0.49967/ 

22 

+0.99884 + 0.05287/ 

24 

+0.80890 + 0.58836/ 
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Is the period the same? Let us find out. e to what power is equal to /? What 
is the logarithm of i to the base e? We worked it out before, in the base 10 it was 
0.68226/, but when we change our logarithmic scale to e, we have to multiply by 
2.3025, and if we do that it comes out 1.5709. So this will be called “algebraic 
ir/2.” But, we see, it differs from the regular ir/2 by only one place in the last 
point, and that, of course, is the result of errors in our arithmetic! So we have 
created two new functions in a purely algebraic manner, the cosine and the sine, 
which belong to algebra, and only to algebra. We wake up at the end to discover 
the very functions that are natural to geometry. So there is a connection, ulti- 
mately, between algebra and geometry. 

We summarize with this, the most remarkable formula in mathematics: 

e 19 = cos 0+/ sin 0. (22.9) 

This is our jewel. 

We may relate the geometry to the algebra by representing complex numbers 
in a plane; the horizontal position of a point is x, the vertical position of a point 
is y (Fig. 22-2). We represent every complex number, x + />. Then if the radial 
distance to this point is called r and the angle is called 0, the algebraic law is that 
x + iy is written in the form re i9 , where the geometrical relationships between 
x , y, r i and 0 are as shown. This, then, is the unification of algebra and geometry. 

When we began this chapter, armed only with the basic notions of integers 
and counting, we had little idea of the power of the processes of abstraction and 
generalization. Using the set of algebraic “laws,” or properties of numbers, Eq. 
(22.1), and the definitions of inverse operations (22.2), we have been able here, 
ourselves, to manufacture not only numbers but useful things like tables of log- 
arithms, powers, and trigonometric functions (for these are what the imaginary 
powers of real numbers are), all merely by extracting ten successive square roots 
of ten! 
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Resonance 


23-1 Complex numbers and harmonic motion 

In the present chapter we shall continue our discussion of the harmonic 
oscillator and, in particular, the forced harmonic oscillator, using a new technique 
in the analysis. In the preceding chapter we introduced the idea of complex num- 
bers, which have real and imaginary parts and which can be represented on a 
diagram in which the ordinate represents the imaginary part and the abscissa 
represents the real part. If a is a complex number, we may write it as a = a r + ia^ 
where the subscript r means the real part of a , and the subscript i means the 
imaginary part of a. Referring to Fig. 23-1 , we see that we may also write a com- 
plex number a — x + iy in the form x + iy = re te 9 where r 2 = x 2 + y 2 — 
(x + iy)(x — iy) = aa*. (The complex conjugate of a , written a* 9 is obtained 
by reversing the sign of / in a.) So we shall represent a complex number in either 
of two forms, a real plus an imaginary part, or a magnitude r and a phase angle 
0, so-called. Given r and 0, x and y are clearly r cos 0 and r sin 0 and, in reverse, 
given a complex number x + iy 9 r — \/x 2 + y 2 and tan 0 = y/x, the ratio of 
the imaginary to the real part. 

We are going to apply complex numbers to our analysis of physical phenomena 
by the following trick. We have examples of things that oscillate; the oscillation 
may have a driving force which is a certain constant times cos cot. Now such a 
force, F = F 0 cos c ot 9 can be written as the real part of a complex number F = 
F 0 e lut because e %<at = cos c ot + i sin co/. The reason we do this is that it is easier 
to work with an exponential function than with a cosine. So the whole trick 
is to represent our oscillatory functions as the real parts of certain complex func- 
tions. The complex number F that we have so defined is not a real physical force, 
because no force in physics is really complex; actual forces have no imaginary part, 
only a real part. We shall, however, speak of the “force” F 0 e l<at 9 but of course 
the actual force is the real part of that expression. 

Let us take another example. Suppose we want to represent a force which is 
a cosine wave that is out of phase with a delayed phase A. This, of course, would 
be the real part of F 0 e t{<at ~ A) , but exponentials being what they are, we may write 
e i(ut-A) __ e lut e — l * t Thus we see that the algebra of exponentials is much easier 
than that of sines and cosines ; this is the reason we choose to use complex numbers. 
We shall often write 

F = F 0 e~ iA e ia>t = Fe io>t • (23.1) 

We write a little caret (*) over the F to remind ourselves that this quantity is a 
complex number: here the number is 

F - F 0 e~ iA . 


Now let us solve an equation, using complex numbers, to see whether we can 
work out a problem for some real case. For example, let us try to solve 


d 2 x kx 
It 2 + In 


£ 

m 


m 


cos co/, 


(23.2) 


where F is the force which drives the oscillator and x is the displacement. Now, 
absurd though it may seem, let us suppose that x and F are actually complex 
numbers, for a mathematical purpose only. That is to say, x has a real part and 
an imaginary part times /, and F has a real part and an imaginary part times i. 
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Fig. 23-1. A complex number may 
be represented by a point in the "complex 
plane.” 



Now if we had a solution of (23.2) with complex numbers, and substituted the 
complex numbers in the equation, we would get 


or 


d 2 (x r + ixj) k(x r + ixj) _ F r + /F t 
dt 2 m m 

d 2 x r kxr , .( d 2 Xj kx A _ F r , <F, 

dt 2 m \dt 2 m ) m m 


Now, since if two complex numbers are equal, their real parts must be equal and 
their complex parts must be equal, we deduce that the real part of x satisfies the 
equation with the real part of the force. We must emphasize, however, that this 
separation into a real part and an imaginary part is not valid in general, but is 
valid only for equations which are linear , that is, for equations in which x appears 
in every term only in the first power or the zeroth power. For instance, if there 
were in the equation a term \x 2 , then when we substitute x r + ix u we would get 
X(x r + ix,) 2 , but when separated into real and imaginary parts this would yield 
X(xr — x 2 ) as the real part and 2i\x r x l as the imaginary part. So we see that the 
real part of the equation would not involve just Xx 2 , but also —\x 2 . In this case 
we get a different equation than the one we wanted to solve, with x it the completely 
artificial thing we introduced in our analysis, mixed in. 

Let us now try our new method for the problem of the forced oscillator, that 
we already know how to solve. We want to solve Eq. (23.2) as before, but we say 
that we are going to try to solve 


dhc kx = Fe iat 

dt 2 m m 


(23.3) 


where Fe l "‘ is a complex number. Of course x will also be complex, but remember 
the rule: take the real part to find out what is really going on. So we try to solve 
(23.3) for the forced solution; we shall discuss other solutions later. The forced 
solution has the same frequency as the applied force, and has some amplitude of 
oscillation and some phase, and so it can be represented also by some complex 
number x whose magnitude represents the swing of x and whose phase represents 
the time delay in the same way as for the force. Now a wonderful feature of an 
exponential function is that dlxe^ l )/dt = iu> xe^K When we differentiate an 
exponential function, we bring down the exponent as a simple multiplier. The 
second derivative does the same thing, it brings down another /«, and so it is very 
simple to write immediately, by inspection, what the equation is for x: every time 
we see a differentiation, we simply multiply by u o. (Differentiation is now as easy as 
multiplication! This idea of using exponentials in linear differential equations is 
almost as great as the invention of logarithms, in which multiplication is replaced 
by addition. Here differentiation is replaced by multiplication.) Thus our equation 
becomes 

(ico) 2 x + (kx/m) = F/m. (23.4) 


(We have cancelled the common factor e”*.) See how simple it is! Differential 
equations are immediately converted, by sight, into mere algebraic equations; 
we virtually have the solution by sight, that 

* = p / m 
( k/m ) - ’ 

since (/co) 2 = — « 2 . This may be slightly simplified by substituting k/m = a>l, 
which gives 

* = F/m(ul - co 2 ). (23.5) 

This, of course, is the solution we had before; for since m(o>l — co 2 ) is a real num- 
ber, the phase angles of F and of x are the same (or perhaps 180° apart, if co 2 > coq), 
as advertised previously, The magnitude of Jc, which measures how far it oscillates, 
is related to the size of the F by the factor 1/w(coq — co 2 ), and this factor becomes 
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enormous when to is nearly equal to to 0 . So we get a very strong response when 
we apply the right frequency to (if we hold a pendulum on the end of a string and 
shake it at just the right frequency, we can make it swing very high). 


23-2 The forced oscillator with damping 

That, then, is how we analyze oscillatory motion with the more elegant 
mathematical technique. But the elegance of the technique is not at all exhibited 
in such a problem that can be solved easily by other methods. It is only exhibited 
when one applies it to more difficult problems. Let us therefore solve another, 
more difficult problem, which furthermore adds a relatively realistic feature to the 
previous one. Equation (23.5) tells us that if the frequency c o were exactly equal to 
oj 0 , we would have an infinite response. Actually, of course, no such infinite 
response occurs because some other things, like friction, which we have so far 
ignored, limits the response. Let us therefore add to Eq. (23.2) a friction term. 

Ordinarily such a problem is very difficult because of the character and 
complexity of the frictional term. There are, however, many circumstances in 
which the frictional force is proportional to the speed with which the object moves. 
An example of such friction is the friction for slow motion of an object in oil or a 
thick liquid. There is no force when it is just standing still, but the faster it moves 
the faster the oil has to go past the object, and the greater is the resistance. So 
we shall assume that there is, in addition to the terms in (23.2), another term, a 
resistance force proportional to the velocity: F/ = —cdx/dt. It will be convenient, 
in our mathematical analysis, to write the constant c as m times 7 to simplify the 
equation a little. This is just the same trick we use with k when we replace it by 
mo) o, just to simplify the algebra. Thus our equation will be 

m(d 2 x/dt 2 ) + c(dx/dt) + kx = F (23.6) 

or, writing c — my and k = mail and dividing out the mass m, 

(i d 2 x/dt 2 ) + lidx/di) + <Ax = F/m. (23.6a) 

Now we have the equation in the most convenient form to solve. If 7 is very 
small, that represents very little friction; if 7 is very large, there is a tremendous 
amount of friction. How do we solve this new linear differential equation? Suppose 
that the driving force is equal to F 0 cos (ait + A); we could put this into (23.6a) 
and try to solve it, but we shall instead solve it by our new method. Thus we 
write F as the real part of Fe lat and x as the real part of xe l0}t , and substitute these 
into Eq. (23.6a). It is not even necessary to do the actual substituting, for we can 
see by inspection that the equation would become 

[(/w) 2 * + y(ia)± + = ( F/m)e iu ' . (23.7) 

[As a matter of fact, if we tried to solve Eq. (23.6a) by our old straightforward way, 
we would really appreciate the magic of the “complex” method.] If we divide by 
e li on both sides, then we can obtain the response x to the given force F; it is 

x ~ F/m(ail — co 2 + ilai). (23.8) 

Thus again x is given by F times a certain factor. There is no technical name 
for this factor, no particular letter for it, but we may call it R for discussion pur- 
poses: 


m(oi o — a) 2 -j- /7to) 

and 

* = FR. (23.9) 

(Although the letters 7 and oj 0 are in very common use, this R has no particular 
name.) This factor R can either be written as p + iq, or as a certain magnitude 
p times e lB . If it is written as a certain magnitude times e %9 > let us see what it means. 
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Now F = F 0 <? ,A , and the actual force F is the real part of F 0 e iA e iut , that is, 
F 0 cos (cot + A). Next, Eq. (23.9) tells us that x is equal to FR. So, writing 
R = pe lS as another name for R, we get 

* = RF = pe ie F 0 e iA = pF 0 e i(t> + A) . 

Finally, going even further back, we see that the physical x, which is the real part 
of the complex x, is equal to the real part of pF 0 e t<s+A, e 1 “ i . But p and F 0 are real, 
and the real part of e i<< ’ +A+ " <) is simply cos (co/ + A + 8). Thus 

x = pF o cos (co/ + A + 6). (23.10) 

This tells us that the amplitude of the response is the magnitude of the force F 
multiplied by a certain magnification factor, p; this gives us the “amount” of 
oscillation. It also tells us, however, that x is not oscillating in phase with the 
force, which has the phase A, but is shifted by an extra amount 8. Therefore p 
and 8 represent the size of the response and the phase shift of the response. 

Now let us work out what p is. If we have a complex number, the square of 
the magnitude is equal to the number times its complex conjugate; thus 



Fig. 23—2. Plot of p 2 versus co. 



Fig, 23—3. Plot of 8 versus co. 


m 2 (wo - co 2 + /7co)(cog — co 2 — iy co) 

= 1 (23.11) 

m 2 [(u 2 - cog) 2 + 7 2 w 2 ] 

In addition, the phase angle 8 is easy to find, for if we write 
1 /R = 1 /pe 19 = (1 /p)e~ ie = m(o) 2 — co 2 + /7co), 

we see that 

tan 8 = -7co/(cog - co 2 ). (23.12) 

It is minus because tan (—8) = —tan 8. A negative value for 8 results for all co, 
and this corresponds to the displacement x lagging the force F. 

Figure 23—2 shows how p 2 varies as a function of frequency (p 2 is physically 
more interesting than p, because p 2 is proportional to the square of the amplitude, 
or more or less to the energy that is developed in the oscillator by the force). We 
see that if 7 is very small, then l/(cog — co 2 ) 2 is the most important term, and the 
response tries to go up toward infinity when co equals co 0 . Now the “infinity” is not 
actually infinite because if co = co 0 , then l/7 2 co 2 is still there. The phase shift 
varies as shown in Fig. 23-3. 

In certain circumstances we get a slightly different formula than (23.8), also 
called a resonance formula, and one might think that it represents a different 
phenomenon, but it does not. The reason is that if 7 is very small the most interest- 
ing part of the curve is near co = co 0 , and we may replace (23.8) by an approximate 
formula which is very accurate if 7 is small and co is near co 0 . Since cog - co 2 = 
(co 0 — co)(co 0 + co), if co is near co 0 this is nearly the same as 2co 0 (co 0 — co) and 
Tco is nearly the same as 7coq. Using these in (23.8), we see that cog — co 2 + /7co ~ 
2co 0 (co 0 — co + iy/2), so that 

x « F/2/wco 0 (coo - co + 17/2) if 7 « co 0 and co « co 0 . (23.13) 


It is easy to find the corresponding formula for p 2 . It is 


p 2 « l /4m 2 col [(co 0 - co) 2 + 7 2 /4J. 

We shall leave it to the student to show the following : if we call the maximum 
height of the curve of p 2 vs. co one unit, and we ask for the width Aco of the curve, 
at one half the maximum height, the full width at half the maximum height of 
the curve is Aco = 7, supposing that 7 is small. The resonance is sharper and 
sharper as the frictional effects are made smaller and smaller. 
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As another measure of the width, some people use a quantity Q which is 
defined as Q = co 0 /t. The narrower the resonance, the higher the Q: Q = 1000 
means a resonance whose width is only 1000th of the frequency scale. The Q of 
the resonance curve shown in Fig. 23-2 is 5. 

The importance of the resonance phenomenon is that it occurs in many other 
circumstances, and so the rest of this chapter will describe some of these other 
circumstances. 

23-3 Electrical resonance 

The simplest and broadest technical applications of resonance are in electricity. 

In the electrical world there are a number of objects which can be connected to 
make electric circuits. These passive circuit elements , as they are often called, are 
of three main types, although each one has a little bit of the other two mixed in. 
Before describing them in greater detail, let us note that the whole idea of our 
mechanical oscillator being a mass on the end of a spring is only an approximation. 
All the mass is not actually at the “mass” ; some of the mass is in the inertia of the 
spring. Similarly, all of the spring is not at the “spring”; the mass itself has a little 
elasticity, and although it may appear so, it is not absolutely rigid, and as it goes 
up and down, it flexes ever so slightly under the action of the spring pulling it. 
The same thing is true in electricity. There is an approximation in which we can 
lump things into “circuit elements” which are assumed to have pure, ideal char- 
acteristics. It is not the proper time to discuss that approximation here, we shall 
simply assume that it is true in the circumstances. 

The three main kinds of circuit elements are the following. The first is called 
a capacitor (Fig, 23-4); an example is two plane metallic plates spaced a very small 
distance apart by an insulating material. When the plates are charged there is a 
certain voltage difference, that is, a certain difference in potential, between them. 
The same difference of potential appears between the terminals A and B , because 
if there were any difference along the connecting wire, electricity would flow right 
away. So there is a certain voltage difference V between the plates if there is a 
certain electric charge +? and —q on them, respectively. Between the plates 
there will be a certain electric field ; we have even found a formula for it (Chapters 
13 and 14): 

V = crd/e 0 = qd/e Q A, (23.14) 

where d is the spacing and A is the area of the plates. Note that the potential 
difference is a linear function of the charge. If we do not have parallel plates, but 
insulated electrodes which are of any other shape, the difference in potential is 
still precisely proportional to the charge, but the constant of proportionality may 
not be so easy to compute. However, all we need to know is that the potential 
difference across a capacitor is proportional to the charge: V = qjC\ the propor- 
tionality constant is 1/C, where C is the capacitance of the object. 

The second kind of circuit element is called a resistor ; it offers resistance 
to the flow of electrical current. It turns out that metallic wires and many 
other substances resist the flow of electricity in this manner: if there is a voltage 
difference across a piece of some substance, there exists an electric current I = 
dq/dt that is proportional to the electric voltage difference: 

V = RI = R dq/dt. (23.15) 

The proportionality coefficient is called the resistance R. This relationship may 
already be familiar to you; it is Ohm’s law. 

If we think of the charge q on a capacitor as being analogous to the displace- 
ment x of a mechanical system, we see that the current, I = dq/dt , is analogous 
to velocity, 1/C is analogous to a spring constant k, and R is analogous to the 
resistive coefficient 7. Now it is very interesting that there exists another circuit 
element which is the analog of mass\ This is a coil which builds up a magnetic field 
within itself when there is a current in it. A changing magnetic field develops in the 
coil a voltage that is proportional to dl/dt (this is how a transformer works, in 
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CAPACITOR RESISTOR INDUCTOR 

Fig. 23-4. The three passive circuit 
elements. 



fact). The magnetic field is proportional to a current, and the induced voltage 
(so-called) in such a coil is proportional to the rate of change of the current: 



Fig. 23-5. An oscillatory electrical 
circuit with resistance, inductance, and 
capacitance. 


V = Ldl/dt = L d 2 q/dt 2 . (23.16) 

The coefficient L is the self -inductance, and is analogous to the mass in a mechanical 
oscillating circuit. 

Suppose we make a circuit in which we have connected the three circuit 
elements in series (Fig. 23-5); then the voltage across the whole thing from 1 to 2 
is the work done in carrying a charge through, and it consists of the sum of several 
pieces: across the inductor, V L = L d 2 q/dt 2 \ across the resistance, V R = Rdq/dt ; 
across the capacitor, V c = q/ C. The sum of these is equal to the applied voltage, 
V: 

Ld 2 q/dt 2 + Rdq/dt + q/C = V(i). (23.17) 

Now we see that this equation is exactly the same as the mechanical equation (23.6), 
and of course it can be solved in exactly the same manner. We suppose that V(i) 
is oscillatory: we are driving the circuit with a generator with a pure sine wave 
oscillation. Then we can write our V(t) as a complex V with the understanding 
that it must be ultimately multiplied by e iu>t , and the real part taken in order to 
find the true F. Likewise, the charge q can thus be analyzed, and then in exactly 
the same manner as in Eq. (23.8) we write the corresponding equation: the second 
derivative of q is ( ia>) 2 q ; the first derivative is ( iof)q . Thus Eq. (23.17) translates to 


or 


L(io)) 2 + R(ia>) + - 


% = V 


V 

q = 

L(icc) 2 + R(icx)) + - 

which we can write in the form 


$ = V/L( a> 2 0 - co 2 + /Yco), (23.18) 

where 4 = l/LC and 7 = R/L. It is exactly the same denominator as we had in 
the mechanical case, with exactly the same resonance properties! The correspond- 
ence between the electrical and mechanical cases is outlined in Table 23-1. 


Table 23-1 


General 

characteristic 

Mechanical 

property 

Electrical 

property 

indep. variable 

time ( t ) 

time (0 

dep. variable 

position (*) 

charge ( q ) 

inertia 

mass (m) 

inductance ( L ) 

resistance 

drag coeff. (c = 7 m) 

resistance (R = 7 L) 

stiffness 

stiffness (&) 

(capacitance) -1 (1/C) 

resonant frequency 

oo q = k/m 

<4 = l/LC 

period 

to = 27t\/ m/k 

to = 2i t\/LC 

figure of merit 

Q = wo/7 

Q = c coL/R 


We must mention a small technical point. In the electrical literature, a different 
notation is used. (From one field to another, the subject is not really any different, 
but the way of writing the notations is often different.) First, j is commonly used 
instead of / in electrical engineering, to denote a/ 11 !. (After all, i must be the cur- 
rent!) Also, the engineers would rather have a relationship between V and / than 
between F and q , just because they are more used to it that way. Thus, since 
I = dq/dt — io)q, we can just substitute I/ioo for § and get 

V - {wL + R + l//coC)/= Z/. (23.19) 
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Another way is to rewrite Eq. (23.17), so that it looks more familiar; one often sees 
it written this way: 

Ldl/dt + RI + (1/C) J l Idt = V{t). (23.20) 

At any rate, we find the relation (23.19) between voltage V and current / which is 
just the same as (23.18) except divided by /co, and that produces Eq. (23.19). The 
quantity R + icoL + 1 //coC is a complex number, and is used so much in electrical 
engineering that it has a name: it is called the complex impedance , Z. Thus we can 
write V = ZI. The reason that the engineers like to do this is that they learned 
something when they were young: V = RI for resistances, when they only knew 
about resistances and dc. Now they have become more educated and have ac 
circuits, so they want the equation to look the same. Thus they write V = Z/, the 
only difference being that the resistance is replaced by a more complicated thing, 
a complex quantity. So they insist that they cannot use what everyone else in the 
world uses for imaginary numbers, they have to use a j for that; it is a miracle that 
they did not insist also that the letter Z be an R! (Then they get into trouble when 
they talk about current densities, for which they also use j. The difficulties of science 
are to a large extent the difficulties of notations, the units, and all the other arti- 
ficialities which are invented by man, not by nature.) 


23-4 Resonance in nature 

Although we have discussed the electrical case in detail, we could also bring 
up case after case in many fields, and show exactly how the resonance equation is 
the same. There are many circumstances in nature in which something is “oscilla- 
ting” and in which the resonance phenomenon occurs. We said that in an earlier 
chapter ; let us now demonstrate it. If we walk around our study, pulling books 
off the shelves and simply looking through them to find an example of a curve 
that corresponds to Fig. 23-2 and comes from the same equation, what do we find? 
Just to demonstrate the wide range obtained by taking the smallest possible sample, 
it takes only five or six books to produce quite a series of phenomena which show 
resonances. 

The first two are from mechanics, the first on a large scale: the atmosphere 
of the whole earth. If the atmosphere, which we suppose surrounds the earth 
evenly on all sides, is pulled to one side by the moon or, rather, squashed prolate 
into a double tide, and if we could then let it go, it would go sloshing up and down ; 
it is an oscillator. This oscillator is driven by the moon, which is effectively re- 
volving about the earth ; any one component of the force, say in the x-direction, has 
a cosine component, and so the response of the earth’s atmosphere to the tidal pull 
of the moon is that of an oscillator. The expected response of the atmosphere is 
shown in Fig. 23-6, curve b (curve a is another theoretical curve under discussion 
in the book from which this is taken out of context). Now one might think that we 
only have one point on this resonance curve, since we only have the one frequency, 
corresponding to the rotation of the earth under the moon, which occurs at a 
period of 12.42 hours— 12 hours for the earth (the tide is a double bump), plus a 
little more because the moon is going around. But from the size of the atmospheric 
tides, and from the phase , the amount of delay, we can get both p and 6. From 
those we can get co 0 and 7, and thus dra\y the entire curve! This is an example of 
very poor science. From two numbers we obtain two numbers, and from those 
two numbers we draw a beautiful curve, which of course goes through the very 
point that determined the curve! It is of no use unless we can measure something 
else , and in the case of geophysics that is often very difficult. But in this particular 
case there is another thing which we can show theoretically must have the same 
timing as the natural frequency co 0 : that is, if someone disturbed the atmosphere, 
it would oscillate with the frequency oj 0 . Now there was such a sharp disturbance 
in 1883; the Krakatoa volcano exploded and half the island blew off, and it made 
such a terrific explosion in the atmosphere that the period of oscillation of the 
atmosphere could be measured. It came out to 10J hours. The oj 0 obtained from 
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Cycles per day 



Fig. 23-6. Response of the atmos- 
phere to external excitation, a is the re- 
quired response if the atmospheric S 2 -tide 
is of gravitational origin; peak amplifi- 
cation is 1 00:1. b is derived from observed 
magnification and phase of M 2 -tide. 
[Munk and MacDonald, "Rotation of 
the Earth,” Cambridge University Press 
(I960)] 
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Wavelength in microns (10" 4 cm) 

Fig. 23-7. Transmission of infrared 
radiation through a thin (0.17 fi) sodium 
chloride film. [After R. B. Barnes, Z. 
Physik 75, 723 (1932). Kittel, Introduc- 
tion to Solid State Physics , Wiley, 1 956.] 
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STATIC MAGNETIC FIELD IN OERSTEDS 


Fig. 23-8. Magnetic energy loss in 
paramagnetic organic compound as func- 
tion of applied magnetic field intensity. 
[Holden et a/., Phys . Rev. 75, 1614(1 949)] 


Fig. 23-6 comes out 10 hours and 20 minutes, so there we have at least one check 
on the reality of our understanding of the atmospheric tides. 

Next we go to the small scale of mechanical oscillation. This time we take 
a sodium chloride crystal, which has sodium ions and chlorine ions next to each 
other, as we described in an early chapter. These ions are electrically charged, 
alternately plus and minus. Now there is an interesting oscillation possible. Sup- 
pose that we could drive all the plus charges to the right and all the negative charges 
to the left, and let go; they would then oscillate back and forth, the sodium lattice 
against the chlorine lattice. How can we ever drive such a thing? That is easy, for 
if we apply an electric field on the crystal, it will push the plus charge one way and 
the minus charge the other way! So, by having an external electric field we can 
perhaps get the crystal to oscillate. The frequency of the electric field needed is so 
high, however, that it corresponds to infrared radiation! So we try to find a reso- 
nance curve by measuring the absorption of infrared light by sodium chloride. 
Such a curve is shown in Fig. 23-7. The abscissa is not frequency, but is given in 
terms of wavelength, but that is just a technical matter, of course, since for a wave 
there is a definite relation between frequency and wavelength; so it is really a 
frequency scale, and a certain frequency corresponds to the resonant frequency. 

But what about the width? What determines the width? There are many 
cases in which the width that is seen on the curve is not really the natural width 
7 that one would have theoretically. There are two reasons why there can be a 
wider curve than the theoretical curve. If the objects do not all have the same 
frequency, as might happen if the crystal were strained in certain regions, so that 
in those regions the oscillation frequency were slightly different than in other 
regions, then what we have is many resonance curves on top of each other; so we 
apparently get a wider curve. The other kind of width is simply this: perhaps we 
cannot measure the frequency precisely enough — if we open the slit of the spectrom- 
eter fairly wide, so although we thought we had only one frequency, we actually 
had a certain range Aco, then we may not have the resolving power needed to see a 
narrow curve. Offhand, we cannot say whether the width in Fig. 23-7 is natural, 
or whether it is due to inhomogeneities in the crystal or the finite width of the slit 
of the spectrometer. 

Now we turn to a more esoteric example, and that is the swinging of a magnet. 
If we have a magnet, with north and south poles, in a constant magnetic field, the 
N end of the magnet will be pulled one way and the S end the other way, and there 
will in general be a torque on it, so it will vibrate about its equilibrium position, 
like a compass needle. However, the magnets we are talking about are atoms . These 
atoms have an angular momentum, the torque does not produce a simple motion 
in the direction of the field, but instead, of course, a precession. Now, looked at 
from the side, any one component is “swinging,” and we can disturb or drive that 
swinging and measure an absorption. The curve in Fig. 23-8 represents a typical 
such resonance curve. What has been done here is slightly different technically. 
The frequency of the lateral field that is used to drive this swinging is always kept 
the same, while we would have expected that the investigators would vary that and 
plot the curve. They could have done it that way, but technically it was easier for 
them to leave the frequency co fixed, and change the strength of the constant 
magnetic field, which corresponds to changing o> 0 in our formula. They have 
plotted the resonance curve against co 0 . Anyway, this is a typical resonance with a 
certain co 0 and 7. 

Now we go still further. Our next example has to do with atomic nuclei. The 
motions of protons and neutrons in nuclei are oscillatory in certain ways, and we 
can demonstrate this by the following experiment. We bombard a lithium atom 
with protons, and we discover that a certain reaction, producing 7-rays, actually 
has a very sharp maximum typical of resonance. We note in Fig. 23-9, however, 
one difference from other cases: the horizontal scale is not a frequency, it is an 
energy! The reason is that in quantum mechanics what we think of classically as 
the energy will turn out to be really related to a frequency of a wave amplitude. 
When we analyze something which in simple large-scale physics has to do with a 
frequency, we find that when we do quantum-mechanical experiments with atomic 
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Fig. 23-9. The intensity of gamma- 
radiation from lithium as a function of the 
energy of the bombarding protons. The 
dashed curve is a theoretical one cal- 
culated for protons with an angular 
momentum i = 0. [Bonner and Evans, 
Phys. Rev. 73, 666 (1948)] 



matter, we get the corresponding curve as a function of energy. In fact, this curve 
is a demonstration of this relationship, in a sense. It shows that frequency and 
energy have some deep interrelationship, which of course they do. 

Now we turn to another example which also involves a nuclear energy level, but 
now a much, much narrower one. The o> 0 in Fig. 23-10 corresponds to an energy of 
100,000 electron volts, while the width 7 is approximately 10~ 5 electron volt; 
in other words, this has a Q of 10 1 °! When this curve was measured it was the 
largest Q of any oscillator that had ever been measured. It was measured by Dr. 
Moessbauer, and it was the basis of his Nobel prize. The horizontal scale here is 
velocity, because the technique for obtaining the slightly different frequencies was 
to use the Doppler effect, by moving the source relative to the absorber. One can 
see how delicate the experiment is when we realize that the speed involved is a few 
centimeters per second! On the actual scale of the figure, zero frequency would 
correspond to a point about 10 10 cm to the left — slightly off the paper! 

Finally, if we look in an issue of the Physical Review , say that of January 1, 
1962, will we find a resonance curve? Every issue has a resonance curve, and Fig. 
23-11 is the resonance curve for this one. This resonance curve turns out be be 
very interesting. It is the resonance found in a certain reaction among strange 
particles, a reaction in which a K~~ and a proton interact. The resonance is de- 
tected by seeing how many of some kinds of particles come out, and depending on 
what and how many come out, one gets different curves, but of the same shape 
and with the peak at the same energy. We thus determine that there is a resonance 
at a certain energy for the K~ meson. That presumably means that there is some 
kind of a state, or condition, corresponding to this resonance, which can be attained 
by putting together a K~ and a proton. This is a new particle, or resonance. Today 
we do not know whether to call a bump like this a “particle” or simply a resonance. 
When there is a very sharp resonance, it corresponds to a very definite energy , 
just as though there were a particle of that energy present in nature. When the 
resonance gets wider, then we do not know whether to say there is a particle which 
does not last very long, or simply a resonance in the reaction probability. In the 
second chapter, this point is made about the particles, but when the second chapter 
was written this resonance was not known, so our chart should now have still 
another particle in it! 



Fig. 23-10. [Courtesy of Dr. R. 
Mossbauer] 



Fig. 23-11. Momentum dependence 
of the cross section for the reactions (a) 
+ p — > A + 7r + + 7T ” and (b) 
+ p — ► K° + n. The lower curves 
in (a) and (b) represent the presumed 
nonresonant backgrounds, while the upper 
curves contain in addition the superposed 
resonance. [Ferro- Luzzi et a/., Phys . Rev., 
Lett. 8 , 28 (1962)] 
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Transients 


24-1 The energy of an oscillator 

Although this chapter is entitled “transients,” certain parts of it are, in a way, 
part of the last chapter on forced oscillation. One of the features of a forced oscilla- 
tion which we have not yet discussed is the energy in the oscillation. Let us now 
consider that energy. 

In a mechanical oscillator, how much kinetic energy is there? It is proportional 
to the square of the velocity. Now we come to an important point. Consider an 
arbitrary quantity A , which may be the velocity or something else that we want to 
discuss. When we write A — Ae^ 1 , a complex number, the true and honest A , 
in the physical world, is only the real part ; therefore if, for some reason, we want 
to use the square of A, it is not right to square the complex number and then take 
the real part, because the real part of the square of a complex number is not just 
the square of the real part, but also involves the imaginary part. So when we wish 
to find the energy we have to get away from the complex notation for a while to 
see what the inner workings are. 

Now the true physical A is the real part of ^4 0 <^ C “ <+A \ that is, A = A 0 cos 
(cot + A), where A, the complex number, is written as A 0 e tA . Now the square of 
this real physical quantity is A 2 = A% cos 2 (. cot + A). The square of the quantity, 
then, goes up and down from a maximum to zero, like the square of the cosine. 
The square of the cosine has a maximum of 1 and a minimum of 0, and its average 
value is 1/2. 

In many circumstances we are not interested in the energy at any specific 
moment during the oscillation; for a large number of applications we merely want 
the average of A 2 , the mean of the square of A over a period of time large compared 
with the period of oscillation. In those circumstances, the average of the cosine 
squared may be used, so we have the following theorem: if A is represented by a 
complex number, then the mean of A 2 is equal to %Aq. Now A$ is the square of 
the magnitude of the complex A. (This can be written in many ways — some people 
like to write \A\ 2 ; others write, AA*, A times its complex conjugate.) We shall use 
this theorem several times. 

Now let us consider the energy in a forced oscillator. The equation for the 
forced oscillator is 

md 2 x/dt 2 + 7 mdx/dt + mcolx = F(t ). (24.1) 

In our problem, of course, F(t) is a cosine function of t. Now let us analyze the 
situation: how much work is done by the outside force FI The work done by the 
force per second, i.e., the power, is the force times the velocity. (We know that 
the differential work in a time dt is F dx , and the power is F dx/dt.) Thus 

But the first two terms on the right can also be written as d/dt[\m(dx/dt ) 2 + 
imcolx 2 ], as is immediately verified by differentiating. That is to say, the term in 
brackets is a pure derivative of two terms that are easy to understand — one is the 
kinetic energy of motion, and the other is the potential energy of the spring. Let 
us call this quantity the stored energy , that is, the energy stored in the oscillation. 
Suppose that we want the average power over many cycles when the oscillator is 
being forced and has been running for a long time. In the long run, the stored 
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energy does not change — its derivative gives zero average effect. In other words, 
if we average the power in the long run, all the energy ultimately ends up in the 
resistive term ym(dx/dt) 2 . There is some energy stored in the oscillation, but that 
does not change with time, if we average over many cycles. Therefore the mean 
power ( P ) is 

(P) = (7 m(dx/dt) 2 ). (24.3) 

Using our method of writing complex numbers, and our theorem that (A 2 ) = 
iA 2 , we may find this mean power. Thus if x = xe^ 1 , then dx/dt = mxe iut . 
Therefore, in these circumstances, the average power could be written as 

(P) = \ynm 2 x%. (24.4) 

In the notation for electrical circuits, dx/dt is replaced by the current / (/ is 
dq/dt, where q corresponds to x), and my corresponds to the resistance R. Thus 
the rate of the energy loss — the power used up by the forcing function — is the 
resistance in the circuit times the average square of the current: 

(P) = R(I 2 ) = R-ill (24.5) 

This energy, of course, goes into heating the resistor; it is sometimes called the 
heating loss or the Joule heating. 

Another interesting feature to discuss is how much energy is stored. That is 
not the same as the power, because although power was at first used to store up 
some energy, after that the system keeps on absorbing power, insofar as there are 
any heating (resistive) losses. At any moment there is a certain amount of stored 
energy, so we would like to calculate the mean stored energy (E) also. We have 
already calculated what the average of (dx/dt) 2 is, so we find 


(E) = \m((dx/dt ) 2 ) + \mu%(x 2 ) 

= im(w 2 + wfoxl ( ’ 

Now, when an oscillator is very efficient, and if u is near co 0 , so that |x| is large, 
the stored energy is very high — we can get a large stored energy from a relatively 
small force. The force does a great deal of work in getting the oscillation going, 
but then to keep it steady, all it has to do is to fight the friction. The oscillator can 
have a great deal of energy if the friction is very low, and even though it is oscillating 
strongly, not much energy is being lost. The efficiency of an oscillator can be 
measured by how much energy is stored, compared with how much work the force 
does per oscillation. 

How does the stored energy compare with the amount of work that is done in 
one cycle? This is called the Q of the system, and Q is defined as 2ir times the 
mean stored energy, divided by the work done per cycle. (If we were to say the 
work done per radian instead of per cycle, then the 2x disappears.) 


o = 2 tt iM^ 2 + o>o) • (x 2 ) a 2 + top 
ymu 2 (x 2 ) ■ 27t/w 27w 


(24.7) 


Q is not a very useful number unless it is very large. When it is relatively large, it 
gives a measure of how good the oscillator is. People have tried to define Q in the 
simplest and most useful way; various definitions differ a bit from one another, 
but if Q is very large, all definitions are in agreement. The most generally accepted 
definition is Eq. (24.7), which depends on «. For a good oscillator, close to reso- 
nance, we can simplify (24.7) a little by setting w = «o, and we then have Q = 
«oA, which is the definition of Q that we used before. 

What is Q for an electrical circuit? To find out, we merely have to translate 
L for m, R for my, and 1/C for ma > o (see Table 23—1). The Q at resonance is 
Im/R, where « is the resonance frequency. If we consider a circuit with a high Q, 
that means that the amount of energy stored in the oscillation is very large compared 
with the amount of work done per cycle by the machinery that drives the oscilla- 
tions. 
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24-2 Damped oscillations 

We now turn to our main topic of discussion: transients. By a transient is meant 
a solution of the differential equation when there is no force present, but when the 
system is not simply at rest. (Of course, if it is standing still at the origin with no 
force acting, that is a nice problem — it stays there!) Suppose the oscillation starts 
another way : say it was driven by a force for a while, and then we turn off the force. 
What happens then? Let us first get a rough idea of what will happen for a very 
high Q system. So long as a force is acting, the stored energy stays the same, and 
there is a certain amount of work done to maintain it. Now suppose we turn off 
the force, and no more work is being done; then the losses which are eating up the 
energy of the supply are no longer eating up its energy — there is no more driver. 
The losses will have to consume, so to speak, the energy that is stored. Let us 
suppose that Q/2t = 1000. Then the work done per cycle is 1/1000 of the stored 
energy. Is it not reasonable, since it is oscillating with no driving force, that in one 
cycle the system will still lose a thousandth of its energy E, which ordinarily would 
have been supplied from the outside, and that it will continue oscillating, always 
losing 1/1000 of its energy per cycle? So, as a guess, for a relatively high Q 
system, we would suppose that the following equation might be roughly right 
(we will later do it exactly, and it will turn out that it was right!): 

dE/dt uE/Q. (24.8) 


This is rough because it is true only for large Q. In each radian the system loses a 
fraction 1 / Q of the stored energy E. Thus in a given amount of time dt the energy 
will change by an amount u dt/Q, since the number of radians associated with the 
time dt is o> dt. What is the frequency? Let us suppose that the system moves so 
nicely, with hardly any force, that if we let go it will oscillate at essentially the same 
frequency all by itself. So we will guess that w is the resonant frequency co 0 . Then 
we deduce from Eq. (24.8) that the stored energy will vary as 

E = £ 0 e-“o (/ « = E Q e~ yt . (24.9) 


This would be the measure of the energy at any moment. What would the formula 
be, roughly, for the amplitude of the oscillation as a function of the time? The 
same? No! The amount of energy in a spring, say, goes as the square of the dis- 
placement; the kinetic energy goes as the square of the velocity; so the total energy 
goes as the square of the displacement. Thus the displacement, the amplitude of 
oscillation, will decrease half as fast because of the square. In other words, we 
guess that the solution for the damped transient motion will be an oscillation of 
frequency close to the resonance frequency co 0 , in which the amplitude of the sine- 
wave motion will diminish as e~ ytl 2 : 

x — Aye~ r< ^ 2 cos u)ot. (24.10) 

This equation and Fig. 24-1 give us an idea of what we should expect; now let us 
try to analyze the motion precisely by solving the differential equation of the 
motion itself. 

So, starting with Eq. (24.1), with no outside force, how do we solve it? Being 
physicists, we do not have to worry about the method as much as we do about what 
the solution is. Armed with our previous experience, let us try as a solution an 
exponential curve, x = Ae™ 1 . (Why do we try this? It is the easiest thing to 
differentiate!) We put this into (24.1) (with F(r) = 0), using the rule that each 
time we differentiate x with respect to time, we multiply by ia. So it is really quite 
simple to substitute. Thus our equation looks like this: 

(—a 2 + iJa + coo)Ae iat = 0. (24.11) 

The net result must be zero for all times, which is impossible unless (a) ,4 = 0, 
which is no solution at all — it stands still, or (b) 



Fig. 24-1. A damped cosine oscilla- 
tion. 


— a 2 + iacJ + a>o = 0. 


(24.12) 
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If we can solve this and find an a, then we will have a solution in which A need 
not be zero! 

a = 17/2 =h V coo - 7 2 /4. (24.13) 

For a while we shall assume that 7 is fairly small compared with o> 0 , so that 
coo - 7 2 /4 is definitely positive, and there is nothing the matter with taking the 
square root. The only bothersome thing is that we get two solutions! Thus 

«i = 17/2 + VoTo - 7 2 /4 = 17/2 + to 7 (24.14) 

and 

«2 = 17/2 - V«o " 7 2 /4 - 17/2 - w 7 . (24.15) 

Let us consider the first one, supposing that we had not noticed that the square root 
has two possible values. Then we know that a solution for a; is * i — Ae^\ 
where A is any constant whatever. Now, in substituting a u because it is going to 
come so many times and it takes so long to write, we shall call Vcoj; — 7 2 /4 = co 7 . 
Thus ia 1 = -7/2 + iw y , and we get a: = or what Js the samC) 

because of the wonderful properties of an exponential, 

*i = Ae~ ytl2 e iu, yK (24.16) 

First, we recognize this as an oscillation, an oscillation at a frequency w 7 which is 
not exactly the frequency co 0 , but is rather close to co 0 if it is a good system. Second, 
the amplitude of the oscillation is decreasing exponentially! If we take, for instance, 
the real part of (24.16), we get 

xi = Ae- ytI2 cosa>yt. ( 24 . 17 ) 

This is very much like our guessed-at solution (24.10), except that the frequency 

really is o> 7 . This is the only error, so it is the same thing— we have the right idea. 
But everything is not all right! What is not all right is that there is another solution . 

The other solution is a 2) and we see that the difference is only that the sign 
of co 7 is reversed: 

x 2 = Be~ ytj2 e~^K (24.18) 

What does this mean? We shall soon prove that if x x and x 2 are each a possible 
solution of Eq. (24.1) with F — 0, then x% + x 2 is also a solution of the same 
equation! So the general solution jc is of the mathematical form 

x = e~ ytl 2 (Ae io>yt + Be~^ yt ). (24.19) 

Now we may wonder why we bother to give this other solution, since we were so 
happy with the first one all by itself. What is the extra one for, because of course 
we know we should only take the real part? We know that we must take the real 
part, but how did the mathematics know that we only wanted the real part? When 
we had a nonzero driving force F(t ), we put in an artificial force to go with it, and 
the imaginary part of the equation, so to speak, was driven in a definite way. But 
when we put F(t) = 0, our convention that x should be only the real part of 
whatever we write down is purely our own, and the mathematical equations do 
not know it yet. The physical world has a real solution, but the answer that we 
were so happy with before is not real, it is complex. The equation does not know 
that we are arbitrarily going to take the real part, so it has to present us, so to 
speak, with a complex conjugate type of solution, so that by putting them together 
we can make a truly real solution; that is what a 2 is doing for us. In order for x 
to be real, Be^ yt will have to be the complex conjugate of Ae so that the 
imaginary parts disappear. So it turns out that B is the complex conjugate of A y 
and our real solution is 

x = e^^Ae**** + A*e~^ t ). (24.20) 

So our real solution is an oscillation with a phase shift and a damping — just as 
advertised. 
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24-3 Electrical transients 


Now let us see if the above really works. We construct the electrical circuit 
shown in Fig. 24-2, in which we apply to an oscilloscope the voltage across the 
inductance L after we suddenly turn on a voltage by closing the switch S. It is an 
oscillatory circuit, and it generates a transient of some kind. It corresponds to a 
circumstance in which we suddenly apply a force and the system starts to oscillate. 
It is the electrical analog of a damped mechanical oscillator, and we watch the 
oscillation on an oscilloscope, where we should see the curves that we were trying 
to analyze. (The horizontal motion of the oscilloscope is driven at a uniform 
speed, while the vertical motion is the voltage across the inductor. The rest of the 
circuit is only a technical detail. We would like to repeat the experiment many, 
many times, since the persistence of vision is not good enough to see only one 
trace on the screen. So we do the experiment again and again by closing the 
switch 60 times a second ; each time we close the switch, we also start the oscillo- 
scope horizontal sweep, and it draws the curve over and over.) In Figs. 24-3 to 
24-6 we see examples of damped oscillations, actually photographed on an oscillo- 
scope screen. Figure 24-3 shows a damped oscillation in a circuit which has a 
high Q, a small 7. It does not die out very fast; it oscillates many times on the 
way down. 

But let us see what happens as we decrease Q , so that the oscillation dies out 
more rapidly. We can decrease Q by increasing the resistance R in the circuit. 
When we increase the resistance in the circuit, it dies out faster (Fig. 24-4). Then 
if we increase the resistance in the circuit still more, it dies out faster still (Fig. 
24-5). But when we put in more than a certain amount, we cannot see any oscilla- 
tion at all! The question is, is this because our eyes are not good enough? If we 
increase the resistance still more, we get a curve like that of Fig. 24-6, which does 
not appear to have any oscillations, except perhaps one. Now, how can we explain 
that by mathematics? 

The resistance is, of course, proportional to the 7 term in the mechanical 
device. Specifically, 7 is R/L. Now if we increase the 7 in the solutions (24.14) 
and (24.15) that we were so happy with before, chaos sets in when 7/2 exceeds 
o> 0 ; we must write it a different way, as 

1*7/2 + /V 7 2 /4 — coo and i*7/2 — i \/ 7 2 /4 — coq. 


IP 

' I 

— — V- 

Fig. 24-2. An electrical circuit for 
demonstrating transients. 





Those are now the two solutions and, following the same line of mathematical 
reasoning as previously, we again find two solutions: e ^ and If we now 

substitute for we get 

X = ^-( 7/2+V72/4 ""«) j , 

a nice exponential decay with no oscillations. Likewise, the other solution is 

Note that the square root cannot exceed 7/2, because even if co 0 = 0, one term 
just equals the other. But coq is taken away from 7 2 /4, so the square root is less 
than 7/2, and the term in parentheses is, therefore, always a positive number. 
Thank goodness! Why? Because if it were negative, we would find e raised to a 
positive factor times t , which would mean it was exploding! In putting more and 
more resistance into the circuit, we know it is not going to explode — quite the 
contrary. So now we have two solutions, each one by itself a dying exponential, 
but one having a much faster “dying rate” than the other. The general solution is 
of course a combination of the two ; the coefficients in the combination depending 
upon how the motion starts — what the initial conditions of the problem are. In 
the particular way this circuit happens to be starting, the A is negative and the B 
is positive, so we get the difference of two exponential curves. 

Now let us discuss how we can find the two coefficients A and B (or A and A *), 
if we know how the motion was started. 




Figure 24-6 
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Suppose that at / = 0 we know that x = x 0 , and dx/dt = y 0 . If we put 
t = 0, x = x 0 , and dx/dt = v 0 into the expressions 

* = e~ ytl \Ae^ yl + A*e~ iuyt ), 

dx/dt = e~ ytl2 [(-y/2 + ico y )Ae^ yl + (-7/2 - iu y )A*e~ iuye ], 
we find, since e° = e i0 = 1, 


x 0 = A + A* — 2 An, 

v 0 = (-7/2)0 4 + A*) + iu y (A - A*) 

— — 7x 0 /2 -f- iw y (2iAi), 

where A = Ar + iAj, and A* = Ar — iAj. Thus we find 

Ar = Xo/2 
and 

Ai = (v 0 + yxo/2)/2uy. (24.21) 

This completely determines A and A*, and therefore the complete curve of the 
transient solution, in terms of how it begins. Incidentally, we can write the solution 
another way if we note that 

e lt + e~ %> = 2 cos 6 and e ie — e~ ie = 2 i sin 6. 

We may then write the complete solution as 

x = e~ ytl2 x 0 cos to-,/ + V ° sin co-,/1, (24.22) 

<*)y J 

where = +Vc — t 2 /4. This is the mathematical expression for the way 
an oscillation dies out. We shall not make direct use of it, but there are a number 
of points we should like to emphasize that are true in more general cases. 

First of all the behavior of such a system with no external force is expressed 
by a sum, or superposition, of pure exponentials in time (which we wrote as e lat ). 
This is a good solution to try in such circumstances. The values of a may be com- 
plex in general, the imaginary parts representing damping. Finally the intimate 
mathematical relation of the sinusoidal and exponential function discussed in 
Chapter 22 often appears physically as a change from oscillatory to exponential 
behavior when some physical parameter (in this case resistance, 7) exceeds some 
critical value. 
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Linear Systems and Review 


25-1 Linear differential equations 

In this chapter we shall discuss certain aspects of oscillating systems that are 
found somewhat more generally than just in the particular systems we have been 
discussing. For our particular system, the differential equation that we have 
been solving is 

72 j 

m + ym Tt + mw ° x = (25.1) 

Now this particular combination of “operations” on the variable x has the interest- 
ing property that if we substitute (x + j) for x, then we get the sum of the same 
operations on x and y; or, if we multiply x by a, then we get just a times the same 
combination. This is easy to prove. Just as a “shorthand” notation, because we 
get tired of writing down all those letters in (25.1), we shall use the symbol L(x) 
instead. When we see this, it means the left-hand side of (25.1), with x substituted 
in. With this system of writing, L(x + y) would mean the following: 

U.x + y) = m d - X d + ^ + y m - * + ^ + wwo(x + y). (25.2) 

(We underline the L so as to remind ourselves that it is not an ordinary function.) 
We sometimes call this an operator notation , but it makes no difference what we 
call it, it is just “shorthand.” 

Our first statement was that 

L(x + y) = L(x) + L(y), (25.3) 

which of course follows from the fact that a(x + y) = ax + ay , d(x + y)/dt = 
dx/dt + dy/dt, etc. 

Our second statement was, for constant a , 

L(ax) = aL(x). (25.4) 

[Actually, (25.3) and (25.4) are very closely related, because if we put x + x into 
(25.3), this is the same as setting a = 2 in (25.4), and so on.] 

In more complicated problems, there may be more derivatives, and more 
terms in L; the question of interest is whether the two equations (25.3) and (25.4) 
are maintained or not. If they are, we call such a problem a linear problem. In 
this chapter we shall discuss some of the properties that exist because the system 
is linear, to appreciate the generality of some of the results that we have obtained 
in our special analysis of a special equation. 

Now let us study some of the properties of linear differential equations, 
having illustrated them already with the specific equation (25.1) that we have stud- 
ied so closely. The first property of interest is this: suppose that we have to solve 
the differential equation for a transient, the free oscillation with no driving force. 
That is, we want to solve 

L(x) = 0. (25.5) 

Suppose that, by some hook or crook, we have found a particular solution, which 
we shall call x\. That is, we have anxi for which L(^i) — 0. Now we notice that 
ax i is also a solution to the same equation; we can multiply this special solution 
by any constant whatever, and get a new solution. In other words, if we had a 
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motion of a certain “size,” then a motion twice as “big” is again a solution. 
Proof: L(ax{) = aL{x i) = a * 0 = 0. 

Next, suppose that, by hook or by crook, we have not only found one solution 
* 1 , but also another solution, x 2 . (Remember that when we substituted x = e 
for finding the transients, we found two values for a , that is, two solutions, xi 
and x 2 .) Now let us show that the combination (jti + x 2 ) is also a solution. 
In other words, if we put x = Xi + x 2 , x is again a solution of the equation. 
Why? Because, if L(x 0 = 0 and L(x 2 ) = 0, then L(xi + x 2 ) = L(x0 + 
M x 2 ) = 0 + 0 = 0. So if we have found a number of solutions for the motion 
of a linear system we can add them together. 

Combining these two ideas, we see, of course, that we can also add six of one 
and two of the other: if x x is a solution, so is ax\. Therefore any sum of these two 
solutions, such as (curx + f$x 2 ), is also a solution. If we happen to be able to 
find three solutions, then we find that any combination of the three solutions is 
again a solution, and so on. It turns out that the number of what we call inde- 
pendent solutions* that we have obtained for our oscillator problem is only two . 
The number of independent solutions that one finds in the general case depends 
upon what is called the number of degrees of freedom. We shall not discuss this in 
detail now, but if we have a second-order differential equation, there are only two 
independent solutions, and we have found both of them; so we have the most 
general solution. 

Now let us go on to another proposition, which applies to the situation in 
which the systeip is subjected to an outside force. Suppose the equation is 

L(x) = F(t ), (25.6) 

and suppose that we have found a special solution of it. Let us say that Joe’s 
solution is xj , and that L(xj) = F(t). Suppose we want to find yet another 
solution; suppose we add to Joe’s solution one of those that was a solution of the 
free equation (25.5), say x\. Then we see by (25.3) that 

L(xj + x,) = L(xj) + L( Xl ) = F(t) + 0 = F(t). (25.7) 

Therefore, to the “forced” solution we can add any “free” solution, and we still 
have a solutipn. The free solution is called a transient solution. 

When we have no force acting, and suddenly turn one on, we do not im- 
mediately get the steady solution that we solved for with the sine wave solution, 
but for a while there is a transient which sooner or later dies out, if we wait long 
enough. The “forced” solution does not die out, since it keeps on being driven by 
the force. Ultimately, for long periods of time, the solution is unique, but initially 
the motions are different for different circumstances, depending on how the system 
was started. 


25-2 Superposition of solutions 

Now we come to another interesting proposition. Suppose that we have a 
certain particular driving force F a (let us say an oscillatory one with a certain 
co = co a , but our conclusions will be true for any functional form of F a ) and we 
have solved for the forced motion (with or without the transients; it makes no 
difference). Now suppose some other force is acting, let us say and we solve 
the same problem, but for this different force. Then suppose someone comes along 
and says, “I have a new problem for you to solve; I have the force F a + F&.” Can 
we do it? Of course we can do it, because the solution is the sum of the two 
solutions x a and x b for the forces taken separately — a most remarkable circum- 
stance indeed. If we use (25.3), we see that 

L(x a + x b ) = L(x a ) + L(x b ) = F a (t) + F h (t). (25.8) 


* Solutions which cannot be expressed as linear combinations of each other are called 
independent. 
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This is an example of what is called the principle of superposition for linear 
systems, and it is very important. It means the following: if we have a complicated 
force which can be broken up in any convenient manner into a sum of separate 
pieces, each of which is in some way simple, in the sense that for each special piece 
into which we have divided the force we can solve the equation, then the answer is 
available for the whole force, because we may simply add the pieces of the solution 
back together, in the same manner as the total force is compounded out of pieces 
(Fig. 25-1). 

Let us give another example of the principle of superposition. In Chapter 12 
we said that it was one of the great facts of the laws of electricity that if we have a 
certain distribution of charges q a and calculate the electric field E a arising from these 
charges at a certain place P, and if, on the other hand, we have another set of charges 
q b and we calculate the field E b due to these at the corresponding place, then if both 
charge distributions are present at the same time, the field E at P is the sum of 
E a due to one set plus E & due to the other. In other words, if we know the field 
due to a certain charge, then the field due to many charges is merely the vector 
sum of the fields of these charges taken individually. This is exactly analogous to 
the above proposition that if we know the result of two given forces taken at one 
time, then if the forcfe is considered as a sum of them, the response is a sum of the 
corresponding individual responses. 

The reason why this is true in electricity is that the great laws of electricity, 
Maxwell’s equations, which determine the electric field, turn out to be differential 
equations which are linear , i.e., which have the property (25.3). What corresponds 
to the force is the charge generating the electric field, and the equation which deter- 
mines the electric field in terms of the charge is linear. 

As another interesting example of this proposition, let us ask how it is possible 
to “tune in” to a particular radio station at the same time as all the radio stations 
are broadcasting. The radio station transmits, fundamentally, an oscillating 
electric field of very high frequency which acts on our radio antenna. It is true 
that the amplitude of the oscillation of the field is changed, modulated, to carry 
the signal of the voice, but that is very slow, and we are not going to worry about 
it. When one hears “This station is broadcasting at a frequency of 780 kilocycles,” 
this indicates that 780,000 oscillations per Second is the frequency of the electric 
field of the station antenna, and this drives the electrons up and down at that 
frequency in our antenna. Now at the same time we may have another radio 
station in the same town radiating at a different frequency, say 550 kilocycles per 
second; then the electrons in our antenna are also being driven by that frequency. 
Now the question is, how is it that we can separate the signals coming into the one 
radio at 780 kilocycles from those coming in at 550 kilocycles? We certainly do 
not hear both stations at the same time. 

By the principle of superposition, the response of the electric circuit in the 
radio, the first part of which is a linear circuit, to the forces that are acting due to 
the electric field F a + /% is x a + x b . It therefore looks as though we will never disen- 
tangle them. In fact, the very proposition of superposition seems to insist that we 
cannot avoid having both of them in our system. But remember, for a resonant 
circuit, the response curve, the amount of x per unit F, as a function of the fre- 
quency, looks like Fig. 25-3. If it were a very high Q circuit, the response would 
show a very sharp maximum. Now suppose that the two stations are comparable 
in strength, that is, the two forces are of the same magnitude. The response that 
we get is the sum of x a and jt 6 . But, in Fig. 25-3, is tremendous, while x b is 
small. So, in spite of the fact that the two signals are equal in strength, when they 
go through the sharp resonant circuit of the radio tuned for co a , the frequency of 
the transmission of one station, then the response to this station is much greater 
than to the other. Therefore the complete response, with both signals acting, is 
almost all made up of co a , and we have selected the station we want. 

Now what about the tuning? How do we tune it? We change co 0 by changing 
the L or the C of the circuit, because the frequency of the circuit has to do with the 
combination of L and C. In particular, most radios are built so that one can 
change the capacitance. When we retune the radio, we can make a new setting of 



Fig. 25-1. An example of the princi- 
ple of superposition for linear systems. 
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Fig. 25-2. The principle of super- 
position in electrostatics. 



Fig. 25-3. A sharply tuned reso- 
nance curve. 
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Fig. 25-4. A complicated force may 
be treated as a succession of sharp 
impulses. 


the dial, so that the natural frequency of the circuit is shifted, say, to u> c . In those 
circumstances we hear neither one station nor the other; we get silence, provided 
there is no other station at frequency <o c . If we keep on changing the capacitance 
until the resonance curve is at o>*>, then of course we hear the other station. That 
is how radio tuning works; it is again the principle of superposition, combined with 
a resonant response.* 

To conclude this discussion, let us describe qualitatively what happens if we 
proceed further in analyzing a linear problem with a given force, when the force is 
quite complicated. Out of the many possible procedures, there are two especially 
useful general ways that we can solve the problem. One is this: suppose that we 
can solve it for special known forces, such as sine waves of different frequencies. 
We know it is child’s play to solve it for sine waves. So we have the so-called 
“child’s play” cases. Now the question is whether our very complicated force 
can be represented as the sum of two or more “child’s play” forces. In Fig. 25-1 
we already had a fairly complicated curve, and of course we can make it more 
complicated still if we add in more sine waves. So it is certainly possible to obtain 
very complicated curves. And, in fact, the reverse is also true: practically every 
curve can be obtained by adding together infinite numbers of sine waves of different 
wavelengths (or frequencies) for each one of which we know the answer. We just 
have to know how much of each sine wave to put in to make the given F, and then 
our answer, x, is the corresponding sum of the F sine waves, each multiplied by its 
effective ratio of x to F. This method of solution is called the method of Fourier 
transforms or Fourier analysis . We are not going to actually carry out such an 
analysis just now; we only wish to describe the idea involved. 

Another way in which our complicated problem can be solved is the following 
very interesting one. Suppose that, by some tremendous mental effort, it were 
possible to solve our problem for a special force, namely an impulse . The force is 
quickly turned on and then off; it is all over. Actually we need only solve for an 
impulse of some unit strength, any other strength can be gotten by multiplication 
by an appropriate factor. We know that the response x for an impulse is a damped 
oscillation. Now what can we say about some other force, for instance a force 
like that of Fig. 25-4? 

Such a force can be likened to a succession of blows with a hammer. First 
there is no force, and all of a sudden there is a steady force — impulse, impulse, 
impulse, impulse, . . . and then it stops. In other words, we imagine the continuous 
force to be a series of impulses, very close together. Now, we know the result for 
an impulse, so the result for a whole series of impulses will be a whole series of 
damped oscillations: it will be the curve for the first impulse, and then (slightly 
later) we add to that the curve for the second impulse, and the curve for the third 
impulse, and so on. Thus we can represent, mathematically, the complete solution 
for arbitrary functions if we know the answer for an impulse. We get the answer 
for any other force simply by integrating. This method is called the Green's 
function method. A Green’s function is a response to an impulse, and the method 
of analyzing any force by putting together the response of impulses is called the 
Green’s function method. 

The physical principles involved in both of these schemes are so simple, involv- 
ing just the linear equation, that they can be readily understood, but the mathe- 
matical problems that are involved, the complicated integrations and so on, are a 
little too advanced for us to attack right now. You will most likely return to this 
some day when you have had more practice in mathematics. But the idea is very 
simple indeed. 

Finally, we make some remarks on why linear systems are so important. The 
answer is simple: because we can solve them! So most of the time we solve linear 


* In modern superheterodyne receivers the actual operation is more complex. The 
amplifiers are all tuned to a fixed frequency (called IF frequency) and an oscillator of 
variable tunable frequency is combined with the input signal in a nonlinear circuit to 
produce a new frequency (the difference of signal and oscillator frequency) equal to the 
IF frequency, which is then amplified. This will be discussed in Chapter 50. 
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problems. Second (and most important), it turns out that the fundamental laws of 
physics are often linear. The Maxwell equations for the laws of electricity are 
linear, for example. The great laws of quantum mechanics turn out, so far as we 
know, to be linear equations. That is why we spend so much time on linear equa- 
tions: because if we understand linear equations, we are ready, in principle, to 
understand a lot of things. 

We mention another situation where linear equations are found. When 
displacements are small, many functions can be approximated linearly. For example, 
if we have a simple pendulum, the correct equation for its motion is 

d 2 S/dt 2 = ~(g/L) sin 6. (25.9) 

This equation can be solved by elliptic functions, but the easiest way to solve it is 
numerically, as was shown in Chapter 9 on Newton’s Laws of Motion. A non- 
linear equation cannot be solved, ordinarily, any other way but numerically. Now 
for small 6 , sin 6 is practically equal to 6 , and we have a linear equation. It turns out 
that there are many circumstances where small effects are linear: for the example 
here the swing of a pendulum through small arcs. As another example, if we pull 
a little bit on a spring, the force is proportional to the extension. If we pull hard, 
we break the spring, and the force is a completely different function of the distance! 
Linear equations are important. In fact they are so important that perhaps fifty 
percent of the time we are solving linear equations in physics and in engineering. 

25-3 Oscillations in linear systems 

Let us now review the things we have been talking about in the past few 
chapters. It is very easy for the physics of oscillators to become obscured by the 
mathematics. The physics is actually very simple, and if we may forget the mathe- 
matics for a moment we shall see that we can understand almost everything that 
happens in an oscillating system. First, if we have only the spring and the weight, 
it is easy to understand why the system oscillates — it is a consequence of inertia. 
We pull the mass down and the force pulls it back up; as it passes zero, which is 
the place it likes to be, it cannot just suddenly stop ; because of its momentum it 
keeps on going and swings to the other side, and back and forth. So, if there were 
no friction, we would surely expect an oscillatory motion, and indeed we get one. 
But if there is even a little bit of friction, then on the return cycle, the swing will 
not be quite as high as it was the first time. 

Now what happens, cycle by cycle? That depends on the kind and amount 
of friction. Suppose that we could concoct a kind of friction force that always 
remains in the same proportion to the other forces, of inertia and in the spring, as 
the amplitude of oscillation varies. In other words, for smaller oscillations the 
friction should be weaker than for big oscillations. Ordinary friction does not have 
this property, so a special kind of friction must be carefully invented for the very 
purpose of creating a friction that is directly proportional to the velocity — so that 
for big oscillations it is stronger and for small oscillations it is weaker. If we happen 
to have that kind of friction, then at the end of each successive cycle the system is 
in the same condition as it was at the start, except a little bit smaller. All the forces 
are smaller in the same proportion: the spring force is reduced, the inertial effects 
are lower because the accelerations are now weaker, and the friction is less too, 
by our careful design. When we actually have that kind of friction, we find that 
each oscillation is exactly the same as the first one, except reduced in amplitude. 
If the first cycle dropped the amplitude, say, to 90 percent of what it was at the 
start, the next will drop it to 90 percent of 90 percent, and so on: the sizes of the 
oscillations are reduced by the same fraction of themselves in every cycle . An 
exponential function is a curve which does just that. It changes by the same factor 
in each equal interval of time. That is to say, if the amplitude of one cycle, relative 
to the preceding one, is called a, then the amplitude of the next is a 2 , and of the 
next, a 3 . So the amplitude is some constant raised to a power equal to the number 
of cycles traversed : 

A = A 0 a n . (25.10) 
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Fig. 25-5. Resonance curves 
various amounts of friction present. 


with 


But of course n ~ t, so it is perfectly clear that the general solution will be some 
kind of an oscillation, sine or cosine <o/, times an amplitude which goes as b l more 
or less. But b can be written as e ~ c , if b is positive and less than 1. So this is why 
the solution looks like e~ ct cos c ot. It is very simple. 

What happens if the friction is not so artificial; for example, ordinary rubbing 
on a table, so that the friction force is a certain constant amount, and is independent 
of the size of the oscillation that reverses its direction each half-cycle? Then the 
equation is no longer linear, it becomes hard to solve, and must be solved by the 
numerical method given in Chapter 2, or by considering each half-cycle separately. 
The numerical method is the most powerful method of all, and can solve any 
equation. It is only when we have a simple problem that we can use mathematical 
analysis. 

Mathematical analysis is not the grand thing it is said to be; it solves only the 
simplest possible equations. As soon as the equations get a little more complicated, 
just a shade — they cannot be solved analytically. But the numerical method, which 
was advertised at the beginning of the course, can take care of any equation of 
physical interest. 

Next, what about the resonance curve? Why is there a resonance? First, 
imagine for a moment that there is no friction, and we have something which could 
oscillate by itself. If we tapped the pendulum just right each time it went by, of 
course we could make it go like mad. But if we close our eyes and do not watch 
it, and tap at arbitrary equal intervals, what is going to happen? Sometimes we 
will find ourselves tapping when it is going the wrong way. When we happen to 
have the timing just right, of course, each tap is given at just the right time, and 
so it goes higher and higher and higher. So without friction we get a curve which 
looks like the solid curve in Fig. 25-5 for different frequencies. Qualitatively, we 
understand the resonance curve; in order to get the exact shape of the curve it is 
probably just as well to do the mathematics. The curve goes toward infinity as 
oo — > co 0j where co 0 is the natural frequency of the oscillator. 

Now suppose there is a little bit of friction; then when the displacement of 
the oscillator is small, the friction does not affect it much; the resonance curve is 
the same, except when we are near resonance. Instead of becoming infinite near 
resonance, the curve is only going to get so high that the work done by our tapping 
each time is enough to compensate for the loss of energy by friction during the 
cycle. So the top of the curve is rounded off — it does not go to infinity. If there is 
more friction, the top of the curve is rounded off still more. Now someone might 
say, “I thought the widths of the curves depended on the friction.” That is because 
the curve is usually plotted so that the top of the curve is called one unit. However, 
the mathematical expression is even simpler to understand if we just plot all the 
curves on the same scale; then all that happens is that the friction cuts down the 
top! If there is less friction, we can go farther up into that little pinnacle before 
the friction cuts it off, so it looks relatively narrow. That is, the higher the peak 
of the curve, the narrower the width at half the maximum height. 

Finally, we take the case where there is an enormous amount of friction. It 
turns out that if there is too much friction, the system does not oscillate at all. 
The energy in the spring is barely able to move it against the frictional force, and 
so it slowly oozes down to the equilibrium point. 


25-4 Analogs in physics 

The next aspect of this review is to note that masses and springs are not the 
only linear systems; there are others. In particular, there are electrical systems 
called linear circuits, in which we find a complete analog to mechanical systems. 
We did not learn exactly why each of the objects in an electrical circuit works in 
the way it does — that is not to be understood at the present moment; we may assert 
it as an experimentally verifiable fact that they behave as stated. 

For example, let us take the simplest possible circumstance. We have a piece 
of wire, which is just a resistance, and we have applied to it a difference in potential, 
V. Now the V means this : if we carry a charge q through the wire from one terminal 
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to another terminal, the work done is qV. The higher the voltage difference, the 
more work was done when the charge, as we say, “falls” from the high potential 
end of the terminal to the low potential end. So charges release energy in going 
from one end to the other. Now the charges do not simply fly from one end straight 
to the other end ; the atoms in the wire offer some resistance to the current, and this 
resistance obeys the following law for almost all ordinary substances: if there is a 
current /, that is, so and so many charges per second tumbling down, the number 
per second that comes tumbling through the wire is proportional to how hard we 
push them— in other words, proportional to how much voltage there is: 

V = IR = R(dq/dt). (25.11) 

The coefficient R is called the resistance , and the equation is called Ohm’s Law. 
The unit of resistance is the ohm; it is equal to one volt per ampere. In mechanical 
situations, to get such a frictional force in proportion to the velocity is difficult; in 
an electrical system it is very easy, and this law is extremely accurate for most 
metals. 

We are often interested in how much work is done per second, the power loss, 
or the energy liberated by the charges as they tumble down the wire. When we 
carry a charge q through a voltage V, the work is qV 9 so the work done per second 
would be V(dq/dt\ which is the same as VI, or also IR I= I 2 R. This is called 
the heating loss— this is how much heat is generated in the resistance per second, 
by the conservation of energy. It is this heat that makes an ordinary incandescent 
light bulb work. 

Of course, there are other interesting properties of mechanical systems, such 
as the mass (inertia), and it turns out that there is an electrical analog to inertia 
also. It is possible to make something called an inductor , having a property called 
inductance , such that a current, once started through the inductance, does not 
want to stop. It requires a voltage in order to change the current! If the current is 
constant, there is no voltage across an inductance, dc circuits do not know any- 
thing about inductance; it is only when we change the current that the effects of 
inductance show up. The equation is 

V = L(dl/dt) = L(d 2 q/dt 2 ), (25.12) 

and the unit of inductance, called the henry , is such that one volt applied to an 
inductance of one henry produces a change of one ampere per second in the current. 
Equation (25.12) is the analog of Newton’s law for electricity, if we wish: V corre- 
sponds to F, L corresponds to m, and I corresponds to velocity! All of the con- 
sequent equations for the two kinds of systems will have the same derivations 
because, in all the equations, we can change any letter to its corresponding analog 
letter and we get the same equation; everything we deduce will have a correspond- 
ence in the two systems. 

Now what electrical thing corresponds to the mechanical spring, in which there 
was a force proportional to the stretch? If we start with F = kx and replace 
F — » V and x — > q, we get V = aq. It turns out that there is such a thing, in fact 
it is the only one of the three circuit elements we can really understand, because 
we did study a pair of parallel plates, and we found that if there were a charge of 
certain equal, opposite amounts on each plate, the electric field between them would 
be proportional to the size of the charge. So the work done in moving a unit charge 
across the gap from one plate to the other is precisely proportional to the charge. 
This work is the definition of the voltage difference, and it is the line integral of the 
electric field from one plate to another. It turns out, for historical reasons, that 
the constant of proportionality is not called C, but 1 /C. It could have been called 
C, but it was not. So we have 

V = q/C. (25.13) 

The unit of capacitance, C, is the farad ; a charge of one coulomb on each plate of 
a one-farad capacitor yields a voltage difference of one volt. 
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There are our analogies, and the equation corresponding to the oscillating 
circuit becomes the following, by direct substitution of L for m, q for x , etc: 

m(d 2 x/dt 2 ) + lm(dx/dt) + kx = F, (25.14) 

L(d 2 q/dt 2 ) + R(dq/dt) + q/C = V. (25.15) 

Now everything we learned about (25.14) can be transformed to apply to (25.15). 
Every consequence is the same; so much the same that there is a brilliant thing we 
can do. 

Suppose we have a mechanical system which is quite complicated, not just 
one mass on a spring, but several masses on several springs, all hooked together. 
What do we do? Solve it? Perhaps; but look, we can make an electrical circuit 
which will have the same equations as the thing we are trying to analyze! For 
instance, if we wanted to analyze a mass on a spring, why can we not build an 
electrical circuit in which we use an inductance proportional to the mass, a re- 
sistance proportional to the corresponding ml, 1/C proportional to k, all in the 
same ratio? Then, of course, this electrical circuit will be the exact analog of our 
mechanical one, in the sense that whatever q does, in response to V (V also is made 
to correspond to the forces that are acting), so the x would do in response to the 
force! So if we have a complicated thing with a whole lot of interconnecting ele- 
ments, we can interconnect a whole lot of resistances, inductances, and capacitances, 
to imitate the mechanically complicated system. What is the advantage to that? 
One problem is just as hard (or as easy) as the other, because they are exactly 
equivalent. The advantage is not that it is any easier to solve the mathematical 
equations after we discover that we have an electrical circuit (although that is the 
method used by electrical engineers!), but instead, the real reason for looking at 
the analog is that it is easier to make the electrical circuit, and to change something 
in the system. 

Suppose we have designed an automobile, and want to know how much it is 
going to shake when it goes over a certain kind of bumpy road. We build an 
electrical circuit with inductances to represent the inertia of the wheels, spring 
constants as capacitances to represent the springs of the wheels, and resistors to 
represent the shock absorbers, and so n for the other parts of the automobile. 
Then we need a bumpy road. All right, we apply a voltage from a generator, which 
represents such and such a kind of bump, and then look at how the left wheel 
jiggles by measuring the charge on some capacitor. Having measured it (it is easy 
to do), we find that it is bumping too much. Do we need more shock absorber, 
or less shock absorber? With a complicated thing like an automobile, do we 
actually change the shock absorber, and solve it all over again? No!, we simply 
turn a dial; dial number ten is shock absorber number three, so we put in more 
shock absorber. The bumps are worse — all right, we try less. The bumps are still 
worse; we change the stiffness of the spring (dial 17), and we adjust all these things 
electrically , with merely the turn of a knob. 

This is called an analog computer . It is a device which imitates the problem 
that we want to solve by making another problem, which has the same equation, 
but in another circumstance of nature, and which is easier to build, to measure, 
to adjust, and to destroy! 

25-5 Series and parallel impedances 

Finally, there is an important item which is not quite in the nature of review. 
This has to do with an electrical circuit in which there is more than one circuit 
element. For example, when we have an inductor, a resistor, and a capacitor 
connected as in Fig. 24-2, we note that all the charge went through every one of 
the three, so that the current in such a singly connected thing is the same at all 
points along the wire. Since the current is the same in each one, the voltage across 
R is IR, the voltage across L is L{dl/dt ), and so on. So, the total voltage drop is the 
sum of these, and this leads to Eq. (25.15). Using complex numbers, we found 
that we could solve the equation for the steady-state motion in response to a 
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sinusoidal force. We thus found that V = ZI. Now Z is called the impedance of 
this particular circuit. It tells us that if we apply a sinusoidal voltage, P, we get a 
current /. 

Now suppose we have a more complicated circuit which has two pieces, which 
by themselves have certain impedances, Zi and 2 2 , and we put them in series 
(Fig. 25-6a) and apply a voltage. What happens? It is now a little more compli- 
cated, but if /is the current through Z u the voltage difference across Z x is P' 1 = 
IZu similarly, the voltage across Z 2 is V 2 = /Z 2 . The same current goes through 
both. Therefore the total voltage is the sum of the voltages across the two sections 
and is equal to V — V\ + V 2 = (Zj -f~ Z 2 )/. This means that the voltage on 
the complete circuit can be written V = IZ, where the Z, of the combined system 
in series is the sum of the two 2’s of the separate pieces: 

2. = + ^2- (25.16) 


This is not the only way things may be connected. We may also connect them 
in another way, called a parallel connection (Fig. 25-6b). Now we see that a given 
voltage across the terminals, if the connecting wires are perfect conductors, is 
effectively applied to both of the impedances, and will cause currents in each 
independently. Therefore the current through 2 X is equal to 1 1 = V/2i. The 
current in 2 2 is / 2 = V /Z 2 . It is the same voltage. Now the total current which 
is supplied to the terminals is the sum of the currents in the two sections: / = 
V/Z j + PyZ 2 . This can be written as 


Thus 


" (i/zo + QtU 

l/Z p = l/Zi + 1/Z 2 . (25.17) 



(a) Series 



(b) feral lei 


Fig. 25-6. Two impedances, con- 
nected in series and in parallel. 


More complicated circuits can sometimes be simplified by taking pieces of 
them, working out the succession of impedances of the pieces, and combining the 
circuit together step by step, using the above rules. If we have any kind of circuit 
with many impedances connected in all kinds of ways, and if we include the volt- 
ages in the form of little generators having no impedance (when we pass charge 
through it, the generator adds a voltage V), then the following principles apply: 
(1) At any junction, the sum of the currents into a junction is zero. That is, all 
the current which comes in must come back out. (2) If we carry a charge around 
any loop, and back to where it started, the net work done is zero. These rules are 
called Kirchhofps laws for electrical circuits. Their systematic application to 
complicated circuits often simplifies the analysis of such circuits. We mention 
them here in conjunction with Eqs. (25.16) and (25.17), in case you have already 
come across such circuits that you need to analyze in laboratory work. They will 
be discussed again in more detail next year. 
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26 


Optics: The Principle of Least Time 


26-1 Light 

This is the first of a number of chapters on the subject of electromagnetic 
radiation . Light, with which we see, is only one small part of a vast spectrum of 
the same kind of thing, the various parts of this spectrum being distinguished by 
different \alues of a certain quantity which varies. This variable quantity could 
be called the “wavelength.” As it varies in the visible spectrum, the light apparently 
changes color from red to violet. If we explore the spectrum systematically, from 
long wavelengths toward shorter ones, we would begin with what are usually called 
radiowaves. Radiowaves are technically available in a wide range of wavelengths, 
some even longer than those used in regular broadcasts; regular broadcasts have 
wavelengths corresponding to about 500 meters. Then there are the so-called 
“short waves,” i.e., radar waves, millimeter waves, and so on. There are no actual 
boundaries between one range of wavelengths and another, because nature did 
not present us with sharp edges. The number associated with a given name for 
the waves are only approximate and, of course, so are the names we give to the 
different ranges. 

Then, a long way down through the millimeter waves, we come to what we 
call the infrared , and thence to the visible spectrum. Then going in the other 
direction, we get into a region which is called the ultraviolet. Where the ultraviolet 
stops, the x-rays begin, but we cannot define precisely where this is; it is roughly at 
10“ 8 m, or 10 -2 p. These are “soft” x-rays ; then there are ordinary x-rays and very 
hard x-rays; then 7-rays, and so on, for smaller and smaller values of this dimension 
called the wavelength. 

Within this vast range of wavelengths, there are three or more regions of 
approximation which are especially interesting. In one of these, a condition exists 
in which the wavelengths involved are very small compared with the dimensions 
of the equipment available for their study; furthermore, the photon energies, using 
the quantum theory, are small compared with the energy sensitivity of the equip- 
ment. Under these conditions we can make a rough first approximation by a 
method called geometrical optics. If, on the other hand, the wavelengths are com- 
parable to the dimensions of the equipment, which is difficult to arrange with 
visible light but easier with radio waves, and if the photon energies are still negligi- 
bly small, then a very useful approximation can be made by studying the behavior 
of the waves, still disregarding the quantum mechanics. This method is based on 
the classical theory of electromagnetic radiation , which will be discussed in a later 
chapter. Next, if we go to very short wavelengths, where we can disregard the 
wave character but the photons have a very large energy compared with the 
sensitivity of our equipment, things get simple again. This is the simple photon 
picture, which we will describe only very roughly. The complete picture, which 
unifies the whole thing into one model, will not be available to us for a long time. 

In this chapter our discussion is limited to the geometrical optics region, in 
which we forget about the wavelength and the photon character of the light, which 
will all be explained in due time. We do not even bother to say what the light is, 
but just find out how it behaves on a large scale compared with the dimensions of 
interest. All this must be said in order to emphasize the fact that what we are going 
to talk about is only a very crude approximation; this is one of the chapters that 
we shall have to “unlearn” again. But we shall very quickly unlearn it, because 
we shall almost immediately go on to a more accurate method. 


26-1 Light 

26-2 Reflection and refraction 

26-3 Fermat’s principle of least time 

26-4 Applications of Fermat’s 
principle 

26-5 A more precise statement of 
Fermat’s principle 

26-6 How it works 
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Although geometrical optics is just an approximation, it is of very great 
importance technically and of great interest historically. We shall present this 
subject more historically than some of the others in order to give some idea of the 
development of a physical theory or physical idea. 

First, light is, of course, familiar to everybody, and has been familiar since 
time immemorial. Now one problem is, by what process do we see light? There 
have been many theories, but it finally settled down to one, which is that there is 
something which enters the eye — which bounces off objects into the eye. We have 
heard that idea so long that we accept it, and it is almost impossible for us to 
realize that very intelligent men have proposed contrary theories — that something 
comes out of the eye and feels for the object, for example. Some other important 
observations are that, as light goes from one place to another, it goes in straight 
lines , if there is nothing in the way, and that the rays do not seem to interfere with 
one another. That is, light is crisscrossing in all directions in the room, but the 
light that is passing across our line of vision does not affect the light that comes 
to us from some object. This was once a most powerful argument against the 
corpuscular theory; it was used by Huygens. If light were like a lot of arrows 
shooting along, how could other arrows go through them so easily? Such philo- 
sophical arguments are not of much weight. One could always say that light is 
made up of arrows which go through each other! 



Fig. 26—1. The angle of incidence is 
equal to the angle of reflection. 



Fig. 26-2. A light ray is refracted 
when it passes from one medium into 
another. 


Table 26-1 


Angle in air 

10 ° 

20 ° 

30° 

40° 

50° 

60° 

70° 

80° 


Angle in water 
8 ° 

15-1/2° 

22 - 1 / 2 ° 

29° 

35° 

40-1/2° 

45-1/2° 

50° 


26-2 Reflection and refraction 

The discussion above gives enough of the basic idea of geometrical optics — 
now we have to go a little further into the quantitative features. Thus far we have 
light going only in straight lines between two points; now let us study the behavior 
of light when it hits various materials. The simplest object is a mirror, and the 
law for a mirror is that when the light hits the mirror, it does not continue in a 
straight line, but bounces off the mirror into a new straight line, which changes 
when we change the inclination of the mirror. The question for the ancients was, 
what is the relation between the two angles involved? This is a very simple relation, 
discovered long ago. The light striking a mirror travels in such a way that the two 
angles, between each beam and the mirror, are equal. For some reason it is 
customary to measure the angles from the normal to the mirror surface. Thus the 
so-called law of reflection is 

Oi = 8 r . (26.1) 

That is a simple enough proposition, but a more difficult problem is encoun- 
tered when light goes from one medium into another, for example from air into 
water; here also, we see that it does not go in a straight line. In the water the ray 
is at an inclination to its path in the air; if we change the angle 0; so that it comes 
down more nearly vertically, then the angle of “breakage” is not as great. But 
if we tilt the beam of light at quite an angle, then the deviation angle is very large. 
The question is, what is the relation of one angle to the other? This also puzzled 
the ancients for a long time, and here they never found the answer! It is, however, 
one of the few places in all of Greek physics that one may find any experimental 
results listed. Claudius Ptolemy made a list of the angle in water for each of a 
number of different angles in air. Table 26-1 shows the angles in the air, in degrees, 
and the corresponding angle as measured in the water. (Ordinarily it is said that 
Greek scientists never did any experiments. But it would be impossible to obtain 
this table of values without knowing the right law, except by experiment. It 
should be noted, however, that these do not represent independent careful measure- 
ments for each angle but only some numbers interpolated from a few measure- 
ments, for they all fit perfectly on a parabola.) 

This, then, is one of the important steps in the development of physical law: 
first we observe an effect, then we measure it and list it in a table; then we try to 
find the rule by which one thing can be connected with another. The above 
numerical table was made in 140 a.d., but it was not until 1621 that someone 
finally found the rule connecting the two angles! The rule, found by Willebrord 
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Snell, a Dutch mathematician, is as follows: if is the angle in air and B r is the 
angle in the water, then it turns out that the sine of 0* is equal to some constant 
multiple of the sine of 0 r : 

sin Oi = n sin 0 r . (26.2) 

For water the number n is approximately 1.33. Equation (26.2) is called Snell's 
law; it permits us to predict how the light is going to bend when it goes from air 
into water. Table 26-2 shows the angles in air and in water according to Snell’s 
law. Note the remarkable agreement with Ptolemy’s list. 


26-3 Fermat’s principle of least time 

Now in the further development of science, we want more than just a formula. 
First we have an observation, then we have numbers that we measure, then we 
have a law which summarizes all the numbers. But the real glory of science is that 
we can find a way of thinking such that the law is evident. 

The first way of thinking that made the law about the behavior of light evident 
was discovered by Fermat in about 1650, and it is called the principle of least time , 
or Fermat's principle. His idea is this: that out of all possible paths that it might 
take to get from one point to another, light takes the path which requires the 
shortest time. 

Let us first show that this is true for the case of the mirror, that this simple 
principle contains both the law of straight-line propagation and the law for the 
mirror. So, we are growing in our understanding! Let us try to find the solution 
to the following problem. In Fig. 26-3 are shown two points, A and 5, and a 
plane mirror, MM f . What is the way to get from A to B in the shortest time? 
The answer is to go straight from A to B! But if we add the extra rule that the light 
has to strike the mirror and come back in the shortest time, the answer is not so 
easy. One way would be to go as quickly as possible to the mirror and then go to 
5, on the path ADB. Of course, we then have a long path DB . If we move over a 
little to the right, to £, we slightly increase the first distance, but we greatly decrease 
the second one, and so the total path length, and therefore the travel time, is less. 
How can we find the point C for which the time is the shortest? We can find it 
very nicely by a geometrical trick. 

We construct on the other side of MM f an artificial point B f y which is the 
same distance below the plane MM f as the point B is above the plane. Then we 
draw the line EB f . Now because BFM is a right angle and BF = FB\ EB is 
equal to EB r . Therefore the sum of the two distances, AE + EB , which is propor- 
tional to the time it will take if the light travels with constant velocity, is also the 
sum of the two lengths AE + EB f . Therefore the problem becomes, when is the 
sum of these two lengths the least? The answer is easy: when the line goes through 
point C as a straight line from A to B'l In other words, we have to find the point 
where we go toward the artificial point, and that will be the correct one. Now if 
ACB f is a straight line, then angle BCF is equal to angle B'CF and thence to angle 
ACM. Thus the statement that the angle of incidence equals the angle of reflection 
is equivalent to the statement that the light goes to the mirror in such a way that 
it comes back to the point B f in the least possible time . Originally, the statement 
was made by Hero of Alexandria that the light travels in such a way that it goes 
to the mirror and to the other point in the shortest possible distance , so it is not a 
modern theory. It was this that inspired Fermat to suggest to himself that perhaps 
refraction operated on a similar basis. But for refraction, light obviously does not 
use the path of shortest distance , so Fermat tried the idea that it takes the shortest 
time. 

Before we go on to analyze refraction, we should make one more remark 
about the mirror. If we have a source of light at the point B and it sends light to- 
ward the mirror, then we see that the light which goes to A from the point B comes 
to A in exactly the same manner as it would have come to A if there were an object 
at B\ and no mirror. Now of course the eye detects only the light which enters it 
physically, so if we have an object at B and a mirror which makes the light come 
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Table 26-2 


Angle in air 

Angle in water 

10° 

7-1/2° 

20° 

15° 

o 

o 

22° 

40° 

29° 

50° 

35° 

60° 

40-1/2° 

70° 

45° 

80° 

48° 



Fig. 26-3. Illustration of the principle 
of least time. 




Fig. 26-4. Illustration of Fermat’s 
principle for refraction. 



x c x 


Fig. 26-5. The minimum time corre- 
sponds to point C, but nearby points 
correspond to nearly the same time. 


into the eye in exactly the same manner as it would have come into the eye if the 
object were at B\ then the eye-brain system interprets that, assuming it does not 
know too much, as being an object at B '. So the illusion that there is an object 
behind the mirror is merely due to the fact that the light which is entering the 
eye is entering in exactly the same manner, physically, as it would have entered 
had there been an object back there (except for the dirt on the mirror, and our 
knowledge of the existence of the mirror, and so on, which is corrected in the brain). 

Now let us demonstrate that the principle of least time will give Snell’s law 
of refraction. We must, however, make an assumption about the speed of light 
in water. We shall assume that the speed of light in water is lower than the speed 
of light in air by a certain factor, n. 

In Fig. 26-4, our problem is again to go from A to B in the shortest time . 
To illustrate that the best thing to do is not just to go in a straight line, let us 
imagine that a beautiful girl has fallen out of a boat, and she is screaming for help 
in the water at point B . The line marked X is the shoreline. We are at point A on land, 
and we see the accident, and we can run and can also swim. But we can run faster 
than we can swim. What do we do? Do we go in a straight line? (Yes, no doubt!) 
However, by using a little more intelligence we would realize that it would be advan- 
tageous to travel a little greater distance on land in order to decrease the distance 
in the water, because we go so much slower in the water. (Following this line of 
reasoning out, we would say the right thing to do is to compute very carefully 
what should be done!) At any rate, let us try to show that the final solution to the 
problem is the path ACB , and that this path takes the shortest time of all possible 
ones. If it is the shortest path, that means that if we take any other, it will be longer. 
So, if we were to plot the time it takes against the position of point X, we would get 
a curve something like that shown in Fig. 26-5, where point C corresponds to the 
shortest of all possible times. This means that if we move the point X to points 
near C, in the first approximation there is essentially no change in time because the 
slope is zero at the bottom of the curve. So our way of finding the law will be to 
consider that we move the place by a very small amount, and to demand that 
there be essentially no change in time. (Of course there is an infinitesimal change 
of a second order ; we ought to have a positive increase for displacements in either 
direction from C.) So we consider a nearby point X and we calculate how long it 
would take to go from A to B by the two paths, and compare the new path with 
the old path. It is very easy to do. We want the difference, of course, to be nearly 
zero if the distance XC is short. First, look at the path on land. If we draw a 
perpendicular XE , we see that this path is shortened by the amount EC. Let us 
say we gain by not having to go that extra distance. On the other hand, in the water, 
by drawing a corresponding perpendicular, CF, we find that we have to go the 
extra distance XF , and that is what we lose. Or, in time , we gain the time it would 
have taken to go the distance FC, but we lose the time it would have taken to go 
the distance XF. Those times must be equal since, in the first approximation, there 
is to be no change in time. But supposing that in the water the speed is l/n times 
as fast as in air, then we must have 

EC = n * XF. (26.3) 

Therefore we see that when we have the right point, XC sin EXC = n • XC sin XCF 
or, cancelling the common hypotenuse length XC and noting that 

EXC = ECN = 6i and XCF = BCN' = 0 r , 


we have 


sin Oi = n sin 0 r . 


(26.4) 


So we see that to get from one point to another in the least time when the ratio 
of speeds is n , the light should enter at such an angle that the ratio of the sines of 
the angles 0 t - and 0 r is the ratio of the speeds in the two media. 
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26-4 Applications of Fermat’s principle 

Now let us consider some of the interesting consequences of the principle of 
least time. First is the principle of reciprocity. If to go from A to B we have found 
the path of the least time, then to go in the opposite direction (assuming that light 
goes at the same speed in any direction), the shortest time will be the same path, 
and therefore, if light can be sent one way, it can be sent the other way. 

An example of interest is a glass block with plane parallel faces, set at an angle 
to a light beam. Light, in going through the block from a point A to a point B 
(Fig. 26-6) does not go through in a straight line, but instead it decreases the time 
in the block by making the angle in the block less inclined, although it loses a little 
bit in the air. The beam is simply displaced parallel to itself because the angles 
in and out are the same. 

A third interesting phenomenon is the fact that when we see the sun setting, 
it is already below the horizon! It does not look as though it is below the horizon, 
but it is (Fig. 26-7). The earth’s atmosphere is thin at the top and dense at the 
bottom. Light travels more slowly in air than it does in a vacuum, and so the light 
of the sun can get to point S beyond the horizon more quickly if, instead of just 
going in a straight line, it avoids the dense regions where it goes slowly by getting 
through them at a steeper tilt. When it appears to go below the horizon, it is 
actually already well below the horizon. Another example of this phenomenon is 
the mirage that one often sees while driving on hot roads. One sees “water” on the 
road, but when he gets there, it is as dry as the desert! The phenomenon is the 
following. What we are really seeing is the sky light “reflected” on the road: 
light from the sky, heading for the road, can end up in the eye, as shown in Fig. 
26-8. Why? The air is very hot just above the road but it is cooler up higher. 
Hotter air is more expanded than cooler air and is thinner, and this decreases the 
speed of light less. That is to say, light goes faster in the hot region than in the 
cool region. Therefore, instead of the light deciding to come in the straightforward 
way, it also has a least-time path by which it goes into the region where it goes 
faster for awhile, in order to save time. So, it can go in a curve. 

As another important example of the principle of least time, suppose that we 
would like to arrange a situation where we have all the light that comes out of one 
point, F, collected back together at another point, P f (Fig. 26-9). That means, 
of course, that the light can go in a straight line from P to P\ That is all right. 
But how can we arrange that not only does it go straight, but also so that the light 
starting out from P toward Q also ends up at P'l We want to bring all the light 
back to what we call a focus. How? If the light always takes the path of least time, 
then certainly it should not want to go over all these other paths. The only way 
that the light can be perfectly satisfied to take several adjacent paths is to make 
those times exactly equal! Otherwise, it would select the one of least time. There- 
fore the problem of making a focusing system is merely to arrange a device so that 
it takes the same time for the light to go on all the different paths! 

This is easy to do. Suppose that we had a piece of glass in which light goes 
slower than it does in the air (Fig. 26-10). Now consider a ray which goes in air 
in the path PQP'. That is a longer path than from P directly to P f and no doubt 
takes a longer time. But if we were to insert a piece of glass of just the right thick- 
ness (we shall later figure out how thick) it might exactly compensate the excess 
time that it would take the light to go at an angle! In those circumstances we can 
arrange that the time the light takes to go straight through is the same as the time 
it takes to go in the path PQP r . Likewise, if we take a ray PRR'P' which is partly 
inclined, it is not quite as long as PQP f , and we do not have to compensate as 
much as for the straight one, but we do have to compensate somewhat. We end 
up with a piece of glass that looks like Fig. 26-10. With this shape, all the light 
which comes from P will go to P\ This, of course, is well known to us, and we call 
such a device a converging lens. In the next chapter we shall actually calculate 
what shape the lens has to have to make a perfect focus. 

Take another example: suppose we wish to arrange some mirrors so that the 
light from P always goes to P f (Fig. 26-1 1). On any path, it goes to some mirror 
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Fig. 26-6. A beam of light is offset as 
it passes through a transparent block. 


TO APPARENT SUN 

X 



Fig. 26-7. Near the horizon, the ap- 
parent sun is higher than the true sun by 
about 1/2 degree. 



HOT ROAD OR SAND 


Fig. 26-8. A mirage. 
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Fig. 26-9. An optical “black box.” 
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Fig. 26-10. A focusing optical sys- 
tem. 



Fig. 26-1 1. An ellipsoidal mirror. 



Fig. 26-12. 


A paraboloidal mirror. 


and comes back, and all times must be equal. Here the light always travels in air, 
so the time and the distance are proportional. Therefore the statement that all 
the times are the same is the same as the statement that the total distance is the 
same. Thus the sum of the two distances r x and r 2 must be a constant. An ellipse 
is that curve which has the property that the sum of the distances from two points 
is a constant for every point on the ellipse; thus we can be sure that the light from 
one focus will come to the other. 

The same principle works for gathering the light of a star. The great 200-inch 
Palomar telescope is built on the following principle. Imagine a star billions of 
miles away; we would like to cause all the light that comes in to come to a focus. 
Of course we cannot draw the rays that go all the way up to the star, but we still 
want to check whether the times are equal. Of course we know that when the vari- 
ous rays have arrived at some plane KK\ perpendicular to the rays, all the times 
in this plane are equal (Fig. 26-12). The rays must then come down to the mirror 
and proceed toward P' in equal times. That is, we must find a curve which has the 
property that the sum of the distances XX' + X'P' is a constant, no matter where 
X is chosen. An easy way to find it is to extend the length of the line XX' down to 
a plane LL'. Now if we arrange our curve so that A' A" — A'P\ B'B" — B'P' y 
C'C" = C'P', and so on, we will have our curve, because then of course, AA' + 
A'P' = AA' + A' A" will be constant. Thus our curve is the locus of all points 
equidistant from a line and a point. Such a curve is called a parabola ; the mirror 
is made in the shape of a parabola. 

The above examples illustrate the principle upon which such optical devices 
can be designed. The exact curves can be calculated using the principle that, to 
focus perfectly, the travel times must be exactly equal for all light rays, as well as 
being less than for any other nearby path. 

We shall discuss these focusing optical devices further in the next chapter; 
let us now discuss the further development of the theory. When a new theoretical 
principle is developed, such as the principle of least time, our first inclination might 
be to say, “Well, that is very pretty; it is delightful; but the question is, does it 
help at all in understanding the physics?” Someone may say, “Yes, look at how 
many things we can now understand!” Another says, “Very well, but I can under- 
stand mirrors, too. I need a curve such that every tangent plane makes equal angles 
with the two rays. I can figure out a lens, too, because every ray that comes to it 
is bent through an angle given by Snell’s law.” Evidently the statement of least 
time and the statement that angles are equal on reflection, and that the sines of 
the angles are proportional on refraction, are the same. So is it merely a philo- 
sophical question, or one of beauty? There can be arguments on both sides. 


However, the importance of a powerful principle is that it predicts new things . 

It is easy to show that there are a number of new things predicted by Fermat’s 
principle. First, suppose that there are three media, glass, water, and air, and we 
perform a refraction experiment and measure the index n for one medium against 
another. Let us call n 12 the index of air (1) against water (2) ; /z 1 3 the index of air 
(1) against glass (3). If we measured water against glass, we should find another 
index, which we shall call w 23 . But there is no a priori reason why there should be 
any connection between n 12 , n 13 , and « 23 . On the other hand, according to the 
idea of least time, there is a definite relationship. The index n \ 2 is the ratio of two 
things, the speed in air to the speed in water; n 13 is the ratio of the speed in air to 
the speed in glass; « 23 is the ratio of the speed in water to the speed in glass. 
Therefore we cancel out the air, and get 


V_2 = Vl/V* = «13 

Vs Vi/v 2 «12 


(26.5) 


In other words, we predict that the index for a new pair of materials can be ob- 
tained from the indexes of the individual materials, both against air or against 
vacuum. So if we measure the speed of light in all materials, and from this get a 
single number for each material, namely its index relative to vacuum, called m 
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{n i is the speed in air relative to the speed in vacuum, etc.), then our formula is 
easy. The index for any two materials i and j is 


nij 


in = iy 
vj «* 


(26.6) 


Using only Snell’s law, there is no basis for a prediction of this kind.* But of 
course this prediction works. The relation (26.5) was known very early, and was a 
very strong argument for the principle of least time. 

Another argument for the principle of least time, another prediction, is that 
if we measure the speed of light in water, it will be lower than in air. This is a 
prediction of a completely different type. It is a brilliant prediction, because all 
we have so far measured are angles', here we have a theoretical prediction which is 
quite different from the observations from which Fermat deduced the idea of least 
time. It turns out, in fact, that the speed in water is slower than the speed in air, 
by just the proportion that is needed to get the right index! 


26-5 A more precise statement of Fermat’s principle 

Actually, we must make the statement of the principle of least time a little 
more accurately. It was not stated correctly above. It is incorrectly called the 
principle of least time and we have gone along with the incorrect description for 
convenience, but we must now see what the correct statement is. Suppose we had 
a mirror as in Fig. 26-3. What makes the light think it has to go to the mirror? 
The path of least time is clearly AB. So some people might say, “Sometimes it is a 
maximum time.” It is not a maximum time, because certainly a curved path would 
take a still longer time! The correct statement is the following: a ray going in a 
certain particular path has the property that if we make a small change (say a 
one percent shift) in the ray in any manner whatever, say in the location at which 
it comes to the mirror, or the shape of the curve, or anything, there will be no first- 
order change in the time; there will be only a second- order change in the time. 
In other words, the principle is that light takes a path such that there are many 
other paths nearby which take almost exactly the same time. 

The following is another difficulty with the principle of least time, and one 
which people who do not like this kind of a theory could never stomach. With 
Snell’s theory we can “understand” light. Light goes along, it sees a surface, it 
bends because it does something at the surface. The idea of causality, that it goes 
from one point to another, and another, and so on, is easy to understand. But 
the principle of least time is a completely different philosophical principle about 
the way nature works. Instead of saying it is a causal thing, that when we do one 
thing, something else happens, and so on, it says this: we set up the situation, and 
light decides which is the shortest time, or the extreme one, and chooses that path. 
But what does it do, how does it find out? Does it smell the nearby paths, and check 
them against each other? The answer is, yes, it does, in a way. That is the feature 
which is, of course, not known in geometrical optics, and which is involved in 
the idea of wavelength ; the wavelength tells us approximately how far away the 
light must “smell” the path in order to check it. It is hard to demonstrate this fact 
on a large scale with light, because the wavelengths are so terribly short. But with 
radiowaves, say 3-cm waves, the distances over which the radiowaves are checking 
are larger. If we have a source of radiowaves, a detector, and a slit, as in Fig. 26-13, 
the rays of course go from 5 to D because it is a straight line, and if we close down 
the slit it is all right — they still go. But now if we move the detector aside to D f , 
the waves will not go through the wide slit from S to Z>', because they check several 
paths nearby, and say, “No, my friend, those all correspond to different times.” 
On the other hand, if we prevent the radiation from checking the paths by closing 
the slit down to a very narrow crack, then there is but one path available, and the 



Fig. 26-13. The passage of radio- 
waves through a narrow slit. 


* Although it can be deduced if the additional assumption is made that adding a layer 
of one substance to the surface of another does not change the eventual angle of refraction 
in the latter material. 
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radiation takes it! With a narrow slit, more radiation reaches D f than reaches it 
with a wide slit! 

One can do the same thing with light, but it is hard to demonstrate on a large 
scale. The effect can be seen under the following simple conditions. Find a small, 
bright light, say an unfrosted bulb in a street light far away or the reflection of the 
sun in a curved automobile bumper. Then put two fingers in front of one eye, so 
as to look through the crack, and squeeze the light to zero very gently. You will 
see that the image of the light, which was a little dot before, becomes quite elon- 
gated, and even stretches into a long line. The reason is that the fingers are very 
close together, and the light which is supposed to come in a straight line is spread 
out at an angle, so that when it comes into the eye it comes in from several direc- 
tions. Also you will notice, if you are very careful, side maxima, a lot of fringes 
along the edges too, Furthermore, the whole thing is colored. All of this will be 
explained in due time, but for the present it is a demonstration that light does not 
always go in straight lines, and it is one that is very easily performed. 


26-6 How it works 



Fig. 26-1 4. The summation of proba- 
bility amplitudes for many neighboring 
paths. 


Finally, we give a very crude view of what actually happens, how the whole 
thing really works, from what we now believe is the correct, quantum-dynamically 
accurate viewpoint, but of course only qualitatively described. In following the 
light from A to B in Fig. 26-3, we find that the light does not seem to be in the 
form of waves at all. Instead the rays seem to be made up of photons, and they 
actually produce clicks in a photon counter, if we are using one. The brightness of 
the light is proportional to the average number of photons that come in per second, 
and what we calculate is the chance that a photon gets from A to B, say by hitting 
the mirror. The law for that chance is the following very strange one. Take any 
path and find the time for that path; then make a complex number, or draw a little 
complex vector, pe l \ whose angle 6 is proportional to the time . The number of 
turns per second is the frequency of the light. Now take another path; it has, for 
instance, a different time, so the vector for it is turned through a different angle — 
the angle being always proportional to the time. Take all the available paths and 
add on a little vector for each one; then the answer is that the chance of arrival 
of the photon is proportional to the square of the length of the final vector, from 
the beginning to the end! 

Now let us show how this implies the principle of least time for a mirror. We 
consider all rays, all possible paths ADB, AEB , ACB , etc., in Fig. 26-3. The path 
ADB makes a certain small contribution, but the next path, AEB , takes a quite 
different time, so its angle 6 is quite different. Let us say that point C corresponds to 
minimum time, where if we change the paths the times do not change. So for awhile 
the times do change, and then they begin to change less and less as we get near point 
C (Fig. 26-14). So the arrows which we have to add are coming almost exactly at 
the same angle for awhile near C, and then gradually the time begins to increase 
again, and the phases go around the other way, and so on. Eventually, we have 
quite a tight knot. The total probability is the distance from one end to the other, 
squared. Almost all of that accumulated probability occurs in the region where all 
the arrows are in the same direction (or in the same phase). All the contributions 
from the paths which have very different times as we change the path, cancel them- 
selves out by pointing in different directions. That is why, if we hide the extreme 
parts of the mirror, it still reflects almost exactly the same, because all we did was 
to take out a piece of the diagram inside the spiral ends, and that makes only a 
very small change in the light. So this is the relationship between the ultimate 
picture of photons with a probability of arrival depending on an accumulation of 
arrows, and the principle of least time. 
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27 

Geometrical Optics 


27-1 Introduction 

In this chapter we shall discuss some elementary applications of the ideas of 
the previous chapter to a number of practical devices, using the approximation 
called geometrical optics . This is a most useful approximation in the practical 
design of many optical systems and instruments. Geometrical optics is either 
very simple or else it is very complicated. By that we mean that we can either study 
it only superficially, so that we can design instruments roughly, using rules that 
are so simple that we hardly need deal with them here at all, since they are practi- 
cally of high school level, or else, if we want to know about the small errors in 
lenses and similar details, the subject gets so complicated that it is too advanced 
to discuss here! If one has an actual, detailed problem in lens design, including 
analysis of aberrations, then he is advised to read about the subject or else simply 
to trace the rays through the various surfaces (which is what the book tells how to 
do), using the law of refraction from one side to the other, and to find out where 
they come out and see if they form a satisfactory image. People have said that this 
is too tedious, but today, with computing machines, it is the right way to do it. 
One can set up the problem and make the calculation for one ray after another 
very easily. So the subject is really ultimately quite simple, and involves no new 
principles. Furthermore, it turns out that the rules of either elementary or advanced 
optics are seldom characteristic of other fields, so that there is no special reason 
to follow the subject very far, with one important exception. 

The most advanced and abstract theory of geometrical optics was worked 
out by Hamilton, and it turns out that this has very important applications in 
mechanics. It is actually even more important in mechanics than it is in optics, 
and so we leave Hamilton’s theory for the subject of advanced analytical mechanics, 
which is studied in the senior year or in graduate school. So, appreciating that 
geometrical optics contributes very little, except for its own sake, we now go on to 
discuss the elementary properties of simple optical systems on the basis of the 
principles outlined in the last chapter. 

In order to go on, we must have one geometrical formula, which is the follow- 
ing: if we have a triangle with a small altitude h and a long base d } then the diagonal 
.9 (we are going to need it to find the difference in time between two different routes) 
is longer than the base (Fig. 27-1). How much longer? The difference A = s — d 
can be found in a number of ways. One way is this. We see that s 2 — d 2 — h 2 , 
or (s — d)(s + d) = h 2 . But s — d = A, and s + d ~ 2s. Thus 

A - h 2 /2s. (27.1) 

This is all the geometry we need to discuss the formation of images by curved 
surfaces! 

27-2 The focal length of a spherical surface 

The first and simplest situation to discuss is a single refracting surface, sep- 
arating two media with different indices of refraction (Fig. 27-2). We leave the 
case of arbitrary indices of refraction to the student, because ideas are always the 
most important thing, not the specific situation, and the problem is easy enough 
to do in any case. So we shall suppose that, on the left, the speed is 1 and on the 
right it is 1 /«, where n is the index of refraction. The light travels more slowly in 
the glass by a factor n. 
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AIR GLASS 

Fig. 27-2. Focusing by a single re- 
fracting surface. 
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Now suppose that we have a point at O , at a distance 5 from the front surface 
of the glass, and another point O' at a distance inside the glass, and we desire to 
arrange the curved surface in such a manner that every ray from O which hits the 
surface, at any point P, will be bent so as to proceed toward the point O'. For that 
to be true, we have to shape the surface in such a way that the time it takes for 
the light to go from O to P, that is, the distance OP divided by the speed of light 
(the speed here is unity), plus n * O'P, which is the time it takes to go from P to O', 
is equal to a constant independent of the point P. This condition supplies us with 
an equation for determining the surface. The answer is that the surface is a very 
complicated fourth-degree curve, and the student may entertain himself by trying 
to calculate it by analytic geometry. It is simpler to try a special case that corre- 
sponds to s —> 00 , because then the curve is a second-degree curve and is more 
recognizable. It is interesting to compare this curve with the parabolic curve we 
found for a focusing mirror when the light is coming from infinity. 

So the proper surface cannot easily be made — to focus the light from one 
point to another requires a rather complicated surface. It turns out in practice 
that we do not try to make such complicated surfaces ordinarily, but instead 
we make a compromise. Instead of trying to get all the rays to come to a focus, we 
arrange it so that only the rays fairly close to the axis OO f come to a focus. The 
farther ones may deviate if they want to, unfortunately, because the ideal surface 
is complicated, and we use instead a spherical surface with the right curvature at 
the axis. It is so much easier to fabricate a sphere than other surfaces that it is 
profitable for us to find out what happens to rays striking a spherical surface, 
supposing that only the rays near the axis are going to be focused perfectly. 
Those rays which are near the axis are sometimes called paraxial rays , and what 
we are analyzing are the conditions for the focusing of paraxial rays. We shall 
discuss later the errors that are introduced by the fact that all rays are not always 
close to the axis. 

Thus, supposing P is close to the axis, we drop a perpendicular PQ such that 
the height PQ is h. For a moment, we imagine that the surface is a plane passing 
through P. In that case, the time needed to go from O to P would exceed the time 
from O to Q , and also, the time from P to O' would exceed the time from Q to O'. 
But that is why the glass must be curved, because the total excess time must be 
compensated by the delay in passing from V to Ql Now the excess time along 
route OP is h 2 /2s, and the excess time on the other route is nh 2 /2s'. This excess 
time, which must be matched by the delay in going along VQ, differs from what 
it would have been in a vacuum, because there is a medium present. In other 
words, the time to go from V to Q is not as if it were straight in the air, but it is 
slower by the factor n, so that the excess delay in this distance is then ( n — 1 )VQ. 
And now, how large is VQ1 If the point C is the center of the sphere and if 
its radius is R , we see by the same formula that the distance VQ is equal to 
h 2 j2R. Therefore we discover that the law that connects the distances s and s', 
and that gives us the radius of curvature R of the surface that we need, is 

(h 2 /2s) + 0 nh 2 /2s ') = (n - \)h 2 /2R (27.2) 

or 

(1/s) + (n/ s') = (n - 1 )/R. (27.3) 

If we have a position O and another position O', and want to focus light from O 
to O', then we can calculate the required radius of curvature R of the surface by 
this formula. 

Now it turns out, interestingly, that the same lens, with the same curvature 
R, will focus for other distances, namely, for any pair of distances such that the 
sum of the two reciprocals, one multiplied by n, is a constant. Thus a given lens 
will (so long as we limit ourselves to paraxial rays) focus not only from O to O', 
but between an infinite number of other pairs of points, so long as those pairs of 
points bear the relationship that 1 /s + n/ s' is a constant, characteristic of the lens. 

In particular, an interesting case is that in which s —> 00 . We can see from the 
formula that as one s increases, the other decreases. In other words, if point O 
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goes out, point O' comes in, and vice versa. As point O goes toward infinity, point 
O' keeps moving in until it reaches a certain distance, called the focal length /', 
inside the material. If parallel rays come in, they will meet the axis at a distance 
/'. Likewise, we could imagine it the other way. (Remember the reciprocity rule: 
if light will go from O to O' , of course it will also go from O' to O.) Therefore, if 
we had a light source inside the glass, we might want to know where the focus is. 
In particular, if the light in the glass were at infinity (same problem) where would 
it come to a focus outside? This distance is called /. Of course, we can also put 
it the other way. If we had a light source at / and the light went through the surface, 
then it would go out as a parallel beam. We can easily find out what / and /' are: 

n/f - (n - 1 )/R or /' = Rn/(n - 1), (27.4) 

1 if = (« - 1)/* or / = R/(n - 1). (27.5) 

We see an interesting thing: if we divide each focal length by the corresponding 
index of refraction we get the same result! This theorem, in fact, is general. It is 
true of any system of lenses, no matter how complicated, so it is interesting to 
remember. We did not prove here that it is general — we merely noted it for a single 
surface, but it happens to be true in general that the two focal lengths of a system 
are related in this way. Sometimes Eq. (27.3) is written in the form 

l/s + nfs 9 - 1 if (27.6) 

This is more useful than (27.3) because we can measure / more easily than we can 
measure the curvature and index of refraction of the lens : if we are not interested 
in designing a lens or in knowing how it got that way, but simply lift it off a shelf, 
the interesting quantity is f not the n and the 1 and the i?! 

Now an interesting situation occurs if s becomes less than /. What happens 
then? If s < f then (l/s) > (1//), and therefore s' is negative; our equation 
says that the light will focus only with a negative value of s', whatever that means! 
It does mean something very interesting and very definite. It is still a useful formula, 
in other words, even when the numbers are negative. What it means is shown in 
Fig. 27-3. If we draw the rays which are diverging from O , they will be bent, it is 
true, at the surface, and they will not come to a focus, because O is so close in that 
they are “beyond parallel.” However, they diverge as if they had come from a 
point O' outside the glass. This is an apparent image, sometimes called a virtual 
image. The image O' in Fig. 27-2 is called a real image . If the light really comes to 
a point, it is a real image. But if the light appears to be coming from a point, a 
fictitious point different from the original point, it is a virtual image. So when 
s' comes out negative, it means that O' is on the other side of the surface, and every- 
thing is all right. 

Now consider the interesting case where R is equal to infinity; then we have 
( 1 / s) H - (n/s r ) = 0. In other words, s' — —ns, which means that if we look from 
a dense medium into a rare medium and see a point in the rare medium, it appears 
to be deeper by a factor n. Likewise, we can use the same equation backwards, 
so that if we look into a plane surface at an object that is at a certain distance 
inside the dense medium, it will appear as though the light is coming from not as 
far back (Fig. 27-4). When we look at the bottom of a swimming pool from above, 
it does not look as deep as it really is, by a factor 3/4, which is the reciprocal of the 
index of refraction of water. 

We could go on, of course, to discuss the spherical mirror. But if one appreci- 
ates the ideas involved, he should be able to work it out for himself. Therefore 
we leave it to the student to work out the formula for the spherical mirror, but 
we mention that it is well to adopt certain conventions concerning the distances 
involved: 

(1) The object distance s is positive if the point O is to the left of the surface. 

(2) The image distance s' is positive if the point O' is to the right of the surface. 

(3) The radius of curvature of the surface is positive if the center is to the right 
of the surface. 
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In Fig. 27-2, for example, 5 , s', and R are all positive; in Fig. 27-3, s and R are 
positive, but s' is negative. If we had used a concave surface, our formula (27.3) 
would still give the correct result if we merely make R a negative quantity. 

In working out the corresponding formula for a mirror, using the above 
conventions, you will find that if you put n = - 1 throughout the formula (27.3) 
(as though the material behind the mirror had an index - 1 ), the right formula for 
a mirror results! 

Although the derivation of formula (27.3) is simple and elegant, using least 
time, one can of course work out the same formula using Snell’s law, remembering 
that the angles are so small that the sines of angles can be replaced by the angles 
themselves. 

27-3 The focal length of a lens 

Now we go on to consider another situation, a very practical one. Most of 
the lenses that we use have two surfaces, not just one. How does this affect 
matters? Suppose that we have two surfaces of different curvature, with glass 
filling the space between them (Fig. 27-5). We want to study the problem of 
focusing from a point O to an alternate point O'. How can we do that? The answer 
is this: First, use formula (27.3) for the first surface, forgetting about the second 
surface. This will tell us that the light which was diverging from O will appear 
to be converging or diverging, depending on the sign, from some other point, 
say O'. Now we consider a new problem. We have a different surface, between 
glass and air, in which rays are converging toward a certain point O'. Where will 
they actually converge? We use the same formula again! We find that they con- 
verge at O". Thus, if necessary, we can go through 75 surfaces by just using the 
same formula in succession, from one to the next! 

There are some rather high-class formulas that would save us considerable 
energy in the few times in our lives that we might have to chase the light through 
five surfaces, but it is easier just to chase it through five surfaces when the problem 
arises than it is to memorize a lot of formulas, because it may be we will never have 
to chase it through any surfaces at all! 

In any case, the principle is that when we go through one surface we find a 
new position, a new focal point, and then take that point as the starting point for 
the next surface, and so on. In order to actually do this, since on the second surface 
we are going from n to 1 rather than from 1 to n, and since in many systems there 
is more than one kind of glass, so that there are indices n x , n 2 , ... , we really need 
a generalization of formula (27.3) for a case where there are two different indices, 
n 1 and n 2 , rather than only n. Then it is not difficult to prove that the general 
form of (27.3) is 

(ni/s) + (n 2 /s’) = (n 2 - n{)/R. (27.7) 

Fig. 27-6. A thin lens with two posi- Particularly simple is the special case in which the two surfaces are very close 

tive radii. together so close that we may ignore small errors due to the thickness. If we draw 

the lens as shown in Fig. 27-6, we may ask this question : How must the lens be 
built so as to focus light from O to O ' ? Suppose the light comes exactly to 
the edge of the lens, at point P. Then the excess time in going from O to O' is 
(n^/ls) + (mh 2 /2s'), ignoring for a moment the presence of the thickness T 
of glass of index n 2 . Now, to make the time for the direct path equal to that for 
the path OPO', we have to use a piece of glass whose thickness T at the center is such 
that the delay introduced in going through this thickness is enough to compensate 
for the excess time above. Therefore the thickness of the lens at the center must 
be given by the relationship 

(n!h 2 /2s) + (n y h 2 /2s') = (n 2 - n{)T. (27.8) 

We can also express T in terms of the radii R x and R 2 of the two surfaces. Paying 
attention to our convention (3), we thus find, for Ri < R 2 (a convex lens), 

T = (h 2 /2R 1 ) - (h 2 /2R 2 ). 



Fig. 27-5. Image formation by a 
two-surface lens. 
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Therefore, we finally get 


(«i/s) + ("iA0 - (n 2 - «i)(l/*i » l/i? 2 ). (27.10) 

Now we note again that if one of the points is at infinity, the other will be at a 
point which we will call the focal length / The focal length /is given by 

1 //= (* - l)(l/*i ~ l/* 2 ), (27.11) 

where n = n 2 /n\- 

Now, if we take the opposite case, where s goes to infinity, we see that s' is at 
the focal length /. This time the focal lengths are equal. (This is another special 
case of the general rule that the ratio of the two focal lengths is the ratio of the 
indices of refraction in the two media in which the rays focus. In this particular 
optical system, the initial and final indices are the same, so the two focal lengths 
are equal.) 

Forgetting for a moment about the actual formula for the focal length, if we 
bought a lens that somebody designed with certain radii of curvature and a certain 
index, we could measure the focal length, say, by seeing where a point at infinity 
focuses. Once we had the focal length, it would be better to write our equation in 
terms of the focal length directly, and the formula then is 

(l/.v) + (1 /.O = 1// (27.12) 

Now let us see how the formula works and what it implies in different circum- 
stances. First, it implies that if s or s f is infinite the other one is /. That means that 
parallel light focuses at a distance /, and this in effect defines /. Another interesting 
thing it says is that both points move in the same direction. If one moves to the 
right, the other does also. Another thing it says is that s and s' are equal if they are 
both equal to 2/ In other words, if we want a symmetrical situation, we find that 
they will both focus at a distance 2/ 

27-4 Magnification 

So far we have discussed the focusing action only for points on the axis. Now 
let us discuss also the imaging of objects not exactly on the axis, but a little bit off, 
so that we can understand the properties of magnification. When we set up a lens 
so as to focus light from a small filament onto a “point” on a screen, we notice 
that on the screen we get a “picture” of the same filament, except of a larger or 
smaller size than the true filament. This must mean that the light comes to a focus 
from each point of the filament. In order to understand this a little better, let us 
analyze the thin lens system shown schematically in Fig. 27-7. We know the follow- 
ing facts: 

(1) Any ray that comes in parallel on one side proceeds toward a certain par- 
ticular point called the focus on the other side, at a distance/ from the lens. 

(2) Any ray that arrives at the lens from the focus on one side comes out parallel 
to the axis on the other side. 

This is all we need to establish formula (27.12) by geometry, as follows: Suppose 
we have an object at some distance x from the focus; let the height of the object 
be y. Then we know that one of the rays, namely PQ , will be bent so as to pass 
through the focus R on the other side. Now if the lens will focus point P at all, we 
can find out where if we find out where just one other ray goes, because the new 
focus will be where the two intersect again. We need only use our ingenuity to 
find the exact direction of one other ray. But we remember that a parallel ray goes 
through the focus and vice versa : a ray which goes through the focus will come out 
parallel! So we draw ray PT through U. (It is true that the actual rays which are 
doing the focusing may be much more limited than the two we have drawn, but 
they are harder to figure, so we make believe that we can make this ray.) Since it 
would come out parallel, we draw TS parallel to XW. The intersection S is the 
point we need. This will determine the correct place and the correct height. Let 
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Fig. 27-7. The geometry of imaging 
by a thin lens. 



us call the height / and the distance from the focus, x'. Now we may derive a 
lens formula. Using the similar triangles PVU and TXU, we find 


7 = - (27.13) 

Similarly, from triangles SWR and QXR, we get 

/ y 

-, = T (27.14) 


Solving each for y'/y , we find that 


y - = - = l 

y f x 


(27.15) 


Equation (27.15) is the famous lens formula j in it is everything we need to know 
about lenses: It tells us the magnification, y'/y, in terms of the distances and the 
focal lengths. It also connects the two distances jc and x' with/: 


= f 2 , (27.16) 

which is a much neater form to work with than Eq. (27.12). We leave it to the 
student to demonstrate that if we call s = x + f and s' = x' + / Eq. (27.12) 
is the same as Eq. (27.16). 


27-5 Compound lenses 
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Fig. 27-8. Illustration of the principal 
planes of an optical system. 


Without actually deriving it, we shall briefly describe the general result when 
we have a number of lenses. If we have a system of several lenses, how can we 
possibly analyze it? That is easy. We start with some object and calculate where 
its image is for the first lens, using formula (27. 1 6) or (27. 1 2) or any other equivalent 
formula, or by drawing diagrams. So we find an image. Then we treat this image 
as the source for the next lens, and use the second lens with whatever its focal 
length is to again find an image. We simply chase the thing through the succession 
of lenses. That is all there is to it. It involves nothing new in principle, so we shall 
not go into it. However, there is a very interesting net result of the effects of any 
sequence of lenses on light that starts and ends up in the same medium, say air. 
Any optical instrument — a telescope or a microscope with any number of lenses 
and mirrors— has the following property: There exist two planes, called the 
principal planes of the system (these planes are often fairly close to the first surface 
of the first lens and the last surface of the last lens), which have the following prop- 
erties: (1) If light comes into the system parallel from the first side, it comes out 
at a certain focus, at a distance from the second principal plane equal to the focal 
length, just as though the system were a thin lens situated at this plane. (2) If 
parallel light comes in the other way, it comes to a focus at the same distance / 
from the first principal plane, again as if a thin lens where situated there. (See 
Fig. 27-8.) 

Of course, if we measure the distances x and x’, and y and y' as before, the 
formula (27.16) that we have written for the thin lens is absolutely general, pro- 
vided that we measure the focal length from the principal planes and not from the 
center of the lens. It so happens that for a thin lens the principal planes are coin- 
cident. It is just as though we could take a thin lens, slice it down the middle, and 
separate it, and not notice that it was separated. Every ray that comes in pops out 
immediately on the other side of the second plane from the same point as it went 
into the first plane! The principal planes and the focal length may be found either 
by experiment or by calculation, and then the whole set of properties of the optical 
system are described. It is very interesting that the result is not complicated when 
we are all finished with such a big, complicated optical system. 
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27-6 Aberrations 


Before we get too excited about how marvelous lenses are, we must hasten 
to add that there are also serious limitations, because of the fact that we have 
limited ourselves, strictly speaking, to paraxial rays, the rays near the axis. A real 
lens having a finite size will, in general, exhibit aberrations . For example, a ray 
that is on the axis, of course, goes through the focus ; a ray that is very close to the 
axis will still come to the focus very well. But as we go farther out, the ray begins 
to deviate from the focus, perhaps by falling short, and a ray striking near the 
top edge comes down and misses the focus by quite a wide margin. So, instead of 
getting a point image, we get a smear. This effect is called spherical aberration , 
because it is a property of the spherical surfaces we use in place of the right shape. 
This could be remedied, for any specific object distance, by re-forming the shape of 
the lens surface, or perhaps by using several lenses arranged so that the aberrations 
of the individual lenses tend to cancel each other. 

Lenses have another fault: light of different colors has different speeds, or 
different indices of refraction, in the glass, and therefore the focal length of a 
given lens is different for different colors. So if we image a white spot, the image 
will have colors, because when we focus for the red, the blue is out of focus, or 
vice versa. This property is called chromatic aberration. 

There are still other faults. If the object is off the axis, then the focus really 
isn't perfect any more, when it gets far enough off the axis. The easiest way to 
verify this is to focus a lens and then tilt it so that the rays are coming in at a large 
angle from the axis. Then the image that is formed will usually be quite crude, 
and there may be no place where it focuses well. There are thus several kinds of 
errors in lenses that the optical designer tries to remedy by using many lenses to 
compensate each other’s errors. 

How careful do we have to be to eliminate aberrations? Is it possible to make 
an absolutely perfect optical system? Suppose we had built an optical system that 
is supposed to bring light exactly to a point. Now, arguing from the point of view 
of least time, can we find a condition on how perfect the system has to be? The 
system will have some kind of an entrance opening for the light. If we take the 
farthest ray from the axis that can come to the focus (if the system is perfect, of 
course), the times for all rays are exactly equal. But nothing is perfect, so the 
question is, how wrong can the time be for this ray and not be worth correcting 
any further? That depends on how perfect we want to make the image. But 
suppose we want to make the image as perfect as it possibly can be made. Then, 
of course, our impression is that we have to arrange that every ray takes as nearly 
the same time as possible. But it turns out that this is not true, that beyond a 
certain point we are trying to do something that is too fine, because the theory of 
geometrical optics does not work! 

Remember that the principle of least time is not an accurate formulation, 
unlike the principle of conservation of energy or the principle of conservation of 
momentum. The principle of least time is only an approximation , and it is inter- 
esting to know how much error can be allowed and still not make any apparent 
difference. The answer is that if we have arranged that between the maximal ray — 
the worst ray, the ray that is farthest out — and the central ray, the difference in 
time is less than about the period that corresponds to one oscillation of the light, 
then there is no use improving it any further. Light is an oscillatory thing with a 
definite frequency that is related to the wavelength, and if we have arranged that 
the time difference for different rays is less than about a period, there is no use 
going any further. 


27-7 Resolving power 

Another interesting question — a very important technical question with all 
optical instruments — is how much resolving power they have. If we build a micro- 
scope, we want to see the objects that we are looking at. That means, for instance, 
that if we are looking at a bacterium with a spot on each end, we want to see that 
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Fig. 27-9. The resolving power of an 
optical system. 


there are two dots when we magnify them. One might think that all we have to 
do is to get enough magnification— we can always add another lens, and we can 
always magnify again and again, and with the cleverness of designers, all the 
spherical aberrations and chromatic aberrations can be cancelled out, and there 
is no reason why we cannot keep on magnifying the image. So the limitations of a 
microscope are not that it is impossible to build a lens that magnifies more than 
2000 diameters. We can build a system of lenses that magnifies 10,000 diameters, 
but we still could not see two points that are too close together because of the 
limitations of geometrical optics, because of the fact that least time is not precise. 

To discover the rule that determines how far apart two points have to be so 
that at the image they appear as separate points can be stated in a very beautiful 
way associated with the time it takes for different rays. Suppose that we disregard 
the aberrations now, and imagine that for a particular point P (Fig. 27-9) all the 
rays from object to image T take exactly the same time. (It is not true, because it 
is not a perfect system, but that is another problem.) Now take another nearby 
point, P', and ask whether its image will be distinct from T. In other words, whether 
we can make out the difference between them. Of course, according to geometrical 
optics, there should be two point images, but what we see may be rather smeared 
and we may not be able to make out that there are two points. The condition that 
the second point is focused in a distinctly different place from the first one is that 
the two times for the extreme rays P'ST and P'RT on each side of the big opening 
of the lenses to go from one end to the other, must not be equal from the two 
possible object points to a given image point. Why? Because, if the times were 
equal, of course both would/oc«j at the same point. So the times are not going to 
be equal. But by how much do they have to differ so that we can say that both do 
not come to a common focus, so that we can distinguish the two image points? 
The general rule for the resolution of any optical instrument is this ’ two different 
point sources can be resolved only if one source is focused at such a point that 
the times for the maximal rays from the other source to reach that point, as 
compared with its own true image point, differ by more than one period. It is 
necessary that the difference in time between the top ray and the bottom ray to 
the wrong focus shall exceed a certain amount, namely, approximately the period 
of oscillation of the light: 

ti ~ h > \/v, (27.17) 


where v is the frequency of the light (number of oscillations per second ; also speed 
divided by wavelength). If the distance of separation of the two points is called 
D, and if the opening angle of the lens is called 0, then one can demonstrate that 
(27.17) is exactly equivalent to the statement that D must exceed \/n sin 0, where 
n is the index of refraction at P and X is the wavelength. The smallest things that 
we can see are therefore approximately the wavelength of light. A corresponding 
formula exists for telescopes, which tells us the smallest difference in angle between 
two stars that can just be distinguished.* 


The angle is about X/ 0, where Z) is the lens diameter. Can you see why ? 
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Electromagnetic Radiation 


28-1 Electromagnetism 

The most dramatic moments in the development of physics are those in which 
great syntheses take place, where phenomena which previously had appeared to 
be different are suddenly discovered to be but different aspects of the same thing. 
The history of physics is the history of such syntheses, and the basis of the success 
of physical science is mainly that we are able to synthesize. 

Perhaps the most dramatic moment in the development of physics during the 
19th century occurred to J. C. Maxwell one day in the 1860’s, when he combined 
the laws of electricity and magnetism with the laws of the behavior of light. As a 
result, the properties of light were partly unravelled — that old and subtle stuff 
that is so important and mysterious that it was felt necessary to arrange a special 
creation for it when writing Genesis. Maxwell could say, when he was finished 
with his discovery, “Let there be electricity and magnetism, and there is light!” 

For this culminating moment there was a long preparation in the gradual 
discovery and unfolding of the laws of electricity and magnetism. This story we 
shall reserve for detailed study next year. However, the story is, briefly, as follows. 
The gradually discovered properties of electricity and magnetism, of electric forces 
of attraction and repulsion, and of magnetic forces, showed that although these 
forces were rather complex, they all fell off inversely as the square of the distance. 
We know, for example, that the simple coulomb law for stationary charges is that 
the electric force field varies inversely as the square of the distance. As a conse- 
quence, for sufficiently great distances there is very little influence of one system 
of charges on another. Maxwell noted that the equations or the laws that had been 
discovered up to this time were mutually inconsistent when he tried to put them 
all together, and in order for the whole system to be consistent, he had to add an- 
other term to his equations. With this new term there came an amazing prediction, 
which was that a part of the electric and magnetic fields would fall off much more 
slowly with the distance than the inverse square, namely, inversely as the first 
power of the distance! And so he realized that electric currents in one place can 
affect other charges far away, and he predicted the basic effects with which we are 
familiar today — radio transmission, radar, and so on. 

It seems a miracle that someone talking in Europe can, with mere electrical 
influences, be heard thousands of miles away in Los Angeles. How is it possible? 
It is because the fields do not vary as the inverse square, but only inversely as the 
first power of the distance. Finally, then, even light itself was recognized to be 
electric and magnetic influences extending over vast distances, generated by an 
almost incredibly rapid oscillation of the electrons in the atoms. All these phe- 
nomena we summarize by the word radiation or, more specifically, electromagnetic 
radiation , there being one or two other kinds of radiation also. Almost always, 
radiation means electromagnetic radiation. 

And thus is the universe knit together. The atomic motions of a distant star 
still have sufficient influence at this great distance to set the electrons in our eye 
in motion, and so we know about the stars. If this law did not exist, we would 
all be literally in the dark about the exterior world! And the electric surgings in a 
galaxy five billion light years away — which is the farthest object we have found 
so far — can still influence in a significant and detectable way the currents in the 
great “dish” in front of a radio telescope. And so it is that we see the stars and the 
galaxies. 
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This remarkable phenomenon is what we shall discuss in the present chapter. 
At the beginning of this course in physics we outlined a broad picture of the world, 
but we are now better prepared to understand some aspects of it, and so we shall 
now go over some parts of it again in greater detail. We begin by describing the 
position of physics at the end of the 19th century. All that was then known about 
the fundamental laws can be summarized as follows. 

First, there were laws of forces : one force was the law of gravitation, which 
we have written down several times; the force on an object of mass m, due to 
another of mass M, is given by 

F = GmMe r /r 2 , (28.1) 

where e r is a unit vector directed from m to M, and r is the distance between them. 

Next, the laws of electricity and magnetism, as known at the end of the 19th 
century, are these: the electrical forces acting on a charge q can be described by 
two fields, called E and B, and the velocity v of the charge q, by the equation 


F = ?(E + tXB). 


(28.2) 


To complete this law, we have to say what the formulas for E and B are in a given 
circumstance: if a number of charges are present, E and the B are each the sum of 
contributions, one from each individual charge. So if we can find the E and B 
produced by a single charge, we need only to add all the effects from all the charges 
in the universe to get the total E and B! This is the principle of superposition. 

What is the formula for the electric and magnetic field produced by one in- 
dividual charge? It turns out that this is very complicated, and it takes a great deal 
of study and sophistication to appreciate it. But that is not the point. We write 
down the law now only to impress the reader with the beauty of nature, so to speak, 
i.e., that it is possible to summarize all the fundamental knowledge on one page, 
with notations that he is now familiar with. This law for the fields of an individual 
charge is complete and accurate, so far as we know (except for quantum mechanics) 
but it looks rather complicated. We shall not study all the pieces now; we only 
write it down to give an impression, to show that it can be written, and so that we 
can see ahead of time roughly what it looks like. As a matter of fact, the most 
useful way to write the correct laws of electricity and magnetism is not the way we 
shall now write them, but involves what are called field equations, which we shall 
learn about next year. But the mathematical notations for these are different and 
new, and so we write the law in an inconvenient form for calculation, but in nota- 
tions that we now know. 

The electric field, E, is given by 


E = 


i (*A 

47T€o lr' 2 c dt \r' 2 / 


+ 


\_tf_ 

c 2 dt 2 



(28.3) 


What do the various terms tell us? Take the first term, E - ~qt T »/Aire 0 r f2 . That, 
of course, is Coulomb’s law, which we already know: q is the charge that is pro- 
ducing the field ; e r ' is the unit vector in the direction from the point P where E is 
measured, r is the distance from P to q. But, Coulomb’s law is wrong. The dis- 
coveries of the 19th century showed that influences cannot travel faster than a 
certain fundamental speed c, which we now call the speed of light. It is not correct 
that the first term is Coulomb’s law, not only because it is not possible to know 
where the charge is now and at what distance it is now , but also because the only 
thing that can affect the field at a given place and time is the behavior of the charges 
in th q past. How far in the past? The time delay, or retarded time , so-called, is 
the time it takes, at speed c, to get from the charge to the field point P. The delay 
is rye. 

So to allow for this time delay, we put a little prime on r, meaning how far 
away it was when the information now arriving at P left q. Just for a moment 
suppose that the charge carried a light, and that the light could only come to P 
at the speed c. Then when we look at q , we would not see where it is now, of course, 
but where it was at some earlier time. What appears in our formula is the apparent 
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direction 6 r — the direction it used to be — the so-called retarded direction— and 
at the retarded distance r f . That would be easy enough to understand, too, but it 
is also wrong. The whole thing is much more complicated. 

There are several more terms. The next term is as though nature were trying 
to allow for the fact that the effect is retarded, if we might put it very crudely. It 
suggests that we should calculate the delayed coulomb field and add a correction 
to it, which is its rate of change times the time delay that we use. Nature seems to 
be attempting to guess what the field at the present time is going to be, by taking 
the rate of change and multiplying by the time that is delayed. But we are not 
yet through. There is a third term — the second derivative, with respect to r, of the 
unit vector in the direction of the charge. Now the formula is finished, and that is 
all there is to the electric field from an arbitrarily moving charge. 

The magnetic field is given by 

B - — e r ' X E/c. (28.4) 

We have written these down only for the purpose of showing the beauty of nature 
or, in a way, the power of mathematics. We do not pretend to understand why it 
is possible to write so much in such a small space, but (28.3) and (28.4) contain 
the machinery by which electric generators work, how light operates, all the phe- 
nomena of electricity and magnetism. Of course, to complete the story we also 
need to know something about the behavior of the materials involved — the prop- 
erties of matter — which are not described properly by (28.3). 

To finish with our description of the world of the 19th century we must 
mention one other great synthesis which occurred in that century, one with which 
Maxwell had a great deal to do also, and that was the synthesis of the phenomena 
of heat and mechanics. We shall study that subject soon. 

What had to be added in the 20th century was that the dynamical laws of 
Newton were found to be all wrong, and quantum mechanics had to be introduced 
to correct them. Newton’s laws are approximately valid when the scale of things is 
sufficiently large. These quantum-mechanical laws, combined with the laws of 
electricity, have only recently been combined to form a set of laws called quantum 
electrodynamics . In addition, there were discovered a number of new phenomena, 
of which the first was radioactivity, discovered by Becquerel in 1898 — he just 
sneaked it in under the 19th century. This phenomenon of radioactivity was 
followed up to produce our knowledge of nuclei and new kinds of forces that are 
not gravitational and not electrical, but new particles with different interactions, 
a subject which has still not been unravelled. 

For those purists who know more (the professors who happen to be reading 
this), we should add that when we say that (28.3) is a complete expression of the 
knowledge of electrodynamics, we are not being entirely accurate. There was a 
problem that was not quite solved at the end of the 19th century. When we try to 
calculate the field from all the charges including the charge itself that we want the 
field to act on , we get into trouble trying to find the distance, for example, of a 
charge from itself, and dividing something by that distance, which is zero. The 
problem of how to handle the part of this field which is generated by the very 
charge on which we want the field to act is not yet solved today. So we leave it 
there; we do not have a complete solution to that puzzle yet, and so we shall avoid 
the puzzle for as long as we can. 

28-2 Radiation 

That, then, is a summary of the world picture. Now let us use it to discuss 
the phenomena called radiation. To discuss these phenomena, we must select 
from Eq. (28.3) only that piece which varies inversely as the distance and not as 
the square of the distance. It turns out that when we finally do find that piece, it 
is so simple in its form that it is legitimate to study optics and electrodynamics in an 
elementary way by taking it as “the law” of the electric field produced by a moving 
charge far away. We shall take it temporarily as a given law which we will learn 
about in detail next year. 
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Of the terms appearing in (28.3), the first one evidently goes inversely as the 
square of the distance, and the second is only a correction for delay, so it is easy 
to show that both of them vary inversely as the square of the distance. All of the 
effects we are interested in come from the third term, which is not very complicated, 
after all. What this term says is: look at the charge and note the direction of the 
unit vector (we can project the end of it onto the surface of a unit sphere). As the 
charge moves around, the unit vector wiggles, and the acceleration of that unit 
vector is what we are looking for. That is all. Thus 

f - d 2 t T ' 

4tt e 0 c 2 dt 2 (28.5) 

is a statement of the laws of radiation, because that is the only important term 
when we get far enough away that the fields are varying inversely as the distance. 
(The parts that go as the square have fallen off so much that we are not interested 
in them.) 

Now we can go a little bit further in studying (28.5) to see what it means. 
Suppose a charge is moving in any manner whatsoever, and we are observing it 
from a distance. We imagine for a moment that in a sense it is “lit up” (although 
it is light that we are trying to explain); we imagine it as a little white dot. Then 
we would see this white dot running around. But we don’t see exactly how it is 
running around right now, because of the delay that we have been talking about. 
What counts is how it was moving earlier. The unit vector e r - is pointed toward the 
apparent position of the charge. Of course, the end of e r - goes on a slight curve, 
so that its acceleration has two components. One is the transverse piece, because 
the end of it goes up and down, and the other is a radial piece because it stays on a 
sphere. It is easy to demonstrate that the latter is much smaller and varies as the 
inverse square of r when r is very great. This is easy to see, for when we imagine 
that we move a given source farther and farther away, then the wigglings of e r - 
look smaller and smaller, inversely as the distance, but the radial component 
of acceleration is varying much more rapidly than inversely as the distance. So 
for practical purposes all we have to do is project the motion on a plane at unit 
distance. Therefore we find the following rule: Imagine that we look at the moving 
charge and that everything we see is delayed — like a painter trying to paint a scene 
on a screen at a unit distance. A real painter, of course, does not take into account 
the fact that light is going at a certain speed, but paints the world as he sees it. 
We want to see what his picture would look like. So we see a dot, representing the 
charge, moving about in the picture. The acceleration of that dot is proportional 
to the electric field. That is all— all we need. 

Thus Eq. (28.5) is the complete and correct formula for radiation; even 
relativity effects are all contained in it. However, we often want to apply it to a 
still simpler circumstance in which the charges are moving only a small distance 
at a relatively slow rate. Since they are moving slowly, they do not move an ap- 
preciable distance from where they start, so that the delay time is practically 
constant. Then the law is still simpler, because the delay time is fixed. Thus we 
imagine that the charge is executing a very tiny motion at an effectively constant 
distance. The delay at the distance r is r/c. Then our rule becomes the following: 
If the charged object is moving in a very small motion and it is laterally displaced 
by the distance x(t), then the angle that the unit vector e r - is displaced is x/r, and 
since r is practically constant, the ^-component of d 2 e r >/dt 2 is simply the accelera- 
tion of x itself at an earlier time, and so finally we get the law we want, which is 

E - (,) " 4 (' - D • < 28JS > 

Only the component of a x perpendicular to the line of sight is important. Let 
us see why that is. Evidently, if the charge is moving in and out straight at us, the 
unit vector in that direction does not wiggle at all, and it has no acceleration. So 
it is only the sidewise motion which is important, only the acceleration that we 
see projected on the screen. 
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28-3 The dipole radiator 


As our fundamental “law” of electromagnetic radiation, we are going to 
assume that (28.6) is true, i.e., that the electric field produced by an accelerating 
charge which is moving nonrelativistically at a very large distance r approaches 
that form. The electric field varies inversely as r and is proportional to the accelera- 
tion of the charge, projected onto the “plane of sight,” and this acceleration is not 
today’s acceleration, but the acceleration that it had at an earlier time, the amount 
of delay being a time, r/c. In the remainder of this chapter we shall discuss this 
law so that we can understand it better physically, because we are going to use 
it to understand all of the phenomena of light and radio propagation, such as 
reflection, refraction, interference, diffraction, and scattering. It is the central 
law, and is all we need. All the rest of Eq. (28.3) was written down only to set the 
stage, so that we could appreciate where (28.6) fits and how it comes about. 

We shall discuss (28.3) further next year. In the meantime, we shall accept it 
as true, but not just on a theoretical basis. We may devise a number of experiments 
which illustrate the character of the law. In order to do so, we need an accelerating 
charge. It should be a single charge, but if we can make a great many charges 
move together, all the same way, we know that the field will be the sum of the 
effects of each of the individual charges ; we just add them together. As an example, 
consider two pieces of wire connected to a generator, as shown in Fig. 28-1. 
The idea is that the generator makes a potential difference, or a field, which pulls 
electrons away from piece A and pushes them into B at one moment, and then, an 
infinitesimal time later, it reverses the effect and pulls the electrons out of B and 
pumps them back into A! So in these two wires charges, let us say, are accelerating 
upward in wire A and upward in wire B for one moment, and a moment later they 
are accelerating downward in wire A and downward in wire B . The fact that we 
need two wires and a generator is merely that this is a way of doing it. The net 
result is that we merely have a charge accelerating up and down as though A and B 
were one single wire. A wire that is very short compared with the distance light 
travels in one oscillation period is called an electric dipole oscillator . Thus we 
have the circumstance that we need to apply our law, which tells us that this charge 
makes an electric field, and so we need an instrument to detect an electric field, 
and the instrument we use is the same thing — a pair of wires like A and B! If an 
electric field is applied to such a device, it will produce a force which will pull 
the electrons up on both wires or down on both wires. This signal is detected 
by means of a rectifier mounted between A and B , and a tiny, fine wire carries the 
information into an amplifier, where it is amplified so we can hear the audio- 
frequency tone with which the radiofrequency is modulated. When this probe feels 
an electric field, there will be a loud noise coming out of the loudspeaker, and when 
there is no electric field driving it, there will be no noise. 

Because the room in which the waves we are measuring has other objects in it, 
our electric field will shake electrons in these other objects; the electric field makes 
these other charges go up and down, and in going up and down, these also produce 
an effect on our probe. Thus for a successful experiment we must hold things fairly 
close together, so that the influences from the walls and from ourselves — the 
reflected waves — are relatively small. So the phenomena will not turn out to appear 
to be precisely and perfectly in accord with Eq. (28.6), but will be close enough that 
we shall be able to appreciate the law. 

Now we turn the generator on and hear the audio signal. We find a strong 
field when the detector D is parallel to the generator G at point 1 (Fig. 28-2). We 
find the same amount of field also at any other azimuth angle about the axis of 
(7, because it has no directional effects. On the other hand, when the detector is at 
3 the field is zero. That is all right, because our formula said that the field should 
be the acceleration of the charge projected perpendicular to the line of sight. 
Therefore when we look down on G , the charge is moving toward and away from 
Z>, and there is no effect. So that checks the first rule, that there is no effect when 
the charge is moving directly toward us. Secondly, the formula says that the electric 
field should be perpendicular to r and in the plane of G and r\ so if we put D 
at 1 but rotate it 90°, we should get no signal. And this is just what we find, the 
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Fig. 28-1. A high-frequency signal 
generator drives charges up and down 
on two wires. 


3 



Fig. 28-2. The instantaneous electric 
field on a sphere centered at a localized, 
linearly oscillating charge. 



/ 



Fig. 28-3. Illustration of interference 
of sources. 



Fig. 28-4. Illustration of the vector 
character of the combination of sources. 


electric field is indeed vertical, and not horizontal. When we move D to some inter- 
mediate angle, we see that the strongest signal occurs when it is oriented as shown, 
because although G is vertical, it does not produce a field that is simply parallel 
to itself — it is the projection of the acceleration perpendicular to the line of sight 
that counts. The signal is weaker at 2 than it is at 1 , because of the projection effect. 

28-4 Interference 

Next, we may test what happens when we have two sources side by side a few 
centimeters apart '(Fig. 28-3). The law is that the two sources should add their 
effects at point 1 when both of the sources are connected to the same generator 
and are both moving up and down the same way, so that the total electric field is 
the sum of the two and is twice as strong as it was before. 

Now comes an interesting possibility. Suppose we make the charges in Si 
and S 2 both accelerate up and down, but delay the timing of S 2 so that they are 
180° out of phase. Then the field produced by S x will be in one direction and the 
field produced by S 2 will be in the opposite direction at any instant, and therefore 
we should get no effect at point 1. The phase of oscillation is neatly adjustable by 
means of a pipe which is carrying the signal to S 2 . By changing the length of this 
pipe we change the time it takes the signal to arrive at S 2 and thus we change the 
phase of that oscillation. By adjusting this length, we can indeed find a place where 
there is no more signal left, in spite of the fact that both Si and S 2 are moving! 
The fact that they are both moving can be checked, because if we cut one out, we 
can see the motion of the other. So the two of them together can produce zero if 
everything is adjusted correctly. 

Now, it is very interesting to show that the addition of the two fields is in 
fact a vector addition. We have just checked it for up and down motion, but let 
us check two nonparallel directions. First, we restore Si and S 2 to the same phase; 
that is, they are again moving together. But now we turn Si through 90°, as 
shown in Fig. 28-4. Now we should have at point 1 the sum of two effects, one of 
which is vertical and the other horizontal. The electric field is the vector sum of 
these two in-phase signals— they are both strong at the same time and go through 
zero together; the total field should be a signal R at 45°. If we turn D to get the 
maximum noise, it should be at about 45°, and not vertical. And if we turn it at 
right angles to that direction, we should get zero, which is easy to measure. Indeed, 
we observe just such behavior! 

Now, how about the retardation? How can we demonstrate that the signal 
is retarded? We could, with a great deal of equipment, measure the time at which 
it arrives, but there is another, very simple way. Referring again to Fig. 28-3, 
suppose that S j and S> are in phase. They are both shaking together, and they 
produce equal electric fields at point 1. But suppose we go to a certain place 2 
which is closer to «S , 2 and farther from Si. Then, in accordance with the principle 
that the acceleration should be retarded by an amount equal to r/c, if the retarda- 
tions are not equal, the signals are no longer in phase. Thus it should be possible 
to find a position at which the distances of D from S i and S 2 differ by some amount 
A, in such a manner that there is no net signal. That is, the distance A is to be 
the distance light goes in one-half an oscillation of the generator. We may go 
still further, and find a point where the difference is greater by a whole cycle; 
that is to say, the signal from the first antenna reaches point 3 with a delay in time 
that is greater than that of the second antenna by just the length of time it takes 
for the electric current to oscillate once, and therefore the two electric fields pro- 
duced at 3 are in phase again. At point 3 the signal is strong again. 

This completes our discussion of the experimental verification of some of the 
important features of Eq. (28.6). Of course we have not really checked the 1 /r 
variation of the electric field strength, or the fact that there is also a magnetic 
field that goes along with the electric field. To do so would require rather sophis- 
ticated techniques and would hardly add to our understanding at this point. In 
any case, we have checked those features that are of the greatest importance for 
our later applications, and we shall come back to study some of the other properties 
of electromagnetic waves next year. 
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Interference 


29-1 Electromagnetic waves 

In this chapter we shall discuss the subject of the preceding chapter more 
mathematically. We have qualitatively demonstrated that there are maxima and 
minima in the radiation field from two sources, and our problem now is to describe 
the field in mathematical detail, not just qualitatively. 

We have already physically analyzed the meaning of formula (28.6) quite 
satisfactorily, but there are a few points to be made about it mathematically. In 
the first place, if a charge is accelerating up and down along a line, in a motion of 
very small amplitude, the field at some angle 6 from the axis of the motion is in a 
direction at right angles to the line of sight and in the plane containing both the 
acceleration and the line of sight (Fig. 29-1). If the distance is called r , then at 
time / the electric field has the magnitude 


E(t) = 


qa(t — r/c) sin 9 
47 re Q c 2 r 


(29.1) 


where a(t ~ r/c) is the acceleration at the time (/ — r/c), called the retarded 
acceleration. 

Now it would be interesting to draw a picture of the field under different 
conditions. The thing that is interesting, of course, is the factor a(t — r/c), and to 
understand it we can take the simplest case, 6 = 90°, and plot the field graphically. 
What we had been thinking of before is that we stand in one position and ask how 
the field there changes with time. But instead of that, we are now going to see what 
the field looks like at different positions in space at a given instant. So what we 
want is a “snapshot” picture which tells us what the field is in different places. 
Of course it depends upon the acceleration of the charge. Suppose that the charge 
at first had some particular motion: it was initially standing still, and it suddenly 
accelerated in some manner, as shown in Fig. 29-2, and then stopped. Then, 
a little bit later, we measure the field at a different place. Then we may assert that 
the field will appear as shown in Fig. 29-3. At each point the field is determined 
by the acceleration of the charge at an earlier time, the amount earlier being the 
delay r/c . The field at farther and farther points is determined by the acceleration at 
earlier and earlier times. So the curve in Fig. 29-3 is really, in a sense, a “reversed” 
plot of the acceleration as a function of time; the distance is related to time by a 
constant scale factor c, which we often take as unity. This is easily seen by consider- 
ing the mathematical behavior of a(t — r/c). Evidently, if we add a little time 
At, we get the same value for a(t — r/c) as we would have if we had subtracted a 
little distance: A r = — c At. 

Stated another way: if we add a little time At, we can restore a(t — r/c) to its 
former value by adding a little distance Ar = c At. That is, as time goes on the 
field moves as a wave outward from the source. That is the reason why we sometimes 
say light is propagated as waves. It is equivalent to saying that the field is delayed, 
or to saying that the electric field is moving outward as time goes on. 

An interesting special case is that where the charge q is moving up and down 
in an oscillatory manner. The case which we studied experimentally in the last 
chapter was one in which the displacement x at any time t was equal to a certain 
constant x 0 , the magnitude of the oscillation, times cos c ot. Then the acceleration is 


a = — co 2 Xq cos oj/ = aa cos cat, 


(29.2) 
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Fig. 29-1. The electric field E due 
to a positive charge whose retarded 
acceleration is a'. 



Fig. 29-2. The acceleration of a 
certain charge as a function of time. 



Fig. 29-3. The electric field as a 
function of position at a later time. (The 
1 /r variation is ignored.) 
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where a 0 is the maximum acceleration, — o> 2 jt 0 . Putting this formula into (29.1), 
we find 


E = 


— q sin 0 


a 0 cos o)(t — r/c) 
47 re 0 rc 2 


(29.3) 


Now, ignoring the angle d and the constant factors, let us see what that looks like 
as a function of position or as a function of time. 



Fig. 29-4. The energy flowing within 
the cone 046CD is independent of the 
distance r at which it is measured. 


29-2 Energy of radiation 

First of all, at any particular moment or in any particular place, the strength 
of the field varies inversely as the distance r , as we mentioned previously. Now 
we must point out that the energy content of a wave, or the energy effects that such 
an electric field can have, are proportional to the square of the field, because if, 
for instance, we have some kind of a charge or an oscillator in the electric field, 
then if we let the field act on the oscillator, it makes it move. If this is a linear 
oscillator, the acceleration, velocity, and displacement produced by the electric 
field acting on the charge are all proportional to the field. So the kinetic energy 
which is developed in the charge is proportional to the square of the field. So we 
shall take it that the energy that a field can deliver to a system is proportional 
somehow to the square of the field. 

This means that the energy that the source can deliver decreases as we get 
farther away; in fact, it varies inversely as the square of the distance . But that has 
a very simple interpretation: if we wanted to pick up all the energy we could from 
the wave in a certain cone at a distance r x (Fig. 29-4), and we do the same at an- 
other distance r 2 , we find that the amount of energy per unit area at any one 
place goes inversely as the square of r, but the area of the surface intercepted by 
the cone goes directly as the square of r . So the energy that we can take out of the 
wave within a given conical angle is the same, no matter how far away we are! 
In particular, the total energy that we could take out of the whole wave by putting 
absorbing oscillators all around is a certain fixed amount. So the fact that the 
amplitude of E varies as \/r is the same as saying that there is an energy flux 
which is never lost, an energy which goes on and on, spreading over a greater and 
greater effective area. Thus we see that after a charge has oscillated, it has lost 
some energy which it can never recover; the energy keeps going farther and farther 
away without diminution. So if we are far enough away that our basic approxima- 
tion is good enough, the charge cannot recover the energy which has been, as we 
say, radiated away. Of course the energy still exists somewhere, and is available 
to be picked up by other systems. We shall study this energy “loss” further in 
Chapter 32. 

Let us now consider more carefully how the wave (29.3) varies as a function 
of time at a given place, and as a function of position at a given time. Again we 
ignore the l/r variation and the constants. 


29-3 Sinusoidal waves 

First let us fix the position r, and watch the field as a function of time. It is 
oscillatory at the angular frequency co. The angular frequency co can be defined 
as the rate of change of phase with time (radians per second). We have already 
studied such a thing, so it should be quite familiar to us by now. The period is 
the time needed for one oscillation, one complete cycle, and we have worked that 
out too; it is 27t/oj, because co times the period is one cycle of the cosine. 

Now we introduce a new quantity which is used a great deal in physics. This 
has to do with the opposite situation, in which we fix t and look at the wave as a 
function of distance r. Of course we notice that, as a function of r, the wave (29.3) 
is also oscillatory. That is, aside from l/r, which we are ignoring, we see that E 
oscillates as we change the position. So, in analogy with w, we can define a quantity 
called the wave number , symbolized as k. This is defined as the rate of change of 
phase with distance (radians per meter). That is, as we move in space at a fixed 
time, the phase changes. 
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There is another quantity that corresponds to the period, and we might call 
it the period in space, but it is usually called the wavelength, symbolized X. The 
wavelength is the distance occupied by one complete cycle. It is easy to see, then, 
that the wavelength is lir/k, because k times the wavelength would be the number 
of radians that the whole thing changes, being the product of the rate of change 
of the radians per meter, times the number of meters, and we must make a 2i r 
change for one cycle. So k\ — 2i r is exactly analogous to co / 0 = 2w. 

Now in our particular wave there is a definite relationship between the fre- 
quency and the wavelength, but the above definitions of k and co are actually quite 
general. That is, the wavelength and the frequency may not be related in the 
same way in other physical circumstances. However, in our circumstance the 
rate of change of phase with distance is easily determined, because if we call 
0 = co(7 — r/c) the phase, and differentiate (partially) with respect to distance 
r, the rate of change, d0/rV, is 


d<j> 

7)r 



(29.4) 


There are many ways to represent the same thing, such as 

X = c/ 0 (29.5) \v = c (29.7) 

co — ck (29.6) coX = 2wc f (29.8) 


Why is the wavelength equal to c times the period? That’s very easy, of course, 
because if we sit still and wait for one period to elapse, the waves, travelling at the 
speed c, will move a distance ct 0 , and will of course have moved over just one 
wavelength. 

In a physical situation other than that of light, k is not necessarily related to 
co in this simple way. If we call the distance along an axis x, then the formula for 
a cosine wave moving in a direction x with a wave number k and an angular fre- 
quency co will be written in general as cos (c ot — kx ). 

Now that we have introduced the idea of wavelength, we may say something 
more about the circumstances in which (29.1) is a legitimate formula. We recall 
that the field is made up of several pieces, one of which varies inversely as /*, another 
part which varies inversely as / 2 , and others which vary even faster. It would be 
worth while to know in what circumstances the l/r part of the field is the most 
important part, and the other parts are relatively small. Naturally, the answer is 
“if we go ‘far enough’ away,” because terms which vary inversely as the square 
ultimately become negligible compared with the 1 jr term. How far is “far enough” ? 
The answer is, qualitatively, that the other terms are of order X/r smaller than the 
1 fr term. Thus, so long as we are beyond a few wavelengths, (29.1) is an excellent 
approximation to the field. Sometimes the region beyond a few wavelengths is 
called the “wave zone.” 

29-4 Two dipole radiators 


o 



a *0 
(a) 



a=w 

( 6 ) 


Next let us discuss the mathematics involved in combining the effects of two 
oscillators to find the net field at a given point. This is very easy in the few cases 
that we considered in the previous chapter. We shall first describe the effects 
qualitatively, and then more quantitatively. Let us take the simple case, where the 
oscillators are situated with their centers in the same horizontal plane as the de- 
tector, and the line of vibration is vertical. 

Figure 29-5(a) represents the top view of two such oscillators, and in this 
particular example they are half a wavelength apart in a N-S direction, and are 
oscillating together in the same phase, which we call zero phase. Now we would 
like to know the intensity of the radiation in various directions. By the intensity 
we mean the amount of energy that the field carries past us per second, which is 
proportional to the square of the field, averaged in time. So the thing to look at, 
when we want to know how bright the light is, is the square of the electric field, 
not the electric field itself. (The electric field tells the strength of the force felt by a 


Fig. 29-5. The intensities in various 
directions from two dipole oscillators 
one-half wavelength apart. Left: in 
phase (a = 0). Right: one-half period 
out of phase [a = 7r). 
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stationary charge, but the amount of energy that is going past, in watts per square 
meter, is proportional to the square of the electric field. We shall derive the constant 
of proportionality in the next chapter.) If we look at the array from the W side, 
both oscillators contribute equally and in phase, so the electric field is twice as 
strong as it would be from a single oscillator. Therefore the intensity is four times 
as strong as it would be if there were only one oscillator. (The numbers in Fig. 
29-5 represent how strong the intensity would be in this case, compared with what 
it would be if there were only a single oscillator of unit strength.) Now, in either 
the N or S direction along the line of the oscillators, since they are half a wavelength 
apart, the effect of one oscillator turns out to be out of phase by exactly half an 
oscillation from the other, and therefore the fields add to zero. At a certain par- 
ticular intermediate angle (in fact, at 30°) the intensity is 2, and it falls off, 4, 2, 0, 
and so forth. We have to learn how to find these numbers at other angles. It is a 
question of adding two oscillations with different phases. 

Let us quickly look at some other cases of interest. Suppose the oscillators are 
again one-half a wavelength apart, but the phase a of one is set half a period behind 
the other in its oscillation (Fig. 29— 5b). In the W direction the intensity is now 
zero, because one oscillator is “pushing” when the other one is “pulling.” But in 
the N direction the signal from the near one comes at a certain time, and that of 
the other comes half a period later. But the latter was originally half a period 
behind in timing, and therefore it is now exactly in time with the first one, and so 
the intensity in this direction is 4 units. The intensity in the direction at 30° is 
still 2, as we can prove later. 

Now we come to an interesting case which shows up a possibly useful feature. 
Let us remark that one of the reasons that phase relations of oscillators are in- 
teresting is for beaming radio transmitters. For instance, if we build an antenna 
system and want to send a radio signal, say, to Hawaii, we set the antennas up as 
in Fig. 29-5 (a) and we broadcast with our two antennas in phase, because Hawaii 
is to the west of us. Then we decide that tomorrow we are going to broadcast 
toward Alberta, Canada. Since that is north, not west, all we have to do is to 
reverse the phase of one of our antennas, and we can broadcast to the north. 
So we can build antenna systems with various arrangements. Ours is one of the 
simplest possible ones; we can make them much more complicated, and by chang- 
ing the phases in the various antennas we can send the beams in various directions 
and send most of the power in the direction in which we wish to transmit, without 
ever moving the antenna! In both of the preceding cases, however, while we are 

2 x/ 4 — 2 broadcasting toward Alberta we are wasting a lot of power on Easter Island, and 

it would be interesting to ask whether it is possible to send it in only one direction, 
At first sight we might think that with a pair of antennas of this nature the result 
a . t /2 is always going to be symmetrical. So let us consider a case that comes out un- 

symmetrical, to show the possible variety. 

Fig. 29-6. A pair of dipole antennas the antennas af e separated by one-quarter wavelength, and if the N one 

giving maximum power in one direction. * s one-fourth period behind the S one in time, then what happens (Fig. 29-6)? 

In the W direction we get 2, as we will see later. In the S direction we get zero , 
because the signal from S comes at a certain time; that from N comes 90° later in 
time , but it is already 90° behind in its built-in phase, therefore it arrives, altogether, 
180° out of phase, and there is no effect. On the other hand, in the N direction, 
the N signal arrives earlier than the S signal by 90° in time, because it is a quarter 
wavelength closer. But its phase is set sathat it is oscillating 90° behind in time, 
which just compensates the delay difference, and therefore the two signals appear 
together in phase, making the field strength twice as large, and the energy four 
times as great. 

Thus, by using some cleverness in spacing and phasing our antennas, we can 
send the power all in one direction. But still it is distributed over a great range of 
angles. Can we arrange it so that it is focused still more sharply in a particular 
direction? Let us consider the case of Hawaii again, where we are sending the 
beam east and west but it is spread over quite an angle, because even at 30° we are 
still getting half the intensity — we are wasting the power. Can we do better than 
that? Let us take a situation in which the separation is ten wavelengths (Fig. 
29-4 



29-7), which is more nearly comparable to the situation in which we experimented 
in the previous chapter, with separations of several wavelengths rather than a 
small fraction of a wavelength. Here the picture is quite different. 

If the oscillators are ten wavelengths apart (we take the in-phase case to make 
it easy), we see that in the E-W direction, they are in phase, and we get a strong 
intensity, four times what we would get if one of them were there alone. On the 
other hand, at a very small angle away, the arrival times differ by 180° and the 
intensity is zero. To be precise, if we draw a line from each oscillator to a distant 
point and the difference A in the two distances is X/2, half an oscillation, then they 
wilt be out of phase. So this first null occurs when that happens. (The figure is 
not drawn to scale; it is only a rough sketch.) This means that we do indeed have 
a very sharp beam in the direction we want, because if we just move over a little 
bit we lose all our intensity. Unfortunately for practical purposes, if we were 
thinking of making a radio broadcasting array and we doubled the distance A, then 
we would be a whole cycle out of phase, which is the same as being exactly in 
phase again! Thus we get many successive maxima and minima, just as we found 
with the 2^X spacing in Chapter 28. 

Now how can we arrange to get rid of all these extra maxima, or “lobes,” as 
they are called? We could get rid of the unwanted lobes in a rather interesting way. 
Suppose that we were to place another set of antennas between the two that we 
already have. That is, the outside ones are still 10X apart, but between them, say 
every 2X, we have put another antenna, and we drive them all in phase. There are 
now six antennas, and if we looked at the intensity in the E-W direction, it would, 
of course, be much higher with six antennas than with one. The field would be 
six times and the intensity thirty-six times as great (the square of the field). We 
get 36 units of intensity in that direction. Now if we look at neighboring points, 
we find a zero as before, roughly, but if we go farther, to where we used to get a 
big “bump,” we get a much smaller “bump” now. Let us try to see why. 

The reason is that although we might expect to get a big bump when the 
distance A is exactly equal to the wavelength, it is true that dipoles 1 and 6 are then 
in phase and are cooperating in trying to get some strength in that direction. But 
numbers 3 and 4 are roughly £ a wavelength out of phase with 1 and 6, and although 
1 and 6 push together, 3 and 4 push together too, but in opposite phase. Therefore 
there is very little intensity in this direction — but there is something; it does not 
balance exactly. This kind of thing keeps on happening; we get very little bumps, 
and we have the strong beam in the direction where we want it. But in this particu- 
lar example, something else will happen: namely, since the distance between suc- 
cessive dipoles is 2X, it is possible to find an angle where the distance 5 between 
successive dipoles is exactly one wavelength, so that the effects from all of them are 
in phase again. Each one is delayed relative to the next one by 360°, so they all 
come back in phase, and we have another strong beam in that direction! It is 
easy to avoid this in practice because it is possible to put the dipoles closer than one 
wavelength apart. If we put in more antennas, closer than one wavelength apart, 
then this cannot happen. But the fact that this can happen at certain angles, if the 
spacing is bigger than one wavelength, is a very interesting and useful phenomenon 
in other applications — not radio broadcasting, but in diffraction gratings. 



Fig. 29-7. The intensity pattern for 
two dipoles separated by 10X. 



Fig. 29-8. A six-dipole antenna ar- 
ray and part of its intensity pattern. 


29-5 The mathematics of interference 

Now we have finished our analysis of the phenomena of dipole radiators 
qualitatively, and we must learn how to analyze them quantitatively. To find the 
effect of two sources at some particular angle in the most general case, where the 
two oscillators have some intrinsic relative phase a from one another and the 
strengths A i and A 2 are not equal, we find that we have to add two cosines having 
the same frequency, but with different phases. It is very easy to find this phase 
difference; it is made up of a delay due to the difference in distance, and the 
intrinsic, built-in phase of the oscillation. Mathematically, we have to find the 
sum R of two waves: R = A\ cos (cot + <j>i) + A 2 cos (cot + How do we 
do it? 
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Fig. 29-9. A geometrical method for 
combining two cosine waves. The entire 
diagram is thought of as rotating counter- 
clockwise with angular frequency co. 


It is really very easy, and we presume that we already know how to do it. 
However, we shall outline the procedure in some detail. First, we can, if we are 
clever with mathematics and know enough about cosines and sines, simply work 
it out. The easiest such case is the one where A x and A 2 are equal, let us say they 
are both equal to A. In those circumstances, for example (we could call this the 
trigonometric method of solving the problem), we have 

R = ^4 [cos ( cot T - <t>i) -f- cos (c ot -j- <£ 2 )]* (29.9) 

Once, in our trigonometry class, we may have learned the rule that 

cos A + cos B = 2cos^(^ + i?)cos£04 “ ^)* (29.10) 

If we know that, then we can immediately write R as 

R — 2 A cos J(0i — 0 2 )cos (cot ^<f> 1 + J<£ 2 )* (29.11) 

So we find that we have an oscillatory wave with a new phase and a new amplitude. 
In general, the result will be an oscillatory wave with a new amplitude A R , which 
we may call the resultant amplitude, oscillating at the same frequency but with a 
phase difference <j> Ri called the resultant phase. In view of this, our particular case 
has the following result: that the resultant amplitude is 

A R = 2A.co$%(<j> x — <j> 2 \ (29.12) 

and the resultant phase is the average of the two phases, and we have completely 
solved our problem. 

Now suppose that we cannot remember that the sum of two cosines is twice 
the cosine of half the sum times the cosine of half the difference. Then we may use 
another method of analysis which is more geometrical. Any cosine function of 
a )t can be considered as the horizontal projection of a rotating vector. Suppose 
there were a vector Ai of length A x rotating with time, so that its angle with the 
horizontal axis is cot + <j> x . (We shall leave out the cot in a minute, and see that it 
makes no difference.) Suppose that we take a snapshot at the time t = 0, although, 
in fact, the picture is rotating with angular velocity co (Fig. 29-9). The projection 
of A 1 along the horizontal axis is precisely A x cos (cot + 0i). Now at t = 0 the 
second wave could be represented by another vector, A 2 , of length A 2 and at an 
angle 0 2 , and also rotating. They are both rotating with the same angular velocity 
co, and therefore the relative positions of the two are fixed. The system goes around 
like a rigid body. The horizontal projection of A 2 is A 2 cos (cot + <t> 2 ). But we 
know from the theory of vectors that if we add the two vectors in the ordinary 
way, by the parallelogram rule, and draw the resultant vector A R , the x-component 
of the resultant is the sum of the x-components of the other two vectors. That 
solves our problem. It is easy to check that this gives the correct result for the 
special case we treated above, where A x = A 2 = A. In this case, we see from 
Fig. 29-9 that A^ lies midway between A! and A 2 and makes an angle %(<t> 2 — <j> x ) 
with each. Therefore we see that A R = 2 A cos \(<f> 2 — </> 1 ), as before. Also, as 
we see from the triangle, the phase of A R , as it goes around, is the average angle 
of A 1 and A 2 when the two amplitudes are equal. Clearly, we can also solve for 
the case where the amplitudes are not equal, just as easily. We can call that the 
geometrical way of solving the problem. 

There is still another way of solving the problem, and that is the analytical 
way. That is, instead of having actually to draw a picture like Fig. 29-9, we can 
write something down which says the same thing as the picture: instead of drawing 
the vectors, we write a complex number to represent each of the vectors. The real 
parts of the complex numbers are the actual physical quantities. So in our par- 
ticular case the waves could be written in this way: A x e %{<ai+ * l) [the real part of 
this is A 1 cos (cot + <t> x )] and A 2 e l(wt+4>2) . Now we can add the two: 

R = A ie i(ut+ +J + A 2 e i{(at ^ = (A x e^ + A 2 e^)e iut (29.13) 
or 

R = A x e^ + A 2 e^ = A R e^. (29.14) 
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This solves the problem that we wanted to solve, because it represents the result as 
a complex number of magnitude Ar and phase <j>R. 

To see how this method works, let us find the amplitude Ar which is the 
“length” of R. To get the “length” of a complex quantity, we always multiply 
the quantity by its complex conjugate, which gives the length squared. The 
complex conjugate is the same expression, but with the sign of the fs reversed. 
Thus we have 

A% = + A^’XAie-** 1 + A 2 e~^\ (29.15) 


In multiplying this out, we get A% + A\ (here the e’s cancel), and for the cross 
terms we have 

A l A 2 (e i{ +i- <i> * ) + 

Now 

e l9 + e~ ld = cos 8 + i sin 8 + cos 8 — / sin 6. 


That is to say, e %9 + e 19 — 2 cos 6. Our final result is therefore 

Ar = A\ + A 2 + 2A\A 2 cos (02 — 0 i)* (29.16) 

As we see, this agrees with the length of Ar in Fig. 29-9, using the rules of 
trigonometry. 

Thus the sum of the two effects has the intensity A\ we would get with one of 
them alone, plus the intensity A% we would get with the other one alone, plus a 
correction. This correction we call the interference effect. It is really only the differ- 
ence between what we get simply by adding the intensities, and what actually 
happens. We call it interference whether it is positive or negative. (Interference in 
ordinary language usually suggests opposition or hindrance, but in physics we 
often do not use language the way it was originally designed!) If the interference 
term is positive, we call that case constructive interference, horrible though it may 
sound to anybody other than a physicist! The opposite case is called destructive 
interference. 

Now let us see how to apply our general formula (29.16) for the case of two 
oscillators to the special situations which we have discussed qualitatively. To 
apply this general formula, it is only necessary to find what phase difference, 
<£i — 0 2 , exists between the signals arriving at a given point. (It depends only 
on the phase difference, of course, and not on the phase itself.) So let us consider 
the case where the two oscillators, of equal amplitude, are separated by some dis- 
tance d and have an intrinsic relative phase a. (When one is at phase zero, the 
phase of the other is a.) Then we ask what the intensity will be in some azimuth 
direction 6 from the E-W line. [Note that this is not the same 8 as appears in 
(29.1). We are torn between using an unconventional symbol like or the con- 
ventional symbol 8 (Fig. 29-10).] The phase relationship is found by noting that 
the difference in distance from P to the two oscillators is d sin 8 , so that the phase 
difference contribution from this is the number of wavelengths in d sin 8 , multiplied 
by 27 t. (Those who are more sophisticated might want to multiply the wave number 
k, which is the rate of change of phase with distance, by d sin 8; it is exactly the 
same.) The phase difference due to the distance difference is thus 2ird sin 0/X, but, 
due to the timing of the oscillators, there is an additional phase a. So the phase 
difference at arrival would be 



Fig. 29-10. Two oscillators of equal 
amplitude, with a phase difference a 
between them. 


02 — 0i — oc + 2irdsin8/\. (29.17) 

This takes care of all the cases. Thus all we have to do is substitute this expression 
into (29.16) for the case A i = A 2 , and we can calculate all the various results for 
two antennas of equal intensity. 

Now let us see what happens in our various cases. The reason we know, for 
example, that the intensity is 2 at 30° in Fig. 29-5 is the following: the two oscilla- 
tors are apart, so at 30°, d sin 8 — X/4. Thus0 2 — 0i = 2xX/4X = 7r/2, and 
so the interference term is zero. (We are adding two vectors at 90°.) The result is 
the hypotenuse of a 45° right-angle triangle, which is y/2 times the unit amplitude; 
squaring it, we get twice the intensity of one oscillator alone. All the other cases 
can be worked out in this same way. 
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30 


Diffraction 


30-1 The resultant amplitude due to n equal oscillators 

This chapter is a direct continuation of the previous one, although the name 
has been changed from Interference to Diffraction. No one has ever been able to 
define the difference between interference and diffraction satisfactorily. It is just a 
question of usage, and there is no specific, important physical difference between 
them. The best we can do, roughly speaking, is to say that when there are only a few 
sources, say two, interfering, then the result is usually called interference, but if 
there is a large number of them, it seems that the word diffraction is more often 
used. So, we shall not worry about whether it is interference or diffraction, but 
continue directly from where we left off in the middle of the subject in the last 
chapter. 

Thus we shall now discuss the situation where there are n equally spaced os- 
cillators, all of equal amplitude but different from one another in phase, either 
because they are driven differently in phase, or because we are looking at them 
at an angle such that there is a difference in time delay. For one reason or another, 
we have to add something like this: 


R — ^4 [cos c ot cos (o)t 0) -|- cos(co/ -f- 20) T* * * * ~b cos(cotf -j- (n — 1)$)L 

(30.1) 


where 0 is the phase difference between one oscillator and the next one, as seen in 
a particular direction. Specifically, <j> = a + 2.ird sin 0/X. Now we must add all 
the terms together. We shall do this geometrically. The first one is of length A , 
and it has zero phase. The next is also of length A and it has a phase equal to 0 . 
The next one is again of length A and it has a phase equal to 2<f> , and so on. So we 
are evidently going around an equiangular polygon with n sides (Fig. 30-1). 

Now the vertices, of course, all lie on a circle, and we can find the net amplitude 
most easily if we find the radius of that circle. Suppose that Q is the center of the 
circle. Then we know that the angle OQS is just a phase angle 0. (This is because 
the radius QS bears the same geometrical relation to A 2 as QO bears to A x , so 
they form an angle 0 between them.) Therefore the radius r must be such that 
A = 2/* sin tf>/2, which fixes r . But the large angle OQT is equal to n<t > , and we 
thus find that Ar = 2 r sin n<j>/ 2. Combining these two results to eliminate r, we 
get 


Ar 


^ sin n<(>/2 
sin 0/2 


(30.2) 


The resultant intensity is thus 


sin 2 n<t>/2 
1 ~ sin 2 0/2 


(30.3) 


Now let us analyze this expression and study some of its consequences. In 
the first place, we can check it for n = 1. It checks: / = / 0 . Next, we check it 
for n — 2: writing sin 0 = 2 sin 0/2 cos 0/2, we find that Ar = 2A cos 0/2, 
which agrees with (29.12). 

Now the idea that led us to consider the addition of several sources was that 
we might get a much stronger intensity in one direction than in another; that the 
nearby maxima which would have been present if there were only two sources 
will have gone down in strength. In order to see this effect, we plot the curve that 
comes from (30.3), taking n to be enormously large and plotting the region near 

30-1 


30-1 The resultant amplitude due to 
n equal oscillators 

30-2 The diffraction grating 

30-3 Resolving power of a grating 

30-4 The parabolic antenna 

30-5 Colored films; crystals 

30-6 Diffraction by opaque screens 

30-7 The field of a plane of 
oscillating charges 


T 



Fig. 30-1. The resultant amplitude 
of n = 6 equally spaced sources with net 
successive phase differences 0. 
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Fig. 30-2. The intensity as a function 
of phase angle for a large number of 
oscillators of equal strength. 



Fig. 30-3. A linear array of n equal 
oscillators, driven with phases a s = sa. 


<t> — 0. In the first place, if is exactly 0, we have 0/0, but if <i> is infinitesimal, the 
ratio of the two sines squared is simply n 2 , since the sine and the angle are approxi- 
mately equal. Thus the intensity of the maximum of the curve is equal to n 2 times 
the intensity of one oscillator. That is easy to see, because if they are all in phase, 
then the little vectors have no relative angle and all n of them add up so the ampli- 
tude is n times, and the intensity n 2 times, stronger. 

As the phase <j> increases, the ratio of the two sines begins to fall off, and the 
first time it reaches zero is when n<t>/2 = r, because sin t = 0. In other words, 
0 = 2ir/n corresponds to the first minimum in the curve (Fig. 30-2). In terms 
of what is happening with the arrows in Fig. 30-1, the first minimum occurs when 
all the arrows come back to the starting point; that means that the total accumu- 
lated angle in all the arrows, the total phase difference between the first and last 
oscillator, must be 27r to complete the circle. 

Now we go to the next maximum, and we want to see that it is really much 
smaller than the first one, as we had hoped. We shall not go precisely to the maxi- 
mum position, because both the numerator and the denominator of (30.3) are 
variant, but sin <j>/2 varies quite slowly compared with sin n<t>/2 when n is large, 
so when sin n<j>/2 = 1 we are very close to the maximum. The next maximum of 
sin 2 n<f>/2 comes at n<f>/ 2 — 37r/2, or <j> = 3i r/n. This corresponds to the arrows 
having traversed the circle one and a half times. On putting <t> = 3ir/n into the 
formula to find the size of the maximum, we find that sin 2 3t/2 = 1 in the nu- 
merator (because that is why we picked this angle), and in the denominator we 
have sin 2 3w/2n. Now if n is sufficiently large, then this angle is very small and 
the sine is equal to the angle; so for all practical purposes, we can put sin 3ir/2n = 
3ir/2n. Thus we find that the intensity at this maximum is I = / 0 (4n 2 /9x 2 ). But 
n 2 I Q was the maximum intensity, and so we have 4/9ir 2 times the maximum in- 
tensity, which is about 0.047, less than 5 percent, of the maximum intensity! Of 
course there are decreasing intensities farther out. So we have a very sharp central 
maximum with very weak subsidiary maxima on the sides. 

It is possible to prove that the area of the whole curve, including all the little 
bumps, is equal to 2i rnl 0 , or twice the area of the dotted rectangle in Fig. 30-2. 

Now let us consider further how we may apply Eq. (30.3) in different cir- 
cumstances, and try to understand what is happening. Let us consider our sources 
to be all on a line, as drawn in Fig. 30-3. There are n of them, all spaced by a 
distance d , and we shall suppose that the intrinsic relative phase, one to the next, 
is a. Then if we are observing in a given direction 0 from the normal, there is an 
additional phase 2ird sin 0/A because of the time delay between each successive 
two, which we talked about before. Thus 

<t> = a + 2wd sin 0/A 
= a + kd sin 0. 

First, we shall take the case a = 0. That is, all oscillators are in phase, and 
we want to know what the intensity is as a function of the angle 6. In order to 
find out, we merely have to put <f> = kd sin 8 into formula (30.3) and see what 
happens. In the first place, there is a maximum when <f> = 0. That means that 
when all the oscillators are in phase there is a strong intensity in the direction 
0 = 0. On the other hand, an interesting question is, where is the first minimum? 
That occurs when <f> = 2i r/n. In other words, when 27r</sin0/A = 2w/n, we 
get the first minumum of the curve. If we get rid of the 27r’s so we can look at it a 
little better, it says that 

nd sin 0 = A. (30.5) 

Now let us understand physically why we get a minimum at that position, nd is 
the total length L of the array. Referring to Fig. 30-3, we see that nd sin 0 = 
L sin 0 = A. What (30.5) says is that when A is equal to one wavelength , we get a 
minimum. Now why do we get a minimum when A = A? Because the contribu- 
tions of the various oscillators are then uniformly distributed in phase from 0° to 
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360°. The arrows (Fig. 30-1) are going around a whole circle — we are adding equal 
vectors in all directions, and such a sum is zero. So when we have an angle such 
that A = X, we get a minimum. That is the first minimum. 

There is another important feature about formula (30.3), which is that if the 
angle <t> is increased by any multiple of hr, it makes no difference to the formula. 
So we will get other strong maxima at <t> = 27 t , 47 r, 67 r, and so forth. Near each of 
these great maxima the pattern of Fig. 30-2 is repeated. We may ask ourselves, 
what is the geometrical circumstance that leads to these other great maxima? 
The condition is that <f> = linn, where m is any integer. That is, 2 wd sin 0/X = 
2irm. Dividing by 2 x, we see that 

d sin 8 = mX. (30.6) 

This looks like the other formula, (30.5). No, that formula was nd sin 8 = X. 
The difference is that here we have to look at the individual sources, and when we 
say d sin 8 — mX, that means that we have an angle 8 such that 5 = mX. In other 
words, each Source is now contributing a certain amount, and successive ones are 
out of phase by a whole multiple of 360°, and therefore are contributing in phase , 
because out of phase by 360° is the same as being in phase. So they all contribute 
in phase and produce just as good a maximum as the one for m = 0 that we dis- 
cussed before. The subsidiary bumps, the whole shape of the pattern, is just like 
the one near <£ = 0, with exactly the same minima on each side, etc. Thus such an 
array will send beams in various directions — each beam having a strong central 
maximum and a certain number of weak “side lobes.” The various strong beams 
are referred to as the zero-order beam, the first-order beam, etc., according to the 
value of m. m is called the order of the beam. 

We call attention to the fact that if d is less than X, Eq. (30.6) can have no 
solution except m = 0, so that if the spacing is too small there is only one possible 
beam, the zero-order one centered at 8 = 0. (Of course, there is also a beam in 
the opposite direction.) In order to get subsidiary great maxima, we must have 
the spacing d of the array greater than one wavelength. 


30-2 The diffraction grating 

In technical work with antennas and wires it is possible to arrange that all 
the phases of the little oscillators, or antennas, are equal. The question is whether 
and how we can do a similar thing with light. We cannot at the present time literally 
make little optical-frequency radio stations and hook them up with infinitesimal 
wires and drive them all with a given phase. But there is a very easy way to do what 
amounts to the same thing. 

Suppose that we had a lot of parallel wires, equally spaced at a spacing d , 
and a radiofrequency source very far away, practically at infinity, which is generat- 
ing an electric field which arrives at each one of the wires at the same phase (it is 
so far away that the time delay is the same for all of the wires). (One can work out 
cases with curved arrays, but let us take a plane one.) Then the external electric 
field will drive the electrons up and down in each wire. That is, the field which is 
coming from the original source will shake the electrons up and down, and in 
moving, these represent new generators . This phenomenon is called scattering: 
a light wave from some source can induce a motion of the electrons in a piece of 
material, and these motions generate their own waves. Therefore all that is 
necessary is to set up a lot of wires, equally spaced, drive them with a radiofrequency 
source far away, and we have the situation that we want, without a whole lot of 
special wiring. If the incidence is normal, the phases will be equal, and we will 
get exactly the circumstance we have been discussing. Therefore, if the wire 
spacing is greater than the wavelength, we will get a strong intensity of scattering 
in the normal direction, and in certain other directions given by (30.6). 

This can also be done with light ! Instead of wires, we use a flat piece of glass 
and make notches in it such that each of the notches scatters a little differently 
than the rest of the glass. If we then shine light on the glass, each one of the notches 
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Fig. 30-4. The path difference for 
rays scattered from adjacent rulings of a 
grating is d sin 0 OU , — d sin 0 in . 


will represent a source, and if we space the lines very finely, but not closer than a 
wavelength (which is technically almost impossible anyway), then we would expect 
a miraculous phenomenon: the light not only will pass straight through, but there 
will also be a strong beam at a finite angle, depending on the spacing of the notches! 
Such objects have actually been made' and are in common use — they are called 
diffraction gratings . 

In one of its forms, a diffraction grating consists of nothing but a plane glass 
sheet, transparent and colorless, with scratches on it. There are often several 
hundred scratches to the millimeter, very carefully arranged so as to be equally 
spaced. The effect of such a grating can be seen by arranging a projector so as to 
throw a narrow, vertical line of light (the image of a slit) onto a screen. When we 
put the grating into the beam, with its scratches vertical, we see that the line is still 
there but, in addition, on each side we have another strong patch of light which is 
colored . This, of course, is the slit image spread out over a wide angular range, 
because the angle 0 in (30.6) depends upon X, and lights of different colors, as we 
know, correspond to different frequencies, and therefore different wavelengths. 
The longest visible wavelength is red, and since d sin 0 = X, that requires a larger 
0. And we do, in fact, find that red is at a greater angle out from the central image! 
There should also be a beam on the other side, and indeed we see one on the screen. 
Then, there might be another solution of (30.6) when m — 2. We do see that there 
is something vaguely there — very weak — and there are even other beams beyond. 

We have just argued that all these beams ought to be of the same strength, 
but we see that they actually are not and, in fact, not even the first ones on the right 
and left are equal! The reason is that the grating has been carefully built to do just 
this. How? If the grating consists of very fine notches, infinitesimally wide, spaced 
evenly, then all the intensities would indeed be equal. But, as a matter of fact, 
although we have taken the simplest case, we could also have considered an array 
of pairs of antennas, in which each member of the pair has a certain strength and 
some relative phase. In this case, it is possible to get intensities which are different 
in the different orders. A grating is often made with little “sawtooth” cuts instead 
of little symmetrical notches. By carefully arranging the “sawteeth,” more light 
may be sent into one particular order of spectrum than into the others. In a 
practical grating, we would like to have as much light as possible in one of the 
orders. This may seem a complicated point to bring in, but it is a very clever thing 
to do, because it makes the grating more useful. 

So far, we have taken the case where all the phases of the sources are equal. 
But we also have a formula for <j> when the phases differ from one to the next 
by an angle a. That requires wiring up our antennas with a slight phase shift 
between each one. Can we do that with light? Yes, we can do it very easily, for 
suppose that there were a source of light at infinity, at an angle such that the light 
is coming in at an angle 0j n , and let us say that we wish to discuss the scattered 
beam, which is leaving at an angle 0 out . The 0 ou t is the same 0 as we have had before, 
but the 0i n is merely a means for arranging that the phase of each source is 
different: the light coming from the distant driving source first hits one scratch, 
then the next, then the next, and so on, with a phase shift from one to the other, 
which, as we see, is a = ~d sin 0i n /X. Therefore we have the formula for a grating 
in which light both comes in and goes out at an angle: 

<j> — 2 wd sin 0 ou tA “ 27rc/ sin 0i n /X. (30.7) 

Let us try to find out where we get strong intensity in these circumstances. The 
condition for strong intensities is, of course, that <t> should be a multiple of 2x. 
There are several interesting points to be noted. 

One case of rather great interest is that which corresponds to m = 0, where 
d is less than X; in fact, this is the only solution. In this case we see that sin 0 out = 
sin 0i n , which means that the light comes out in the same direction as the light 
which was exciting the grating. We might think that the light “goes right through.” 
No, it is different light that we are talking about. The light that goes right through 
is from the original source; what we are talking about is the new light which is 
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generated by scattering . It turns out that the scattered light is going in the same 
direction as the original light, in fact it can interfere with it — a feature which we 
will study later. 

There is another solution for this same case. For a given 0 in , 0 O ut may be the 
supplement of 0j n . So not only do we get a beam in the same direction as the in- 
coming beam but also one in another direction, which, if we consider it carefully, 
is such that the angle of incidence is equal to the angle of scattering . This we call the 
reflected beam . 

So we begin to understand the basic machinery of reflection: the light that 
comes in generates motions of the atoms in the reflector, and the reflector then 
regenerates a new wave , and one of the solutions for the direction of scattering, the 
only solution if the spacing of the scatterers is small compared with one wavelength, 
is that the angle at which the light comes out is equal to the angle at which it comes 
in! 

Next, we discuss the special case when d — > 0. That is, we have just a solid 
piece of material, so to speak, but of finite length. In addition, we want the phase 
shift from one scatterer to the next to go to zero. In other words, we put more and 
more antennas between the other ones, so that each of the phase differences is 
getting smaller, but the number of antennas is increasing in such a way that the 
total phase difference, between one end of the line and the other, is constant. 
Let us see what happens to (30.3) if we keep the difference in phase n<j> from one 
end to the other constant (say n<f> = <£), letting the number go to infinity and the 
phase shift <t> of each one go to zero. But now <f> is so small that sin <j> = <£, and if 
we also recognize n 2 / 0 as / m , the maximum intensity at the center of the beam, 
we find 

/ = 4/ m sin 2 £$/<!> 2 . (30.8) 

This limiting case is what is shown in Fig. 30-2. 

In such circumstances we find the same general kind of a picture as for finite 
spacing with d > X; all the side lobes are practically the same as before, but there 
are no higher-order maxima. If the scatterers are all in phase, we get a maximum 
in the direction 0 ou t = 0, and a minimum when the distance A is equal to X, just 
as for finite d and n. So we can even analyze a continuous distribution of 
scatterers or oscillators, by using integrals instead of summing. 

As an example, suppose there were a long line of oscillators, with the charge 
oscillating along the direction of the line (Fig. 30-5). From such an array the 
greatest intensity is perpendicular to the line. There is a little bit of intensity up 
and down from the equatorial plane, but it is very slight. With this result, we can 
handle a more complicated situation. Suppose we have a set of such lines, each 
producing a beam only in a plane perpendicular to the line. To find the intensity 
in various directions from a series of long wires, instead of infinitesimal wires, is 
the same problem as it was for infinitesimal wires, so long as we are in the central 
plane perpendicular to the wires; we just add the contribution from each of the 
long wires. That is why, although we actually analyzed only tiny antennas, we 
might as well have used a grating with long, narrow slots. Each of the long slots 
produces an effect only in its own direction, not up and down, but they are all 
set next to each other horizontally, so they produce interference that way. 

Thus we can build up more complicated situations by having various distribu- 
tions of scatterers in lines, planes, or in space. The first thing we did was to con- 
sider scatterers in a line, and we have just extended the analysis to strips; we can 
work it out by just doing the necessary summations, adding the contributions from 
the individual scatterers. The principle is always the same. 

30-3 Resolving power of a grating 

We are now in a position to understand a number of interesting phenomena. 
For example, consider the use of a grating for separating wavelengths. We noticed 
that the whole spectrum was spread out on the screen, so a grating can be used as 
an instrument for separating light into its different wavelengths. One of the 
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Fig. 30-5. The intensity pattern of a 
continuous line of oscillators has a single 
strong maximum and many weak “side 
lobes.” 




Fig. 30-6. Illustration of the Rayleigh 
criterion. The maximum of one pattern 
falls on the first minimum of the other. 


interesting questions is: supposing that there were two sources of slightly different 
frequency, or slightly different wavelength, how close together in wavelength could 
they be such that the grating would be unable to tell that there were really two 
different wavelengths there? The red and the blue were clearly separated. But 
when one wave is red and the other is slightly redder, very close, how close can they 
be? This is called the resolving power of the grating, and one way of analyzing 
the problem is as follows. Suppose that for light of a certain color we happen to 
have the maximum of the diffracted beam occurring at a certain angle. If we vary 
the wavelength the phase lird sin 0/A is different, so of course the maximum 
occurs at a different angle. That is why the red and blue are spread out. How 
different in angle must it be in order for us to be able to see it? If the two maxima 
are exactly on top of each other, of course we cannot see them. If the maximum 
of one is far enough away from the other, then we can see that there is a double 
bump in the distribution of light. In order to be able to just make out the double 
bump, the following simple criterion, called Rayleigh's criterion , is usually used 
(Fig. 30-6). It is that the first minimum from one bump should sit at the maximum 
of the other. Now it is very easy to calculate, when one minimum sits on the other 
maximum, how much the difference in wavelength is. The best way to do it is 
geometrically. 

In order to have a maximum for wavelength A', the distance A (Fig. 30-3) 
must be rik', and if we are looking at the rath-order beam, it is mn\ f . In other words, 
27 rd sin 0/A' = 27rra, so nd sin 0, which is A, is A' times n, or mn\ f . For the other 
beam, of wavelength A, we want to have a minimum at this angle. That is, we want 
A to be exactly one wavelength A more than ranA. That is, A = mn\ + A = mn\'. 
Thus if A' = A + AA, we find 

AA/A = l/mn. (30.9) 

The ratio A/AA is called the resolving power of a grating; we see that it is equal to 
the total number of lines in the grating, times the order. It is not hard to prove 
that this formula is equivalent to the formula that the error in frequency is equal 
to the reciprocal time difference between extreme paths that are allowed to interfere :* 

Av = 1 /T. 

In fact, that is the best way to remember it, because the general formula works 
not only for gratings, but for any other instrument whatsoever, while the special 
formula (30.9) depends on the fact that we are using a grating. 

30-4 The parabolic antenna 

Now let us consider another problem in resolving power. This has to do with 
the antenna of a radio telescope, used for determining the position of radio sources 
in the sky, i.e., how large they are in angle. Of course if we use any old antenna 
and find signals, we would not know from what direction they came. We are 
very interested to know whether the source is in one place or another. One way 
we can find out is to lay out a whole series of equally spaced dipole wires on the 
Australian landscape. Then we take all the wires from these antennas and feed 
them into the same receiver, in such a way that all the delays in the feed lines are 
equal. Thus the receiver receives signals from all of the dipoles in phase. That is, 
it adds all the waves from every one of the dipoles in the same phase. Now what 
happens? If the source is directly above the array, at infinity or nearly so, then 
its radiowaves will excite all the antennas in the same phase, so they all feed the 
receiver together. 

Now suppose that the radio source is at a slight angle 0 from the vertical. 
Then the various antennas are receiving signals a little out of phase. The receiver 
adds all these out-of-phase signals together, and so we get nothing, if the angle 

* In our case T = A/c = mrik/c> where c is the speed of light. The frequency v = c/A, 
soAv = c AA/A 2 . 
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6 is too big. How big may the angle be? Answer: we get zero if the angle A/L = d 
(Fig. 30-3) corresponds to a 360° phase shift, that is, if A is the wavelength X 
This is because the vector contributions form together a complete polygon with 
zero resultant. The smallest angle that can be resolved by an antenna array of 
length Lise = X/L. Notice that the receiving pattern of an antenna such as this 
is exactly the same as the intensity distribution we would get if we turned the 
receiver around and made it into a transmitter. This is an example of what is 
called a reciprocity principle. It turns out, in fact, to be generally true for any 
arrangement of antennas, angles, and so on, that if we first work out what the 
relative intensities would be in various directions if the receiver were a transmitter 
instead, then the relative directional sensitivity of a receiver with the same external 
wiring, the same array of antennas, is the same as the relative intensity of emission 
would be if it were a transmitter. 

Some radio antennas are made in a different way. Instead of having a whole 
lot of dipoles in a long line, with a lot of feed wires, we may arrange them not in a 
line but in a curve, and put the receiver at a certain point where it can detect the 
scattered waves. This curve is cleverly designed so that if the radiowaves are 
coming down from above, and the wires scatter, making a new wave, the wires 
are so arranged that the scattered waves reach the receiver all at the same time 
(Fig. 26-12). In other words, the curve is a parabola , and when the source is ex- 
actly on its axis, we get a very strong intensity at the focus. In this case we under- 
stand very clearly what the resolving power of such an instrument is. The arranging 
of the antennas on a parabolic curve is not an essential point. It is only a con- 
venient way to get all the signals to the same point with no relative delay and 
without feed wires. The angle such an instrument can resolve is still 6 = X/L, 
where L is the separation of the first and last antennas. It does not depend on the 
spacing of the antennas and they may be very close together or in fact be all one 
piece of metal. Now we are describing a telescope mirror, of course. We have found 
the resolving power of a telescope! (Sometimes the resolving power is written 
0—1 .22 X/L, where L is the diameter of the telescope. The reason that it is not 
exactly X/L is this: when we worked out that 6 = X/L, we assumed that all the 
lines of dipoles were equal in strength, but when we have a circular telescope, 
which is the way we usually arrange a telescope, not as much signal comes from 
the outside edges, because it is not like a square, where we get the same intensity 
all along a side. We get somewhat less because we are using only part of the tele- 
scope there; thus we can appreciate that the effective diameter is a little shorter 
than the true diameter, and that is what the 1 .22 factor tells us. In any case, it seems 
a little pedantic to put such precision into the resolving power formula.*) 


30-5 Colored films; crystals 

The above, then, are some of the effects of interference obtained by adding 
the various waves. But there are a number of other examples, and even though we 
do not understand the fundamental mechanism yet, we will some day, and we can 
understand even now how the interference occurs. For example, when a light 
wave hits a surface of a material with an index n, let us say at normal incidence, 
some of the light is reflected. The reason for the reflection we are not in a position 
to understand right now; we shall discuss it later. But suppose we know that some 
of the light is reflected both on entering and leaving a refracting medium. Then, 
if we look at the reflection of a light source in a thin film, we see the sum of two 
waves, if the thicknesses are small enough, these two waves will produce an inter- 
ference, either constructive or destructive, depending on the signs of the phases. 
It might be, for instance, that for red light, we get an enhanced reflection, but for 


* This is because Rayleigh’s criterion is a rough idea in the first place, It tells you where 
it begins to get very hard to tell whether the image was made by one or by two stars. 
Actually, if sufficiently careful measurements of the exact intensity distribution over the 
diffracted image spot can be made, the fact that two sources make the spot can be proved 
even if 8 is less than X/L. 
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blue light, which has a different wavelength, perhaps we get a destructively inter- 
fering reflection, so that we see a bright red reflection. If we change the thickness, 
i.e., if we look at another place where the film is thicker, it may be reversed, the red 
interfering and the blue not, so it is bright blue, or green, or yellow, or whatnot. 
So we see colors when we look at thin films and the colors change if we look at 
different angles, because we can appreciate that the timings are different at different 
angles. Thus we suddenly appreciate another hundred thousand situations involv- 
ing the colors that we see on oil fims, soap bubbles, etc. at different angles. But the 
principle is all the same: we are only adding waves at different phases. 

As another important application of diffraction, we may mention the following. 
We used a grating and we saw the diffracted image on the screen. If we had used 
monochromatic light, it would have been at a certain specific place. Then there 
were various higher-order images also. From the positions of the images, we could 
tell how far apart the lines on the grating were, if we knew the wavelength of the 
light. From the difference in intensity of the various images, we could find out the 
shape of the grating scratches, whether the grating was made of wires, sawtooth 
notches, or whatever, without being able to see them . This principle is used to dis- 
cover the positions of the atoms in a crystal . The only complication is that a crystal 
is three-dimensional; it is a repeating three-dimensional array of atoms. We cannot 
use ordinary light, because we must use something whose wavelength is less than 
the space between the atoms or we get no effect; so we must use radiation of 
very short wavelength, i.e., x-rays. So, by shining x-rays into a crystal and by 
noticing how intense is the reflection in the various orders, we can determine the 
arrangement of the atoms inside without ever being able to see them with the eye! 
It is in this way that we know the arrangement of the atoms in various substances, 
which permitted us to draw those pictures in the first chapter, showing the arrange- 
ment of atoms in salt, and so on. We shall later come back to this subject and dis- 
cuss it in more detail, and therefore we say no more about this most remarkable 
idea at present. 


30-6 Diffraction by opaque screens 



Fig. 30-7. A distant light source 
casts a shadow of an opaque object on a 
screen. 


Now we come to a very interesting situation. Suppose that we have an opaque 
sheet with holes in it, and a light on one side of it. We want to know what the 
intensity is on the other side. What most people say is that the light shines through 
the holes, and produces an effect on the other side. It will turn out that one gets 
the right answer, to an excellent approximation, if he assumes that there are sources 
distributed with uniform density across the open holes, and that the phases of 
these sources are the same as they would have been if the opaque material were 
absent. Of course, actually there are no sources at the holes, in fact that is the only 
place that there are certainly no sources. Nevertheless, we get the correct diffraction 
patterns by considering the holes to be the only places that there are sources; that 
is a rather peculiar fact. We shall explain later why this is true, but for now let us 
just suppose that it is. 

In the theory of diffraction there is another kind of diffraction that we should 
briefly discuss. It is usually not discussed in an elementary course as early as this, 
only because the mathematical formulas involved in adding these little vectors are 
a little elaborate. Otherwise it is exactly the same as we have been doing all along. 
All the interference phenomena are the same; there is nothing very much more 
advanced involved, only the circumstances are more complicated and it is harder 
to add the vectors together, that is all. 

Suppose that we have light coming in from infinity, casting a shadow of an 
object. Figure 30-7 shows a screen on which the shadow of an object AB is made 
by a light source very far away compared with one wavelength. Now we would 
expect that outside the shadow, the intensity is all bright, and inside it, it is all dark. 
As a matter of fact, if we plot the intensity as a function of position near the shadow 
edge, the intensity rises and then overshoots, and wobbles, and oscillates about 
in a very peculiar manner near this edge (Fig. 30-8). We now shall discuss the 
reason for this. If we use the theorem that we have not yet proved, then we can 
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replace the actual problem by a set of effective sources uniformly distributed over 
the open space beyond the object. 

We imagine a large number of very closely spaced antennas, and we want the 
intensity at some point P. That looks just like what we have been doing. Not 
quite ; because our screen is not at infinity. We do not want the intensity at infinity, 
but at a finite point. To calculate the intensity at some particular place, we have to 
add the contributions from all the antennas. First there is an antenna at D , 
exactly opposite P; if we go up a little bit in angle, let us say a height h , then there 
is an increase in delay (there is also a change in amplitude because of the change in 
distance, but this is a very small effect if we are at all far away, and is much less 
important than the difference in the phases). Now the path difference EP — DP 
is h 2 /2s, so that the phase difference is proportional to the square of how far we 
go from D , while in our previous work s was infinite, and the phase difference was 
linearly proportional to h. When the phases are linearly proportional, each vector 
adds at a constant angle to the next vector. What we now need is a curve which 
is made by adding a lot of infinitesimal vectors with the requirement that the angle 
they make shall increase, not linearly, but as the square of the length of the curve. 
To construct that curve involves slightly advanced mathematics, but we can always 
construct it by actually drawing the arrows and measuring the angles. In any case, 
we get the marvelous curve (called Cornu’s spiral) shown in Fig. 30-8. Now how 
do we use this curve? 

If we want the intensity, let us say, at point P , we add a lot of contributions of 
different phases from point D on up to infinity, and from D down only to point B P . 
So we start at B P in Fig. 30-8, and draw a series of arrows of ever-increasing angle. 
Therefore the total contribution above point B P all goes along the spiraling curve. 
If we were to stop integrating at some place, then the total amplitude would be a 
vector from B to that point; in this particular problem we are going to infinity, 
so the total answer is the vector B^ x . Now the position on the curve which 
corresponds to point B P on the object depends upon where point P is located, 
since point D, the inflection point, always corresponds to the position of point P. 
Thus, depending upon where P is located above B , the beginning point will fall 
at various positions on the lower left part of the curve, and the resultant vector 
Bp x will have many maxima and minima (Fig. 30-9). 



Fig. 30-8. The addition of ampli- 
tudes for many in-phase oscillators whose 
phase delays vary as the square of the 
distance from point D of the previous 
figure. 


Fig. 30-9. The intensity near the 
edge of a shadow. The geometrical 
shadow edge is at xq. 



On the other hand, if we are at Q , on the other side of P , then we are using 
only one end of the spiral curve, and not the other end. In other words, we do not 
even start at D , but at Bq , so on this side we get an intensity which continuously 
falls off as Q goes farther into the shadow. 

One point that we can immediately calculate with ease, to show that we really 
understand it, is the intensity exactly opposite the edge. The intensity here is 1/4 
that of the incident light. Reason: Exactly at the edge (so the endpoint B of the 
arrow is at D in Fig. 30-8) we have half the curve that we would have had if we 
were far into the bright region. If our point R is far into the light we go from one 
end of the curve to the other, that is, one full unit vector; but if we are at the edge 
of the shadow, we have only half the amplitude — 1/4 the intensity. 

In this chapter we have been finding the intensity produced in various direc- 
tions from various distributions of sources. As a final example we shall derive a 
formula which we shall need for the next chapter on the theory of the index of 
refraction. Up to this point relative intensities have been sufficient for our purpose, 
but this time we shall find the complete formula for the field in the following situa- 
tion. 
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Fig. 30-10. Radiation field of a 
sheet of oscillating charges. 


30-7 The field of a plane of oscillating charges 

Suppose that we have a plane full of sources, all oscillating together, with 
their motion in the plane and all having the same amplitude and phase. What is 
the field at a finite, but very large, distance away from the plane? (We cannot 
get very close, of course, because we do not have the right formulas for the field 
close to the sources.) If we let the plane of the charges be the XY- plane, then we 
want to find the field at the point P far out on the Z-axis (Fig. 30-10). We suppose 
that there are v charges per unit area of the plane, and that each one of them has a 
charge q. All of the charges move with simple harmonic motion, with the same 
direction, amplitude, and phase. We let the motion of each charge, with respect 
to its own average position, be-x 0 cos ut. Or, using the complex notation and re- 
membering that the real part represents the actual motion, the motion can be 
described by x 0 e iwt . 

Now we find the field at the point P from all of the charges by finding the 
field there from each charge q, and then adding the contributions from all the 
charges. We know that the radiation field is proportional to the acceleration of 
the charge, which is — u 2 xoe ,at (and is the same for every charge). The electric 
field that we want at the point P due to a charge at the point Q is proportional to 
the acceleration of the charge q, but we have to remember that the field at the point 
P at the instant t is given by the acceleration of the charge at the earlier time 
t' = t - r/c, where r/c is the time it takes the waves to travel the distance r 
from Q to P. Therefore the field at P is proportional to 


-« 2 * 0 e < “ ( ‘- r/e) - (30.10) 

Using this value for the acceleration as seen from P in our formula for the electric 
field at large distances from a radiating charge, we get 


Electric field at P\ _ q w 2 x 0 e i " (<_, ' /c> 

.from charge at Q/ ~ 4we 0 c 2 r (approx.). 


(30.11) 


Now this, formula is not quite right, because we should have used not the 
acceleration of the charge but its component perpendicular to the line QP. We shall 
suppose, however, that the point P is so far away, compared with the distance of 
the point Q from the axis (the distance p in Fig. 30—9), for those changes that we 
need to take into account, that we can leave out the cosine factor (which would 
be nearly equal to 1 anyway). 

To get the total field at P , we now add the effects of all the charges in the plane. 
We should, of course, make a vector sum. But since the direction of the electric 
field is nearly the same for all the charges, we may, in keeping with the approxima- 
tion we have already made, just add the magnitudes of the fields. To our approxi- 
mation the field at P depends only on the distance r, so all charges at the same r 
produce equal fields. So we add, first, the fields of those charges in a ring of width 
dp and radius p. Then, by taking the integral over all p, we will obtain the total field. 

The number of charges in the ring is the product of the surface area of the 
ring, 2 7 rp dp, and 17 , the number of charges per unit area. We have, then, 

/ n t—r/c) 

4 ^ °— rrlirpdp. (30.12) 


We wish to evaluate this integral from p = 0 to p = 00 . The variable t, of 
course, is to be held fixed while we do the integral, so the only varying quantities 
are p and r. Leaving out all the constant factors, including the factor e lict , for the 
moment, the integral we wish is 

rP=<x> . 

/ e -io>r}c 

J„. < 30 -' 3 > 

To do this integral we need to use the relation between r and p: 


r 


2 


P 2 + z 2 . 
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Since z is independent of p, when we take the differential of this equation, we get 

2 r dr — 2 p dp , 


which is lucky, since in our integral we can replace p dp by r dr and the r will cancel 
the one in the denominator. The integral we want is then the simpler one 



(30.15) 


To integrate an exponential is very easy. We divide by the coefficient of r in the 
exponent and evaluate the exponential at the limits. But the limits of r are not the 
same as the limits of p. When p = 0, we have r = z, so the limits of r are z to 
infinity. We get for the integral 

_ !L r e - <ao _ e -(i«/c)*i (30.16) 

ico 

where we have written oo for (r/c) oo, since they both just mean a very large 
number! 

Now is a mysterious quantity. Its real part, for example, is cos ( — oo), 
which, mathematically speaking, is completely indefinite (although we would 
expect it to be somewhere — or everywhere (?) — between +1 and —1!). But in a 
physical situation, it can mean something quite reasonable, and usually can just 
be taken to be zero. To see that this is so in our case, we go back to consider again 
the original integral (30.15). 

We can understand (30.15) as a sum of many small complex numbers, each of 
magnitude A r, and with the angle 8 = — c or/c in the complex plane. We can try 
to evaluate the sum by a graphical method. In Fig. 30-11 we have drawn the first 
five pieces of the sum. Each segment of the curve has the length A r and is placed 
at the angle Ad — — co A r/c with respect to the preceding piece. The sum for 
these first five pieces is represented by the arrow from the starting point to the 
end of the fifth segment. As we continue to add pieces we shall trace out a polygon 
until we get back to the starting point (approximately) and then start around once 
more. Adding more pieces, we just go round and round, staying close to a circle 
whose radius is easily shown to be c/co. We can see now why the integral does not 
give a definite answer! 

But now we have to go back to the physics of the situation. In any real 
situation the plane of charges cannot be infinite in extent, but must sometime stop. 
If it stopped suddenly, and was exactly circular in shape, our integral would have 
some value on the circle in Fig. 30-11. If, however, we let the number of charges 
in the plane gradually taper off at some large distance from the center (or else stop 
suddenly but in an irregular shape so for larger p the entire ring of width dp no 
longer contributes), then the coefficient 17 in the exact integral would decrease 
toward zero. Since we are adding smaller pieces but still turning through the same 
angle, the graph of our integral would then become a curve which is a spiral. The 
spiral would eventually end up at the center of our original circle, as drawn in 
Fig. 30-12. The physically correct integral is the complex number A in the figure 
represented by the interval from the starting point to the center of the circle, which 
is just equal to 

1. e -***i° (30.17) 

ico 1 



Fig. 30-11. 
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as you can work out for yourself. This is the same result we would get from Eq. 
(30.16) if we set e~ l< * = 0 . 

(There is also another reason why the contribution to the integral tapers off 
for large values of r, and that is the factor we have omitted for the projection of 
the acceleration on the plane perpendicular to the line Pg.) 

We are, of course, interested only in physical situations, so we will take e 
equal to zero. Returning to our original formula (30.12) for the field and putting 

30-11 



back all of the factors that go with the integral, we have the result 

Total field at P = — iu: c 0 e , " ( ‘ _ ’ /c) (30.18) 

(remembering that 1// = — /). 

It is interesting to note that ( iux 0 e l “ t ) is just equal to the velocity of the charges, 
so that we can also write the equation for the field as 

Total field at P = - ~^L [velocity of charges] at ,_* /c , (30.19) 

which is a little strange, because the retardation is just by the distance z, which is 
the shortest distance from P to the plane of charges. But that is the way it comes 
out— fortunately a rather simple formula. (We may add, by the way, that although 
our derivation is valid only for distances far from the plane of oscillatory charges, 
it turns out that the formula (30.18) or (30.19) is correct at any distance z, even 
forz < X.) 
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31 


The Origin of the Refractive Index 


31-1 The index of refraction 


We have said before that light goes slower in water than in air, and slower, 
slightly, in air than in vacuum. This effect is described by the index of refraction 
n. Now we would like to understand how such a slower velocity could come about. 
In particular, we should try to see what the relation is to some physical assumptions, 
or statements, we made earlier, which were the following: 

(a) That the total electric field in any physical circumstance can always be 
represented by the sum of the fields from all the charges in the universe. 

(b) That the field from a single charge is given by its acceleration evaluated with 
a retardation at the speed c, always (for the radiation field). 

But, for a piece of glass, you might think: “Oh, no, you should modify all 
this. You should say it is retarded at the speed c/n .” That, however, is not right, 
and we have to understand why it is not. 

It is approximately true that light or any electrical wave does appear to travel 
at the speed c/n through a material whose index of refraction is n, but the fields are 
still produced by the motions of all the charges— including the charges moving in 
the material— and with these basic contributions of the field travelling at the 
ultimate velocity c. Our problem is to understand how the apparently slower 
velocity comes about. 

We shall try to understand the effect in a very simple case. A source which 
we shall call “the external source” is placed a large distance away from a thin 
plate of transparent material, say glass. We inquire about the field at a large 
distance on the opposite side of the plate. The situation is illustrated by the 
diagram of Fig. 31-1, where S and P are imagined to be very far away from the 
plate. According to the principles we have stated earlier, an electric field anywhere 
that is far from all moving charges is the (vector) sum of the fields produced by the 
external source (at S) and the fields produced by each of the charges in the plate 
of glass, every one with its proper retardation at the velocity c. Remember that the 
contribution of each charge is not changed by the presence of the other charges. 
These are our basic principles. The field at P can be written thus: 


or 


E — ^ ^ E eac h charge 

all charges 

E = E s + 'y ] E eac h charges 

all other charges 


(31.1) 

(31.2) 


31-1 The index of refraction 

31-2 The field due to the material 

31-3 Dispersion 

31-4 Absorption 

31-5 The energy carried by an 
electric wave 

31-6 Diffraction of light by a screen 



Fig. 31-1. Electric waves passing 
through a layer of transparent material. 


where E s is the field due to the source alone and would be precisely the field at 
P if there were no material present. We expect the field at P to be different from 
E 5 if there are any other moving charges. 

Why should there be charges moving in the glass? We know that all material 
consists of atoms which contain electrons. When the electric field of the source acts 
on these atoms it drives the electrons up and down, because it exerts a force on the 
electrons. And moving electrons generate a field — they constitute new radiators. 
These new radiators are related to the source S , because they are driven by the 
field of the source. The total field is not just the field of the source *S, but it is 
modified by the additional contribution from the other moving charges. This 
means that the field is not the same as the one which was there before the glass 
was there, but is modified, and it turns out that it is modified in such a way that 
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the field inside the glass appears to be moving at a different speed. That is the idea 
which we would like to work out quantitatively. 

Now this is, in the exact case, pretty complicated, because although we have 
said that all the other moving charges are driven by the source field, that is not 
quite true. If we think of a particular charge, it feels not only the source, but like 
anything else in the world, it feels all of the charges that are moving. It feels, in 
particular, the charges that are moving somewhere else in the glass. So the total 
field which is acting on a particular charge is a combination of the fields from the 
other charges, whose motions depend on what this particular charge is doing! You 
can see that it would take a complicated set of equations to get the complete and 
exact formula. It is so complicated that we postpone this problem until next year. 

Instead we shall work out a very simple case in order to understand all the 
physical principles very clearly. We take a circumstance in which the effects from 
the other atoms is very small relative to the effects from the source. In other words, 
we take a material in which the total field is not modified very much by the motion 
of the other charges. That corresponds to a material in which the index of refraction 
is very close to 1, which will happen, for example, if the density of the atoms is 
very low. Our calculation will be valid for any case in which the index is for any 
reason very close to 1. In this way we shall avoid the complications of the most 
general, complete solution. 

Incidentally, you should notice that there is another effect caused by the motion 
of the charges in the plate. These charges will also radiate waves back toward the 
source S. This backward-going field is the light we see reflected from the surfaces 
of transparent materials. It does not come from just the surface. The backward 
radiation comes from everywhere in the interior, but it turns out that the total effect 
is equivalent to a reflection from the surfaces. These reflection effects are beyond 
our approximation at the moment because we shall be limited to a calculation for a 
material with an index so close to 1 that very little light is reflected. 
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Fig. 31—2. Relation between refrac- 
tion and velocity change. 


Before we proceed with our study of how the index of refraction comes about, 
we should understand that all that is required to understand refraction is to under- 
stand why the apparent wave velocity is different in different materials. The 
bending of light rays comes about just because the effective speed of the waves is 
different in the materials. To remind you how that comes about we have drawn 
in Fig. 31-2 several successive crests of an electric wave which arrives from a 
vacuum onto the surface of a block of glass. The arrow perpendicular to the wave 
crests indicates the direction of travel of the wave. Now all oscillations in the wave 
must have the same frequency, (We have seen that driven oscillations have the 
same frequency as the driving source.) This means, also, that the wave crests for 
the waves on both sides of the surface must have the same spacing along the surface 
because they must travel together, so that a charge sitting at the boundary will 
feel only one frequency. The shortest distance between crests of the wave, however, 
is the wavelength which is the velocity divided by the frequency. On the vacuum 
side it is X 0 = 27rc/a>, and on the other side it is X = 27rv/a)Ov2irc/a)n,ifv = c/n 
is the velocity of the wave. From the figure we can see that the only way for the 
waves to “fit” properly at the boundary is for the waves in the material to be 
travelling at a different angle with respect to the surface. From the geometry of 
the figure you can see that for a “fit” we must have X 0 /sin 6 0 = X/sin 0, or 
sin 0o/sin 0 = «, which is Snell’s law. We shall, for the rest of our discussion, 
consider only why light has an effective speed of c/n in material of index n , and 
no longer worry, in this chapter, about the bending of the light direction. 


We go back now to the situation shown in Fig. 31—1. We see that what we 
have to do is to calculate the field produced at P by all the oscillating charges in 
the glass plate. We shall call this part of the field E a , and it is just the sum written 
as the second term in Eq. (31.2). When we add it to the term E s , due to the source, 
we will have the total field at P. 
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This is probably the most complicated thing that we are going to do this year, 
but it is complicated only in that there are many pieces that have to be put to- 
gether; each piece, however, is very simple. Unlike other derivations where we 
say, “Forget the derivation, just look at the answer!,” in this case we do not 
need the answer so much as the derivation. In other words, the thing to under- 
stand now is the physical machinery for the production of the index. 

To see where we are going, let us first find out what the “correction field” 
E a would have to be if the total field at P is going to look like radiation from the 
source that is slowed down while passing through the thin plate. If the plate had 
no effect on it, the field of a wave travelling to the right (along the z-axis) would be 

E s = £ 0 cosw(i — z/c) (31.3) 

or, using the exponential notation, 

E s = Eoe Mt ~ zlc) . (31.4) 

Now what would happen if the wave travelled more slowly in going through 
the plate? Let us call the thickness of the plate A z. If the plate were not there the 
wave would travel the distance A z in the time A z/c. But if it appears to travel at 
the speed c/n then it should take the longer time nAzjc or the additional time 
At = (n — 1) A z/c. After that it would continue to travel at the speed c again. 
We can take into account the extra delay in getting through the plate by replacing 
t in Eq. (31.4) by ( t — At) or by [/—(« — 1) A z/c]. So the wave after insertion 
of the plate should be written 

lifter plate ^0^ • Vyl.Jj 

We can also write this equation as 

•^aftcr plate & 5 

which says that the wave after the plate is obtained from the wave which could 
exist without the plate, i, e., from E Si by multiplying by the factor “a*/*. 

Now we know that multiplying an oscillating function like e %<at by a factor e l9 just 
says that we change the phase of the oscillation by the angle 6 , which is, of course, 
what the extra delay in passing through the thickness A z has done. It has retarded 
the phase by the amount co(« — 1) A zjc (retarded, because of the minus sign in 
the exponent). 

We have said earlier that the plate should add a field E a to the original field 
E s — E 0 e iu{t ~ zlc) 9 but we have found instead that the effect of the plate is to 
multiply the field by a factor which shifts its phase. However, that is really all right 
because we can get the same result by adding a suitable complex number. It is 
particularly easy to find the right number to add in the case that A z is small, for 
you will remember that if x is a small number then e x is nearly equal to (1 + x). 
We can write, therefore, 

e -«(n-l)A./c = j _ io) ( n _ 1)Az / c . (31.7) 

Using this equality in Eq. (31.6), we have 

£aftor plate = - ? “ ( « ~ 1 ) ^ E 0 e Mt ~* IC \ (31 .8) 

% X ' 

The first term is just the field from the source, and the second term must just be 
equal to E a , the field produced to the right of the plate by the oscillating charges 
of the plate— expressed here in terms of the index of refraction n , and depending, 
of course, on the strength of the wave from the source. 


What we have been doing is easily visualized if we look at the complex number 
diagram in Fig. 31-3. We first draw the number E s (we chose some values for z 
and t so that E s comes out horizontal, but this is not necessary). The delay due to 
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Fig. 31-3. Diagram for the trans- 
mitted wave at a particular t and z. 



slowing down in the plate would delay the phase of this number, that is, it would 
rotate E s through a negative angle. But this is equivalent to adding the small 
vector E a at roughly right angles to E s . But that is just what the factor - / means 
in the second term of Eq. (31.8). It says that if E s is real, then E a is negative 
imaginary or that, in general, E s and E a make a right angle. 

31-2 The field due to the material 

We now have to ask: Is the field E a obtained in the second term of Eq. (31.8) 
the kind we would expect from oscillating charges in the plate? If we can show 
that it is, we will then have calculated what the index n should be! [Since n is the 
only nonfundamental number in Eq. (31.8).] We turn now to calculating what 
field E a the charges in the material will produce. (To help you keep track of the 
many symbols we have used up to now, and will be using in the rest of our calcula- 
tion, we have put them all together in Table 31-1.) 

Table 31-1 

Symbols used in the calculations 

E a = field from the source 
Ea = field produced by charges in the plate 
A z = thickness of the plate 
z = perpendicular distance from the plate 
n — index of refraction 
co = frequency (angular) of the radiation 
N = number of charges per unit volume in the plate 
V = number of charges per unit area of the plate 
q e = charge on an electron 
m = mass of an electron 

coo = resonant frequency of an electron bound in an atom 

If the source 5 (of Fig. 31-1) is far off to the left, then the field E, will have 
the same phase everywhere on the plate, so we can write that in the neighborhood 
of the plate 

E e = Eae^ 1 -!* ( 31 .9) 

Right at the plate, where z = 0, we will have 

E > = E 0 e l °“ (at the plate) (31.10) 

Each of the electrons in the atoms of the plate will feel this electric field and 
will be driven up and down (we assume the direction of E 0 is vertical) by the electric 
force qE. To find what motion we expect for the electrons, we will assume that the 
atoms are little oscillators, that is, that the electrons are fastened elastically to the 
atoms, which means that if a force is applied to an electron its displacement from 
its normal position will be proportional to the force. 

You may think that this is a funny model of an atom if you have heard about 
electrons whirling around in orbits. But that is just an oversimplified picture. 
The correct picture of an atom, which is given by the theory of wave mechanics, 
says that, so far as problems involving light are concerned, the electrons behave as 
though they were held by springs. So we shall suppose that the electrons have a 
linear restoring force which, together with their mass m, makes them behave like 
little oscillators, with a resonant frequency uq. We have already studied such os- 
cillators, and we know that the equation of their motion is written this way: 

m = F ’ 

where F is the driving force. 
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For our problem, the driving force comes from the electric field of the wave 
from the source, so we should use 

F = q.E. = q e E 0 e iut , (31.12) 


where q e is the electric charge on the electron and for E s we use the expression 
E s — E 0 e i<at from (31.10). Our equation of motion for the electron is then 


m 


d 2 x 

dF 


+ 


2 1 
0)qX j 


qeE 0 e" 


(31.13) 


We have solved this equation before, and we know that the solution is 


iwt 

x = xoe , 


where, by substituting in (31.13), we find that 

q e Eo 




m( co 2 — co 2 ) 


so that 


x __ QeEp iut 

m( co 2 — co 2 ) 


(31.14) 


(31.15) 


(31.16) 


We have what we needed to know — the motion of the electrons in the plate. And 
it is the same for every electron, except that the mean position (the “zero” of the 
motion) is, of course, different for each electron. 

Now we are ready to find the field E a that these atoms produce at the point P 9 
because we have already worked out (at the end of Chapter 30) what field is pro- 
duced by a sheet of charges that all move together. Referring back to Eq. (30.19), 
we see that the field E a at P is just a negative constant times the velocity of the 
charges retarded in time the amount z/c. Differentiating x in Eq. (31.16) to get 
the velocity, and sticking in the retardation [or just putting x 0 from (31.15) 
into (30.18)] yields 


E a 


We ; <IeEo ioo(t—z!c) 

r /CO — € 

2«oC L m(u> 2 — co 2 ) 


(31.17) 


Just as we expected, the driven motion of the electrons produced an extra wave 
which travels to the right (that is what the factor e l<ait ~ zlc) says), and the amplitude 
of this wave is proportional to the number of atoms per unit area in the plate 
(the factor rj) and also proportional to the strength of the source field (the factor 
E 0 ). Then there are some factors which depend on the atomic properties (q ef m, 
and co 0 ), as we should expect. 

The most important thing, however, is that this formula (31.17) for E a looks 
very much like the expression for E a that we got in Eq. (31.8) by saying that the 
original wave was delayed in passing through a material with an index of refraction 
n. The two expressions will, in fact, be identical if 


(n - 


1) Az = 


We 

2eo/w(o> 2 — co 2 ) 


(31.18) 


Notice that both sides are proportional to Az, since 77 , which is the number of 
atoms per unit area , is equal to N AZ, where N is the number of atoms per unit 
volume of the plate. Substituting N AZ for 7 ] and cancelling the Az, we get our main 
result, a formula for the index of refraction in terms of the properties of the atoms 
of the material — and of the frequency of the light: 


n = 


1 + 


Nq 2 e 

2e 0 w(wo - « 2 ) 


(31.19) 


This equation gives the “explanation” of the index of refraction that we wished to 
obtain. 
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31-3 Dispersion 

Notice that in the above process we have obtained something very interesting. 
For we have not only a number for the index of refraction which can be computed 
from the basic atomic quantifies, but we have also learned how the index of 
refraction should vary with the frequency « of the light. This is something we 
would never understand from the simple statement that “light travels slower in a 
transparent material.” We still have the problem, of course, of knowing how many 
atoms per unit volume there are, and what is their natural frequency co 0 . We do 
not know this just yet, because it is different for every different material, and we 
cannot get a general theory of that now. Formulation of a general theory of the 
properties of different substances — their natural frequencies, and so on — is 
possible only with quantum atomic mechanics. Also, different materials have 
different properties and different indexes, so we cannot expect, anyway, to get a 
general formula for the index which will apply to all substances. 

However, we shall discuss the formula we have obtained, in various possible 
circumstances. First of all, for most ordinary gases (for instance, for air, most 
colorless gases, hydrogen, helium, and so on) the natural frequencies of the electron 
oscillators correspond to ultraviolet light. These frequencies are higher than the 
frequencies of visible light, that is, a>o is much larger than co of visible light, and to 
a first approximation, we can disregard w 2 in comparison with ojq. Then we find 
that the index is nearly constant. So for a gas, the index is nearly constant. This 
is also true for most other transparent substances, like glass. If we look at our 
expression a little more closely, however, we notice that as u> rises, taking a little 
bit more away from the denominator, the index also rises. So n rises slowly with 
frequency. The index is higher for blue light than for red light. That is the reason 
why a prism bends the light more in the blue than in the red. 

The phenomenon that the index depends upon the frequency is called the 
phenomenon of dispersion, because it is the basis of the fact that light is “dispersed” 
by a prism into a spectrum. The equation for the index of refraction as a function 
of frequency is called a dispersion equation. So we have obtained a dispersion equa- 
tion. (In the past few years “dispersion equations” have been finding a new use in 
the theory of elementary particles.) 

Our dispersion equation suggests other interesting effects. If we have a 
natural frequency w 0 which lies in the visible region, or if we measure the index 
of refraction of a material like glass in the ultraviolet, where w gets near «o, we 
see that at frequencies very close to the natural frequency the index can get enor- 
mously large, because the denominator can go to zero. Next, suppose that w is 
greater than w 0 . This would occur, for example, if we take a material like glass, 
say, and shine x-ray radiation on it. In fact, since many materials which are opaque 
to visible light, like graphite for instance, are transparent to x-rays, we can also 
talk about the index of refraction of carbon for x-rays. All the natural frequencies 
of the carbon atoms would be much lower than the frequency we are using in the 
x-rays, since x-ray radiation has a very high frequency. The index of refraction is 
that given by our dispersion equation if we set « 0 equal to zero (we neglect a>o in 
comparison with to 2 ). 

A similar situation would occur if we beam radiowaves (or light) on a gas of 
free electrons. In the upper atmosphere electrons are liberated from their atoms by 
ultraviolet light from the sun and they sit up there as free electrons. For free 
electrons w 0 = 0 (there is no elastic restoring force). Setting « 0 = 0 in our disper- 
sion equation yields the correct formula for the index of refraction for radiowaves 
in the stratosphere, where N is now to represent the density of free electrons (num- 
ber per unit volume) in the stratosphere. But let us look again at the equation, if 
we beam x-rays on matter, or radiowaves (or any electric waves) on free electrons 
the term (w 2 — w 2 ) becomes negative, and we obtain the result that n is less than 
one. That means that the effective speed of the waves in the substance is faster 
than cl Can that be correct? 

It is correct. In spite of the fact that it is said that you cannot send signals 
any faster than the speed of light, it is nevertheless true that the index of refraction 

of materials at a particular frequency can be either greater or less than 1. This 
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just means that the phase shift which is produced by the scattered light can be 
either positive or negative. It can be shown, however, that the speed at which you 
can send a signal is not determined by the index at one frequency, but depends on 
what the index is at many frequencies. What the index tells us is the speed at which 
the nodes (or crests) of the wave travel. The node of a wave is not a signal by itself. 
In a perfect wave, which has no modulations of any kind, i.e., which is a steady 
oscillation, you cannot really say when it “starts,” so you cannot use it for a timing 
signal. In order to send a signal you have to change the wave somehow, make a 
notch in it, make it a little bit fatter or thinner. That means that you have to 
have more than one frequency in the wave, and it can be shown that the speed at 
which signals travel is not dependent upon the index alone, but upon the way that 
the index changes with the frequency. This subject we must also delay (until 
Chapter 48). Then we will calculate for you the actual speed of signals through 
such a piece of glass, and you will see that it will not be faster than the speed of 
light, although the nodes, which are mathematical points, do travel faster than 
the speed of light. 

Just to give a slight hint as to how that happens, you will note that the real 
difficulty has to do with the fact that the responses of the charges are opposite to 
the field, i.e., the sign has gotten reversed. Thus in our expression for x (Eq. 31.16) 
the displacement of the charge is in the direction opposite to the driving field, 
because (w 2 — co 2 ) is negative for small w 0 - The formula says that when the 
electric field is pulling in one direction, the charge is moving in the opposite direc- 
tion. 

How does the charge happen to be going in the opposite direction? It certainly 
does not start off in the opposite direction when the field is first turned on. When 
the motion first starts there is a transient, which settles down after awhile, and 
only then is the phase of the oscillation of the charge opposite to the driving field. 
And it is then that the phase of the transmitted field can appear to be advanced 
with respect to the source wave. It is this advance in phase which is meant when 
we say that the “phase velocity” or velocity of the nodes is greater than c. In 
Fig. 31-4 we give a schematic idea of how the waves might look for a case where 
the wave is suddenly turned on (to make a signal). You will see from the diagram 
that the signal (i.e., the start of the wave) is not earlier for the wave which ends up 
with an advance in phase. 


Wove with no 
moteriol 

E 

(b) 

Transmitted wove 
with n»l 

Fig. 31-4. Wave “signals.” 

(c) 

Transmitted wove 
with n« l 


Let us now look again at our dispersion equation. We should remark that 
our analysis of the refractive index gives a result that is somewhat simpler than you 
would actually find in nature. To be completely accurate we must add some 
refinements. First, we should expect that our model of the atomic oscillator should 
have some damping force (otherwise once started it would oscillate forever, and 
we do not expect that to happen). We have worked out before (Eq. 23.8) the 
motion of a damped oscillator and the result is that the denominator in Eq. (31.16), 
and therefore in (31.19), is changed from (coq — co 2 ) to (cojj — w 2 + *Ta>), where 
7 is the damping coefficient. 

We need a second modification to take into account the fact that there are 
several resonant frequencies for a particular kind of atom. It is easy to fix up our 
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dispersion equation by imagining that there are several different kinds of oscil- 
lators, but that each oscillator acts separately, and so we simply add the contri- 
butions of all the oscillators. Let us say that there are N k electrons per unit of 
volume, whose natural frequency is w k and whose damping factor is 7*. We 
would then have for our dispersion equation 


n = 


1 + 


ge Nk 

2t ° m k <4 - CO 2 + iy kU 


(31.20) 


We have, finally, a complete expression which describes the index of refraction that 
is observed for many substances.* The index described by this formula varies with 
frequency roughly like the curve shown in Fig. 31-5. 

You will note that so long as co is not too close to one of the resonant frequen- 
cies, the slope of the curve is positive. Such a positive slope is called “normal” 
dispersion (because it is clearly th$ most common occurrence). Very near the 
resonant frequencies, however, there is a small range of co’s for which the slope is 
negative. Such a negative slope is often referred to as “anomalous” (meaning 
abnormal) dispersion, because it seemed unusual when it was first observed, long 
before anyone even knew there were kuch things as electrons. From our point of 
view both slopes are quite “normal”! 


31-4 Absorption 

Perhaps you have noticed something a little strange about the last form 
(Eq. 31.20) we obtained for our dispersion equation. Because of the term /7 we 
put in to take account of damping, the index of refraction is now a complex 
number l What does that mean? By working out what the real and imaginary parts 
of n are we could write 

n = n' - in", (31.21) 

where ri and n" are real numbers. (We use the minus sign in front of the in" 
because then n ff will turn out to be a positive number, as you can show for yourself.) 

We can see what such a complex index means by going back to Eq. (31.6), 
which is the equation of the wave after it goes through a plate of material with an 
index n. If we put our complex n into this equation, and do some rearranging, we 
get 

Rafter plate = e™"** 1 * (31 . 22) 

A B 

The last factors, marked B in Eq. (31.22), are just the form we had before, and 
again describe a wave whose phase has been delayed by the angle a >(«' - 1) Az/c 
in traversing the material. The first term (A) is new and is an exponential factor 
with a real exponent, because there were two fs that cancelled. Also, the exponent 
is negative, so the factor is a real number less than one. It describes a decrease 
in the magnitude of the field and, as we should expect, by an amount which is 
more the larger A z is. As the wave goes through the material, it is weakened. The 
material is “absorbing” part of the wave. The wave comes out the other side with 
less energy. We should not be surprised at this, because the damping we put in 
for the oscillators is indeed a friction force and must be expected to cause a loss 
of energy. We see that the imaginary part n ,f of a complex index of refraction 
represents an absorption (or “attenuation”) of the wave. In fact, n ff is sometimes 
referred to as the “absorption index.” 

We may also point out that an imaginary part to the index n corresponds to 
bending the arrow E a in Fig. 31-3 toward the origin. It is clear why the transmitted 
field is then decreased. 


* Actually, although in quantum mechanics Eq. (31.20) is still valid, its interpretation 
is somewhat different. In quantum mechanics even an atom with one electron, like 
hydrogen, has several resonant frequencies. Therefore N k is not really the number of 
electrons having the frequency a> k , but is replaced instead by Nf kj where N is the number 
of atoms per unit volume and f k (called the oscillator strength) is a factor that tells how 
strongly the atom exhibits each of its resonant frequencies co*. 
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Normally, for instance as in glass, the absorption of light is very small. 
This is to be expected from our Eq. (31.20), because the imaginary part of the 
denominator, /Y fc co, is much smaller than the term (a ? k — co 2 ). But if the light fre- 
quency co is very close to c o k then the resonance term (co* — co 2 ) can become small 
compared with iy k o) and the index becomes almost completely imaginary. The 
absorption of the light becomes the dominant effect. It is just this effect that gives 
the dark lines in the spectrum of light which we receive from the sun. The light 
from the solar surface has passed through the sun’s atmosphere (as well as the 
earth’s), and the light has been strongly absorbed at the resonant frequencies of 
the atoms in the solar atmosphere. 

The observation of such spectral lines in the sunlight allows us to tell the 
resonant frequencies of the atoms and hence the chemical composition of the sun’s 
atmosphere. The same kind of observations tell us about the materials in the stars. 
From such measurements we know that the chemical elements in the sun and in 
the stars are the same as those we find on the earth. 


31-5 The energy carried by an electric wave 

We have seen that the imaginary part of the index means absorption. We 
shall now use this knowledge to find out how much energy is carried by a light 
wave. We have given earlier an argument that the energy carried by light is 
proportional to E 2 , the time average of the square of the electric field in the wave. 
The decrease in E due to absorption must mean a loss of energy, which would go 
into some friction of the electrons and, we might guess, would end up as heat in 
the material. 

If we consider the light arriving on a unit area, say one square centimeter, of 
our plate in Fig. 31-1, then we can write the following energy equation (if we assume 
that energy is conserved, as we do!): 

Energy in per sec = energy out per sec + work done per sec. (31.23) 

For the first term we can write aE 2 , where a is the as yet unknown constant of 
proportionality which relates the average value of E 2 to the energy being carried. 
For the second term we must include the part from the radiating atoms of the 
material, so we should use a(E 8 + E a ) 2 , or (evaluating the square) a(E 2 + 
2KE a + El). 

All of our calculations have been made for a thin layer of material whose 
index is not too far from 1, so that E a would always be much less than E s (just to 
make the calculations easier). In keeping with our approximat ions, we should, 
therefore, leave out the term £ 2 , because it is much smaller than E s E a . You may 
say: “Then you should leave out E s E a also, because it is mu ch sm aller than E 2 .” 
It is true that E s E a is much smaller than E 2 , but we must keep E s E a or our approxi- 
mation will be the one that would apply if we neglected the presence of the material 
completely! One way of checking that our calculations are consistent is to see that 
we always keep terms which are proportional to N A z, the area density of atoms 
in the material, but we leave out terms which are proportional to (N A z) 2 or any 
higher power of TVAz. Ours is what should be called a “low-density approxi- 
mation.” 

In the same spirit, we might remark that our energy ^equation has neglected 
the energy in the reflected wave. But that is OK because this ternvtqo, is propor- 
tional to (YAz) 2 , since the amplitude of the reflected wave is proportional to 
N Az. 

For the last term in Eq. (31.23) we wish to compute the rate at which the 
incoming wave is doing work on the electrons. We know that work is force times 
distance, so the rate of doing work (also called power) is the force times the veloc- 
ity. It is really F • V, but we do not need to worry about the dot product when the 
velocity and force are along the same direction as they are here (except for a 
possible minus sign). So for each atom we take q e E s v for the average rate of 
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Fig. 31-6. Diffraction by a screen. 


doing work. Since there are TV Az atoms in a unit area, the last term in Eq. (31.23) 
should be TV A zq e E s v . Our energy equation now looks like 

aEf = otE$ + 2 aE s E a + TV Azq e ~E~v. (31,24) 

The E 2 terms cancel, and we have 


2aE s E a = TV Az q e E s v. (31.25) 

We now go back to Eq. (30.19), which tells us that for large z 

Ea = ^§r w(ret by z/c) ( 3L26 ) 

(recalling that r, = N Az). Putting Eq. (31.26) into the left-hand side of (31.25), 
we get 

2 a (at z) • v(ret by z/c). 

However, is, (at z) is E s (at atoms) retarded by zjc. Since the average is inde- 
pendent of time, it is the same now as retarded by z/c, or is E s (at atom) • v, the 
same average that appears on the right-hand side of (31.25). The two sides are 
therefore equal if 

^ = 1, or « = €o c. (31.27) 

We have discovered that if energy is to be conserved, the energy carried in an elec- 
tric wave per unit area and per unit time (or what we have called the intensity) 
must be given by e 0 cE*. If we call the intensity S, we have 

{ intensity ) 

or > = e 0 cW, (31.28) 

energy /area/ time J 

where the bar means the time average . We have a nice bonus result from our theory 
of the refractive index! 


31-6 Diffraction of light by a screen 

It is now a good time to take up a somewhat different matter which we can 
handle with the machinery of this chapter. In the last chapter we said that when 
you have an opaque screen and the light can come through some holes, the distribu- 
tion of intensity the diffraction pattern — could be obtained by imagining instead 
that the holes are replaced by sources (oscillators) uniformly distributed over the 
hole. In other words, the diffracted wave is the same as though the hole were a 
new source. We have to explain the reason for that, because the hole is, of course, 
just where there are no sources, where there are no accelerating charges. 

Let us first ask : What is an opaque screen?” Suppose we have a completely 
opaque screen between a source S and an observer at P, as in Fig. 31-6(a). If the 
screen is “opaque” there is no field at P. Why is there no field there? According 
to the basic principles we should obtain the field at P as the field E s of the source 
delayed, plus the field from all the other charges around. But, as we have seen 
above, the charges in the screen will be set in motion by the field E S9 and these 
motions generate a new field which, if the screen is opaque, must exactly cancel 
the field E s on the back side of the screen. You say: “What a miracle that it bal- 
ances exactly ! Suppose it was not exactly right!” If it were not exactly right (re- 
member that this opaque screen has some thickness), the field toward the rear 
part of the screen would not be exactly zero. So, not being zero, it would set 
into motion some other charges in the material of the screen, and thus make a little 
more field, trying to get the total balanced out. So if we make the screen thick 
enough, there is no residual field, because there is enough opportunity to finally 
get the thing quieted down. In terms of our formulas above we would say that the 
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screen has a large and imaginary index, so the wave is absorbed exponentially as it 
goes through. You know, of course, that a thin enough sheet of the most opaque 
material, even gold, is transparent. 

Now let us see what happens with an opaque screen which has holes in it, as 
in Fig. 3 1— 6(b). What do we expect for the field at P? The field at P can be repre- 
sented as a sum of two parts — the field due to the source S plus the field due to the 
wall, i.e., due to the motions of the charges in the walls. We might expect the 
motions of the charges in the walls to be complicated, but we can find out what 
fields they produce in a rather simple way. 

Suppose that we were to take the same screen, but plug up the holes, as indi- 
cated in part (c) of the figure. We imagine that the plugs are of exactly the same 
material as the wall. Mind you, the plugs go where the holes were in case (b). 
Now let us calculate the field at P. The field at P is certainly zero in case (c), but 
it is also equal to the field from the source plus the field due to all the motions of 
the atoms in the walls and in the plugs. We can write the following equations: 

Case (6): Pat p = Ps T - Pwain 

Case (c): Pat p = 0 = E s + P4 ail + Ppiug> 

where the primes refer to the case where the plugs are in place, but P s is, of course, 
the same in both cases. Now if we subtract the two equations, we get 

Pat P ~ (Pwall Pwall) Pplug* 

Now if the holes are not too small (say many wavelengths across), we would not 
expect the presence of the plugs to change the fields which arrive at the walls except 
possibly for a little bit around the edges of the holes. Neglecting this small effect, 
we can set P wa n = Pwaii and obtain that 

Pat P ~ Pplug- 

We have the result that the field at P when there are holes in a screen (case b ) is the 
same (except for sign) as the field that is produced by that part of a complete opaque 
wall which is located where the holes are! (The sign is not too interesting, since we 
are usually interested in intensity which is proportional to the square of the field.) 
It seems like an amazing backwards-forwards argument. It is, however, not only 
true (approximately for not too small holes), but useful, and is the justification 
for the usual theory of diffraction. 

The field Ppi ug is computed in any particular case by remembering that the 
motion of the charges everywhere in the screen is just that which will cancel out 
the field P s on the back of the screen. Once we know these motions, we add the 
radiation fields at P due just to the charges in the plugs. 

We remark again that this theory of diffraction is only approximate, and will 
be good only if the holes are not too small. For holes which are too small the 
Ppiug term will be small and then the difference between Pwaii and Pwaii (which 
difference we have taken to be zero) may be comparable to or larger than the small 
Ppiug term, and our approximation will no longer be valid. 
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Radiation Ramping . Light Scattering 


32-1 Radiation resistance 

In the last chapter we learned that when a system is oscillating, energy is 
carried away, and we deduced a formula for the energy which is radiated by an 
oscillating system. If we know the electric field, then the average of the square 
of the field times e 0 c is the amount of energy that passes per square meter per 
second through a surface normal to the direction in which the radiation is going: 

5 = e 0 c{E 2 ). (32.1) 

Any oscillating charge radiates energy; for instance, a driven antenna radiates 
energy. If the system radiates energy, then in order to account for the conservation 
of energy we must find that power is being delivered along the wires which lead 
into the antenna. That is, to the driving circuit the antenna acts like a resistance , 
or a place where energy can be “lost” (the energy is not really lost, it is really radi- 
ated out, but so far as the circuit is concerned, the energy is lost). In an ordinary 
resistance, the energy which is “lost” passes into heat; in this case the energy which 
is “lost” goes out into space. But from the standpoint of circuit theory, without 
considering where the energy goes, the net effect on the circuit is the same — energy 
is “lost” from that circuit. Therefore the antenna appears to the generator as 
having a resistance, even though it may be made with perfectly good copper. 
In fact, if it is well built it will appear as almost a pure resistance, with very little 
inductance or capacitance, because we would like to radiate as much energy as 
possible out of the antenna. This resistance that an antenna shows is called the 
radiation resistance . 

If a current I is going to the antenna, then the average rate at which power is 
delivered to the antenna is the average of the square of the current times the re- 
sistance. The rate at which power is radiated by the antenna is proportional to 
the square of the current in the antenna, of course, because all the fields are 
proportional to the currents, and the energy liberated is proportional to the square 
of the field. The coefficient of proportionality between radiated power and (I 2 ) 
is the radiation resistance. 

An interesting question is, what is this radiation resistance due to? Let us 
take a simple example: let us say that currents are driven up and down in an an- 
tenna. We find that we have to put work in, if the antenna is to radiate energy. 
If we take a charged body and accelerate it up and down it radiates energy; if it 
were not charged it would not radiate energy. It is one thing to calculate from the 
conservation of energy that energy is lost, but another thing to answer the question, 
against what force are we doing the work? That is an interesting,and very difficult 
question which has never been completely and satisfactorily answered for electrons, 
although it has been for antennas. What happens is this: in an antenna, the fields 
produced by the moving charges in one part of the antenna react on the moving 
charges in another part of the antenna. We can calculate these forces and find 
out how much work they do, and so find the right rule for the radiation resistance. 
When we say “We can calculate — ” that is not quite right — we cannot, because we 
have not yet studied the laws of electricity at short distances; only at large distances 
do we know what the electric field is. We saw the formula (28.3), but at present 
it is too complicated for us to calculate the fields inside the wave zone. Of course, 
since conservation of energy is valid, we can calculate the result all right without 
knowing the fields at short distances. (As a matter of fact, by using this argument 
backwards it turns out that one can find the formula for the forces at short 
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distances only by knowing the field at very large distances, by using the laws of 
conservation of energy, but we shall not go into that here.) 

The problem in the case of a single electron is this: if there is only one charge, 
what can the force act on? It has been proposed, in the old classical theory, that 
the charge was a little ball, and that one part of the charge acted on the other part. 
Because of the delay in the action across the tiny electron, the force is not exactly 
in phase with the motion. That is, if we have the electron standing still, we know 
that “action equals reaction.” So the various internal forces are equal, and there 
is no net force. But if the electron is accelerating, then because of the time delay 
across it, the force which is acting on the front from the back is not exactly the 
same as the force on the back from the front, because of the delay in the effect. 
This delay in the timing makes for a lack of balance, so, as a net effect, the thing 
holds itself back by its bootstraps! This model of the origin of the resistance to 
acceleration, the radiation resistance of a moving charge, has run into many 
difficulties, because our present view of the electron is that it is not a “little ball”; 
this problem has never been solved. Nevertheless we can calculate exactly, of 
course, what the net radiation resistance force must be, i.e., how much loss there 
must be when we accelerate a charge, in spite of not knowing directly the mecha- 
nism of how that force works. 



32-2 The rate of radiation of energy 

Now we shall calculate the total energy radiated by an accelerating charge. 
To keep the discussion general, we shall take the case of a charge accelerating any 
which way, but nonrelativistically. At a moment when the acceleration is, say, 
vertical, we know that the electric field that is generated is the charge multiplied 
by the projection of the retarded acceleration, divided by the distance. So we know 
the electric field at any point, and we therefore know the square of the electric 
field and thus the energy e 0 c£ 2 leaving through a unit area per second. 

The quantity e 0 c appears quite often in expressions involving radiowave prop- 
agation. Its reciprocal is called the impedance of a vacuum , and it is an easy number 
to remember: it has the value l/e Q c = 377 ohms. So the power in watts per square 
meter is equal to the average of the field squared, divided by 377. 

Using our expression (29.1) for the electric field, we find that 


5 = 


q 2 a’ 2 sin 2 fl 
I67r 2 e 0 /* 2 c 3 


(32.2) 


is the power per square meter radiated in the direction 6. We notice that it goes 
inversely as the square of the distance, as we said before. Now suppose we wanted 
the total energy radiated in all directions: then we must integrate (32.2) over all 
directions. First we multiply by the area, to find the amount that flows within a 
little angle dd (Fig. 32-1). We need the area of a spherical section. The way 
to think of it is this: if r is the radius, then the width of the annular segment is 
r dd , and the circumference is 27rr sin 6 , because r sin 6 is the radius of the circle. 
So the area of the little piece of the sphere is 27rr sin 6 times r dd: 


Fig. 32-1. The area of a spherical 
segment is 27rr sin d * r dd. 


dA = 2irr 2 sin d dd . 


(32.3) 


By multiplying the flux [(32.2), the power per square meter] by the area in square 
meters included in the small angle dd , we find the amount of energy that is liberated 
in this direction between d and d + dd\ then we integrate that over all the angles 
d from 0 to 180°: 

p - 1 s" < 3i4 > 


By writing sin’* d — (1 — cos 2 d) sin d it is not hard to show that Jo sin 3 d dd = 
4/3. Using that fact, we finally get 


P = 


2 /2 
q a 


6tt6 0 c 3 


(32.5) 
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This expression deserves some remarks. First of all, since the vector a' had a 
certain direction, the a' 2 in (32.5) would be the square of the vector a', that is, 
a' * a', the length of the vector, squared. Secondly, the flux (32.2) was calculated 
using the retarded acceleration ; that is, the acceleration at the time at which the 
energy now passing through the sphere was radiated. We might like to say that 
this energy was in fact liberated at this earlier time. This is not exactly true; it is 
only an approximate idea. The exact time when the energy is liberated canhot be 
defined precisely. All we can really calculate precisely is what happens in a complete 
motion, like an oscillation or something, where the acceleration finally ceases. 
Then what we find is that the total energy flux per cycle is the average of accelera- 
tion squared, for a complete cycle. This is what should really appear in (32.5). 
Or, if it is a motion with an acceleration that is initially and finally zero, then the 
total energy that has flown out is the time integral of (32.5). 

To illustrate the consequences of formula (32.5) when we have an oscillating 
system, let us see what happens if the displacement x of the charge is oscillating 
so that the acceleration a is The average of the acceleration squared 

over a cycle (remember that we have to be very careful when we square things that 
are written in complex notation—it really is the cosine, and the average of cos 2 cor 
is one-half) thus is 

<* /2 

Therefore 

P — ~ 

127re 0 c 3 

The formulas we are now discussing are relatively advanced and more or 
less modern; they date from the beginning of the twentieth century, and they are 
very famous. Because of their historical value, it is important for us to be able 
to read about them in older books. In fact, the older books also used a system of 
units different from our present mks system. However, all these complications can 
be straightened out in the final formulas dealing with electrons by the following 
rule: The quantity q 2 /Aire 0 , where q e is the electronic charge (in coulombs), has, 
historically, been written as e 2 . It is very easy to calculate that e in the mks system 
is numerically equal to 1.5188 X 10” 14 , because we know that, numerically, 
q e = 1.60206 X 10" 19 and l/47re 0 = 8.98748 X 10 9 . Therefore we shall often 
use the convenient abbreviation 

= (32.7) 

47T€o 

If we use the above numerical value of e in the older formulas and treat them as 
though they were written in mks units, we will get the right numerical results. 
For example, the older form of (32.5) is P = §e 2 a 2 /c 3 . Again, the potential energy 
of a proton and an electron at distance r is ql/Airt^r or e 2 /r, with e = 1.5188 X 
10” 14 mks. 

32-3 Radiation damping 

Now the fact that an oscillator loses a certain energy would mean that if we 
had a charge on the end of a spring (or an electron in an atom) which has a natural 
frequency « 0 , and we start it oscillating and let it go, it will not oscillate forever, 
even if it is in empty space millions of miles from anything. There is no oil, no 
resistance, in an ordinary sense; no “viscosity.” But nevertheless it will not 
oscillate, as we might once have said, “forever,” because if it is charged it is 
radiating energy, and therefore the oscillation will slowly die out. How slowly? 
What is the Q of such an oscillator, caused by the electromagnetic effects, the 
so-called radiation resistance or radiation damping of the oscillator? The Q of any 
oscillating system is the total energy content of the oscillator at any time divided 
by the energy loss per radian: 

v dW/d<t> 


> = £« 4 *o 


2 4 2 

q x o 
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Or (another way to write it), since dW/d<j> = (dfV/dt)/(d<t>/dt) = (dW/dt)/u, 

Q = dW/dt ' (32 ' 8) 

If for a given Q this tells us how the energy of the oscillation dies out, dW/dt = 
-(a>/Q)W } which has the solution W - W 0 e~^ tlQ if W 0 is the initial energy 
(at t = 0). 

To find the Q for a radiator, we go back to (32.8) and use (32.6) for dW/du 
Now what do we use for the energy W of the oscillator? The kinetic energy 
of the oscillator is \mv 2 , and the mean kinetic energy is mo) 2 x 1/4. But we remember 
that for the total energy of an oscillator, on the average half is kinetic and half is 
potential energy, and so we double our result, and find for the total energy of the 
oscillator 

W = %mo> 2 xl (32.9) 

What do we use for the frequency in our formulas? We use the natural frequency 
co 0 because, for all practical purposes, that is the frequency at which our atom is 
radiating, and for m we use the electron mass m e . Then, making the necessary 
divisions and cancellations, the formula comes down to 


1 _ 47 re 2 

Q 3 Xm e c 2 


(32.10) 


(In order to see it better and in a more historical form we write it using our ab- 
breviation ql/Airto = e 2 , and the factor o> 0 /c which was left over has been written 
as 27 r/X.) Since Q is dimensionless, the combination e 2 /m e c 2 must be a property 
only of the electron charge and mass, an intrinsic property of the electron, and it 
must be a length . It has been given a name, the classical electron radius , because the 
early atomic models, which were invented to explain the radiation resistance on 
the basis of the force of one part of the electron acting on the other parts, all needed 
to have an electron whose dimensions were of this general order of magnitude. 
However, this quantity no longer has the significance that we believe that the elec- 
tron really has such a radius. Numerically, the magnitude of the radius is 


r 0 = -Z-5 = 2.82 X 10 -1S m. 

m e c 2 


(32.11) 


Now let us actually calculate the Q of an atom that is emitting light— let us 
say a sodium atom. For a sodium atom, the wavelength is roughly 6000 angstroms, 
in the yellow part of the visible spectrum, and this is a typical wavelength. Thus 

Q = 4^ ~ 5 X 107 ’ (32 ’ 12) 

so the Q of an atom is of the order 10 8 . This means that an atomic oscillator will 
oscillate for 10 8 radians or about 10 7 oscillations, before its energy falls by a factor 
1/e. The frequency of oscillation of light corresponding to 6000 angstroms, 
v = c/X, is on the order of 10 15 cycles/sec, and therefore the lifetime, the time it 
takes for the energy of a radiating atom to die out by a factor 1 /e, is on the order 
of 10 -8 sec. In ordinary circumstances, freely emitting atoms usually take about 
this long to radiate. This is valid only for atoms which are in empty space, not 
being disturbed in any way. If the electron is in a solid and it has to hit other atoms 
or other electrons, then there are additional resistances and different damping. 

The effective resistance term 7 in the resistance law for the oscillator can be 
found from the relation \/Q = 7/w 0 , and we remember that the size of 7 deter- 
mines how wide the resonance curve is (Fig. 23-2). Thus we have just computed 
the widths of spectral lines for freely radiating atoms! Since X = 2irc/«, we find 
that 

AX = lire Aco/co 2 = 2ircl/ui% = 2irc/Qu 0 
= x/e = 4tt/-o/3 = 1.18 X 10~ 14 m. 
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32-4 Independent sources 

In preparation for our second topic, the scattering of light, we must now discuss 
a certain feature of the phenomenon of interference that we neglected to discuss 
previously. This is the question of when interference does not occur. If we have 
two sources Si and S 2 , with amplitudes A i and A 2 , and we make an observation 
in a certain direction in which the phases of arrival of the two signals are $ i and <t> 2 
(a combination of the actual time of oscillation and the delayed time, depending on 
the position of observation), then the energy that we receive can be found by 
compounding the two complex number vectors Ai and A 2i one at angle <f> x and 
the other at angle <t> 2 (as we did in Chapter 30) and we find that the resultant energy 
is proportional to 

A% = A x -T A 2 T - 2A x A 2 cos (<£i — <j> 2 ). (32.14) 

Now if the cross term 2 A X A 2 cos (<£1 — </> 2 ) were not there, then the total energy 
that would be received in a given direction would simply be the sum of the energies, 
A? + A 2 , that would be liberated by each source separately, which is what we 
usually expect. That is, the combined intensity of light shining on something 
from two sources is the sum of the intensities of the two lights. On the other hand, 
if we have things set just right and we have a cross term, it is not such a sum, because 
there is also some interference. If there are circumstances in which this term is of 
no importance, then we would say the interference is apparently lost. Of course, 
in nature it is always there, but we may not be able to detect it. 

Let us consider some examples. Suppose, first, that the two sources are 
7,000,000,000 wavelengths apart, not an impossible arrangement. Then in a given 
direction it is true that there is a very definite value of these phase differences. But, 
on the other hand, if we move just a hair in one direction, a few wavelengths, which 
is no distance at all (our eye already has a hole in it that is so large that we are 
averaging the effects over a range very wide compared with one wavelength) then 
we change the relative phase, and the cosine changes very rapidly. If we take the 
average of the intensity over a little region, then the cosine, which goes plus, minus, 
plus, minus, as we move around, averages to zero. 

So if we average over regions where the phase varies very rapidly with position, 
we get no interference. 

Another example. Suppose that the two sources are two independent radio 
oscillators — not a single oscillator being fed by two wires, which guarantees that 
the phases are kept together, but two independent sources — and that they are not 
precisely tuned at the same frequency (it is very hard to make them at exactly the 
same frequency without actually wiring them together). In this case we have what 
we call two independent sources. Of course, since the frequencies are not exactly 
equal, although they started in phase, one of them begins to get a little ahead of 
the other, and pretty soon they are out of phase, and then it gets still further ahead, 
and pretty soon they are in phase again. So the phase difference between the two 
is gradually drifting with time, but if our observation is so crude that we cannot 
see that little time, if we average over a much longer time, then although the 
intensity swells and falls like what we call “beats” in sound, if these swellings and 
fallings are too rapid for our equipment to follow, then again this term averages out. 

In other words, in any circumstance in which the phase shift averages out, we 
get no interference! 

One finds many books which say that two distinct light sources never interfere. 
This is not a statement of physics, but is merely a statement of the degree of sensi- 
tivity of the technique of the experiments at the time the book was written. What 
happens in a light source is that first one atom radiates, then another atom radiates, 
and so forth, and we have just seen that atoms radiate a train of waves only for 
about 10~ 8 sec; after 10“ 8 sec, some atom has probably taken over, then another 
atom takes over, and so on. So the phases can really only stay the same for about 
10~ 8 sec. Therefore, if we average for very much more than 10“ 8 sec, we do not 
see an interference from two different sources, because they cannot hold their 
phases steady for longer than 10 -8 sec. With photocells, very high-speed detection 
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Scattered radiation 
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Fig. 32-2. A beam of radiation falls 
on an atom and causes the charges 
(electrons) in the atom to move. The 
moving electrons in turn radiate in 
various directions. 


is possible, and one can show that there is an interference which varies with time, 
up and down, in about 10 8 sec. But most detection equipment, of course, does 
not look at such fine time intervals, and thus sees no interference. Certainly with 
the eye, which has a tenth-of-a-second averaging time, there is no chance whatever 
of seeing an interference between two different ordinary sources. 

Recently it has become possible to make light sources which get around this 
effect by making all the atoms emit together in time. The device which does this is 
a very complicated thing, and has to be understood in a quantum-mechanical way. 
It is called a laser , and it is possible to produce from a laser a source in which the 
interference frequency, the time at which the phase is kept constant, is very much 
longer than 10 8 sec. It can be of the order of a hundredth, a tenth, or even one 
second, and so, with ordinary photocells, one can pick up the frequency between 
two different lasers. One can easily detect the pulsing of the beats between two 
laser sources. Soon, no doubt, someone will be able to demonstrate two sources 
shining on a wall, in which the beats are so slow that one can see the wall get bright 
and dark! 

Another case in which the interference averages out is that in which, instead of 
having only two sources, we have many. In this case, we would write the expression 
for Ag as the sum of a whole lot of amplitudes, complex numbers, squared, and 
we would get the square of each one, all added together, plus cross terms between 
every pair, and if the circumstances are such that the latter average out, then there 
will be no effects of interference. It may be that the various sources are located in 
such random positions that, although the phase difference between A 2 and A z is 
also definite, it is very different from that between A x and A 2 , etc. So we would 
get a whole lot of cosines, many plus, many minus, all averaging out. 

So it is that in many circumstances we do not see the effects of interference, 
but see only a collective, total intensity equal to the sum of all the intensities. 


32-5 Scattering of light 

The above leads us to an effect which occurs in air as a consequence of the 
irregular positions of the atoms. When we were discussing the index of refraction, 
we saw that an incoming beam of light will make the atoms radiate again. The 
electric field of the incoming beam drives the electrons up and down, and they 
radiate because of their acceleration. This scattered radiation combines to give a 
beam in the same direction as the incoming beam, but of somewhat different 
phase, and this is the origin of the index of refraction. 

But what can we say about the amount of re-radiated light in some other direc- 
tion? Ordinarily, if the atoms are very beautifully located in a nice pattern, it is 
easy to show that we get nothing in other directions, because we are adding a lot 
of vectors with their phases always changing, and the result comes to zero. But 
if the objects are randomly located, then the total intensity in any direction is the 
sum of the intensities that are scattered by each atom, as we have just discussed. 
Furthermore, the atoms in a gas are in actual motion, so that although the relative 
phase of two atoms is a definite amount now, later the phase would be quite differ- 
ent, and therefore each cosine term will average out. Therefore, to find out how 
much light is scattered in a given direction by a gas, we merely study the effects of 
one atom and multiply the intensity it radiates by the number of atoms. 

Earlier, we remarked that the phenomenon of scattering of light of this nature 
is the origin of the blue of the sky. The sunlight goes through the air, and when 
we look to one side of the sun say at 90° to the beam — we see blue light; what 
we now have to calculate is how much light we see and why it is blue . 

If the incident beam has the electric field E = E 0 ^ w ‘ at the point where the 
atom is located, we know that an electron in the atom will vibrate up and down in 
response to this E (Fig. 32-2). From Eq. (23.8), the amplitude will be 

* Q e En 

x = ~ 7 2 2 : • (32.15) 

m(u o - w 2 + iw 7) 
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We could include the damping and the possibility that the atom acts like several 
oscillators of different frequency and sum over the various frequencies, but for 
simplicity let us just take one oscillator and neglect the damping. Then the response 
to the external electric field, which we have already used in the calculation of the 
index of refraction, is simply 


m( — co 2 ) 


(32.16) 


We could now easily calculate the intensity of light that is emitted in various 
directions, using formula (32.2) and the acceleration corresponding to the above x. 

Rather than do this, however, we shall simply calculate the total amount of 
light scattered in all directions, just to save time. The total amount of light energy 
per second, scattered in all directions by the single atom, is of course given by 
Eq. (32.7) So, putting together the various pieces and regrouping them, we get 


P = [(qW /l2Te 0 c 3 )q 2 e Eo/ m 2 (a> 2 - coo) 2 ] 

= Geoc4)(W3)(?Vl6x 2 €XcV7(co 2 - cog) 2 ] 

= (ie 0 c4)(8ir/-g/3)[co 4 /(« 2 - cog) 2 ] (32.17) 

for the total scattered power, radiated in all directions. 

We have written the result in the above form because it is then easy to re- 
member: First, the total energy that is scattered is proportional to the square of 
the incident field. What does that mean? Obviously, the square of the incident 
field is proportional to the energy which is coming in per second. In fact, the energy 
incident per square meter per second is e 0 c times the average (E 2 ) of the square of 
the electric field, and if E 0 is the maximum value of E , then (E 2 ) — £ E q. In 
other words, the total energy scattered is proportional to the energy per square 
meter that comes in ; the brighter the sunlight that is shining in the sky, the brighter 
the sky is going to look. 

Next, what fraction of the incoming light is scattered? Let us imagine a “tar- 
get” with a certain area, let us say <r, in the beam (not a real, material target, be- 
cause this would diffract light, and so on; we mean an imaginary area drawn in 
space). The total amount of energy that would pass through this surface cr in 
a given circumstance is proportional both to the incoming intensity and to cr, 
and would be 

P = ae 0 cE 2 0 )<r. (32.18) 

Now we invent an idea: we say that the atom scatters a total amount of in- 
tensity which is the amount which would fall on a certain geometrical area, and 
we give the answer by giving that area. That answer, then, is independent of the 
incident intensity; it gives the ratio of the energy scattered to the energy incident 
per square meter. In other words, the ratio 


total energy scattered per second 

: — — : is an area . 

energy incident per square meter per second 

The significance of this area is that, if all the energy that impinged on that area 
were to be spewed in all directions, then that is the amount of energy that would 
be scattered by the atom. 

This area is called a cross section for scattering ; the idea of cross section is 
used constantly, whenever some phenomenon occurs in proportion to the intensity 
of a beam. In such cases one always describes the amount of the phenomenon 
by saying what the effective area would have to be to pick up that much of the beam. 
It does not mean in any way that this oscillator actually has such an area. If there 
were nothing present but a free electron shaking up and down there would be no 
area directly associated with it, physically. It is merely a way of expressing the 
answer to a certain kind of problem; it tells us what area the incident beam would 
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have to hit in order to account for that much energy coming off. Thus, for our case, 


87T/o co 4 

(co 2 - coo) 2 


(32.19) 


(the subscript s is for “scattering”). 

Let us look at some examples. First, if we go to a very low natural frequency 
oo o, or to completely unbound electrons, for which co 0 = 0, then the frequency co 
cancels out and the cross section is a constant. This low-frequency limit, or the 
free electron cross section, is known as the Thompson scattering cross section . 
It is an area whose dimensions are approximately 10~ 15 meter, more or less, on a 
side, i.e., 10 -30 square meter, which is rather small! 

On the other hand, if we take the case of light in the air, we remember that for 
air the natural frequencies of the oscillators are higher than the frequency of the 
light that we use. This means that, to a first approximation, we can disregard co 2 
in the denominator, and we find that the scattering is proportional to the fourth 
power of the frequency. That is to say, light which is of higher frequency by, say, 
a factor of two, is sixteen times more intensely scattered, which is a quite sizable 
difference. This means that blue light, which has about twice the frequency of the 
reddish end of the spectrum, is scattered to a far greater extent than red light. Thus 
when we look at the sky it looks that glorious blue that we see all the time! 

There are several points to be made about the above results. One interesting 
question is, why do we ever see the clouds ? Where do the clouds come from? 
Everybody knows it is the condensation of water vapor. But, of course, the 
water vapor is already in the atmosphere before it condenses, so why don’t we see 
it then? After it condenses it is perfectly obvious. It wasn’t there, now it is there. 
So the mystery of where the clouds come from is not really such a childish mystery 
as “Where does the water come from, Daddy?,” but has to be explained. 

We have just explained that every atom scatters light, and of course the water 
vapor will scatter light, too. The mystery is why, when the water is condensed into 
clouds, does it scatter such a tremendously greater amount of light? 

Consider what would happen if, instead of a single atom, we had an agglom- 
erate of atoms, say two, very close together compared with the wavelength of the 
light. Remember, atoms are only an angstrom or so across, while the wavelength 
of light is some 5000 angstroms, so when they form a clump, a few atoms together, 
they can be very close together compared with the wavelength of light. Then 
when the electric field acts, both of the atoms will move together . The electric field 
that is scattered will then be the sum of the two electric fields in phase, i.e., double 
the amplitude that there was with a single atom, and the energy which is scattered 
is therefore four times what it is with a single atom, not twice! So lumps of atoms 
radiate or scatter more energy than they do as single atoms. Our argument that 
the phases are independent is based on the assumption that there is a real and large 
difference in phase between any two atoms, which is true only if they are several 
wavelengths apart and randomly spaced, or moving. But if they are right next to 
each other, they necessarily scatter in phase, and they have a coherent interference 
which produces an increase in the scattering. 

If we have N atoms in a lump, which is a tiny droplet of water, then each one 
will be driven by the electric field in about the same way as before (the effect of 
one atom on the other is not important; it is just to get the idea anyway) and the j 
amplitude of scattering from each one is the same, so the total field which is 
scattered is A- fold increased. The intensity of the light which is scattered is then 
the square, or A 2 -fold, increased. We would have expected, if the atoms were 
spread out in space, only N times as much as 1 , whereas we get N 2 times as much as 
1! That is to say, the scattering of water in lumps of N molecules each is N times 
more intense than the scattering of the single atoms. So as the water agglomerates 
the scattering increases. Does it increase ad infinitum ? No! When does this 
analysis begin to fail? How many atoms can we put together before we cannot 
drive this argument any further? Answer: If the water drop gets so big that from 
one end to the other is a wavelength or so, then the atoms are no longer all in 
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phase because they are too far apart. So as we keep increasing the size of the drop- 
lets we get more and more scattering, until such a time that a drop gets about the 
size of a wavelength, and then the scattering does not increase anywhere nearly as 
rapidly as the drop gets bigger. Furthermore, the blue disappears, because for long 
wavelengths the drops can be bigger, before this limit is reached, than they can 
be for short wavelengths. Although the short waves scatter more per atom than 
the long waves, there is a bigger enhancement for the red end of the spectrum than 
for the blue end when all the drops are bigger than the wavelength, so the color is 
shifted from the blue toward the red. 

Now we can make an experiment that demonstrates this. We can make 
particles that are very small at first, and then gradually grow in size. We use a 
solution of sodium thiosulfate (hypo) with sulphuric acid, which precipitates very 
fine grains of sulphur. As the sulphur precipitates, the grains first start very small, 
and the scattering is a little bluish. As it precipitates more it gets more intense, and 
then it will get whitish as the particles get bigger. In addition, the light which goes 
straight through will have the blue taken out. That is why the sunset is red, of 
course, because the light that comes through a lot of air to the eye has had a lot 
of blue light scattered out, so it is yellow-red. 

Finally, there is one other important feature which really belongs in the next 
chapter, on polarization, but it is so interesting that we point it out now. This is 
that the electric field of the scattered light tends to vibrate in a particular direction. 
The electric field in the incoming light is oscillating in some way, and the driven 
oscillator goes in this same direction, and if we are situated about at right angles 
to the beam, we will see polarized light, that is to say, light in which the electric 
field is going only one way. In general, the atoms can vibrate in any direction at 
right angles to the beam, but if they are driven directly toward or away from us, we 
do not see it. So if the incoming light has an electric field which changes and os- 
cillates in any direction, which we call unpolarized light, then the light which is 
coming out at 90° to the beam vibrates in only one direction! (See Fig. 32-3.) 

There is a substance called polaroid which has the property that when light 
goes through it, only the piece of the electric field which is along one particular 
axis can get through. We can use this to test for polarization, and indeed we 
find the light scattered by the hypo solution to be strongly polarized. 



Fig. 32-3. Illustration of the origin of 
the polarization of radiation scattered 
at right angles to the incident beam. 
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33 


Polarization 


33-1 The electric vector of light 


In this chapter we shall consider those phenomena which depend on the fact 
that the electric field that describes the light is a vector. In previous chapters we 
have not been concerned with the direction of oscillation of the electric field, except 
to note that the electric vector lies in a plane perpendicular to the direction of 
propagation. The particular direction in this plane has not concerned us. We 
now consider those phenomena whose central feature is the particular direction 
of oscillation of the electric field. 

In ideally monochromatic light, the electric field must oscillate at a definite 
frequency, but since the x-component and the y-component can oscillate independ- 
ently at a definite frequency, we must first consider the resultant effect produced 
by superposing two independent oscillations at right angles to each other. What 
kind of electric field is made up of an x-component and a y-component which 
oscillate at the same frequency? If one adds to an x-vibration a certain amount of 
y-vibration at the same phase, the result is a vibration in a new direction in the 
xy-plane. Figure 33-1 illustrates the superposition of different amplitudes for the 
x-vibration and the y-vibration. But the resultants shown in Fig. 33-1 are not the 
only possibilities; in all of these cases we have assumed that the x-vibration and 
the y-vibration are in phase , but it does not have to be that way. It could be that 
the x-vibration and the y-vibration are out of phase. 


33-1 The electric vector of light 

33-2 Polarization of scattered light 

33-3 Birefringence 

33-4 Polarizers 

33-5 Optical activity 

33-6 The intensity of reflected light 

33-7 Anomalous refraction 
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Fig. 33-1. Superposition of x-vibrations and y-vibrations in phase. 


When the x-vibration and the y-vibration are not in phase, the electric field 
vector moves around in an ellipse, and we can illustrate this in a familiar way. If 
we hang a ball from a support by a long string, so that it can swing freely in a 
horizontal plane, it will execute sinusoidal oscillations. If we imagine horizontal 
x- and ^-coordinates with their origin at the rest position of the ball, the ball can 
swing in either the x- or ^-direction with the same pendulum frequency. By 
selecting the proper initial displacement and initial velocity, we can set the ball in 
oscillation along either the x-axis or the y- axis, or along any straight line in the 
xy-plane. These motions of the ball are analogous to the oscillations of the electric 
field vector illustrated in Fig. 33-1. In each instance, since the x-vibrations and 
the y-vibrations reach their maxima and minima at the same time, the x- and y-os- 
cillations are in phase. But we know that the most general motion of the ball 
is motion in an ellipse, which corresponds to oscillations in which the x- and 
y-directions are not in the same phase. The superposition of x- and y-vibrations 
which are not in phase is illustrated in Fig. 33-2 for a variety of angles between the 
phase of the x-vibration and that of the y-vibration. The general result is that the 
electric vector moves around an ellipse. The motion in a straight line is a particular 
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case corresponding to a phase difference of zero (or an integral multiple of 7r); 
motion in a circle corresponds to equal amplitudes with a phase difference of 90° 
(or any odd integral multiple of 7 t/ 2). 

In Fig. 33-2 we have labeled the electric field vectors in the x- and y-directions 
with complex numbers, which are a convenient representation in which to express 
the phase difference. Do not confuse the real and imaginary components of the 
complex electric vector in this notation with the x- and y-coordinates of the field. 
The x- and y-coordinates plotted in Fig. 33-1 and Fig. 33-2 are actual electric 
fields that we can measure. The real and imaginary components of a complex 
electric field vector are only a mathematical convenience and have no physical 
significance. 
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Fig. 33-2. Superposition of x-vibrations and y-vibrations with equal amplitudes but various relative 
phases. The components E x and E y are expressed in both real and complex notations. 


Now for some terminology. Light is linearly polarized (sometimes called 
plane polarized) when the electric field oscillates on a straight line; Fig. 33-1 
illustrates linear polarization. When the end of the electric field vector travels in 
an ellipse, the light is elliptically polarized . When the end of the electric field vector 
travels around a circle, we have circular polarization. If the end of the electric 
vector, when we look at it as the light comes straight toward us, goes around 
in a counterclockwise direction, we call it right-hand circular polarization. 
Figure 33-2(g) illustrates right-hand circular polarization, and Fig. 33-2(c) shows 
left-hand circular polarization. In both cases the light is coming out of the paper. 
Our convention for labeling left-hand and right-hand circular polarization is 
consistent with that which is used today for all the other particles in physics which 
exhibit polarization (e.g., electrons). However, in some books on optics the 
opposite conventions are used, so one must be careful. 

We have considered linearly, circularly, and elliptically polarized light, which 
covers everything except for the case of unpolarized light. Now how can the light 
be unpolarized when we know that it must vibrate in one or another of these 
ellipses? If the light is not absolutely monochromatic, or if the x- and y-phases 
are not kept perfectly together, so that the electric vector first vibrates in one 
direction, then in another, the polarization is constantly changing. Remember 
that one atom emits during 10“ 8 sec, and if one atom emits a certain polarization, 
and then another atom emits light with a different polarization, the polarizations 
will change every 10~ 8 sec. If the polarization changes more rapidly than we can 
detect it, then we call the light unpolarized, because all the effects of the polarization 
average out. None of the interference effects of polarization would show up with 
unpolarized light. But as we see from the definition, light is unpolarized only if 
we are unable to find out whether the light is polarized or not. 
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33-2 Polarization of scattered light 

The first example of the polarization effect that we have already discussed is 
the scattering of light. Consider a beam of light, for example from the sun, shining 
on the air. The electric field will produce oscillations of charges in the air, and mo- 
tion of these charges will radiate light with its maximum intensity in a plane normal 
to the direction of vibration of the charges. The beam from the sun is unpolarized, 
so the direction of polarization changes constantly, and the direction of vibration 
of the charges in the air changes constantly. If we consider light scattered at 90°, 
the vibration of the charged particles radiates to the observer only when the 
vibration is perpendicular to the observer’s line of sight, and then light will be 
polarized along the direction of vibration. So scattering is an example of one means 
of producing polarization. 


33-3 Birefringence 

Another interesting effect of polarization is the fact that there are substances 
for which the index of refraction is different for light linearly polarized in one 
direction and linearly polarized in another. Suppose that we had some material 
which consisted of long, nonspherical molecules, longer than they are wide, and 
suppose that these molecules were arranged in the substance with their long axes 
parallel. Then what happens when the oscillating electric field passes through this 
substance? Suppose that because of the structure of the molecule, the electrons 
in the substance respond more easily to oscillations in the direction parallel to the 
axes of the molecules than they would respond if the electric field tries to push 
them at right angles to the molecular axis. In this way we expect a different response 
for polarization in one direction than for polarization at right angles to that direc- 
tion. Let us call the direction of the axes of the molecules the optic axis. When 
the polarization is in the direction of the optic axis the index of refraction is 
different than it would be if the direction of polarization were at right angles to it. 
Such a substance is called birefringent. It has two refrangibilities, i.e., two indexes 
of refraction, depending on the direction of the polarization inside the substance. 
What kind of a substance can be birefringent? In a birefringent substance there 
must be a certain amount of lining up, for one reason or another, of unsymmetrical 
molecules. Certainly a cubic crystal, which has the symmetry of a cube, cannot be 
birefringent. But long needlelike crystals undoubtedly contain molecules that 
are asymmetric, and one observes this effect very easily. 

Let us see what effects we would expect if we were to shine polarized light 
through a plate of a birefringent substance. If the polarization is parallel to the 
optic axis, the light will go through with one velocity; if the polarization is per- 
pendicular to the axis, the light is transmitted with a different velocity. An inter- 
esting situation arises when, say, light is linearly polarized at 45° to the optic axis. 
Now the 45° polarization, we have already noticed, can be represented as a super- 
position of the x- and the y-polarizations of equal amplitude and in phase, as 
shown in Fig. 33-2(a). Since the x- and y-polarizations travel with different 
velocities, their phases change at a different rate as the light passes through the 
substance. So, although at the start the x- and y-vibrations are in phase, inside 
the material the phase difference between x- and y-vibrations is proportional to the 
depth in the substance. As the light proceeds through the material the polarization 
changes as shown in the series of diagrams in Fig. 33-2. If the thickness of the 
plate is just right to introduce a 90° phase shift between the x- and y-polarizations, 
as in Fig. 33-2(c), the light will come out circularly polarized. Such a thickness 
is called a quarter-wave plate, because it introduces a quarter-cycle phase difference 
between the x- and the y-polarizations. If linearly polarized light is sent through 
two quarter-wave plates, it will come out plane-polarized again, but at right angles 
to the original direction, as we can see from Fig. 33-2(e). 

One can easily illustrate this phenomenon with a piece of cellophane. Cello- 
phane is made of long, fibrous molecules, and is not isotropic, since the fibers lie 
preferentially in a certain direction. To demonstrate birefringence we need a 
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CELLOPHANE 



Fig. 33-3. An experimental demon- 
stration of the birefringence of cellophane. 
The electric vectors in the light are indi- 
cated by the dotted lines. The pass axes 
of the Polaroid sheets and optic axes of 
the cellophane are indicated by arrows. 
The incident beam is unpolarized. 


beam of linearly polarized light, and we can obtain this conveniently by passing 
unpolarized light through a sheet of polaroid. Polaroid, which we will discuss 
later in more detail, has the useful property that it transmits light that is linearly 
polarized parallel to the axis of the polaroid with very little absorption, but light 
polarized in a direction perpendicular to the axis of the polaroid is strongly 
absorbed. When we pass unpolarized light through a sheet of polaroid, only that 
part of the unpolarized beam which is vibrating parallel to the axis of the polaroid 
gets through, so that the transmitted beam is linearly polarized. This same property 
of polaroid is also useful in detecting the direction of polarization of a linearly 
polarized beam, or in determining whether a beam is linearly polarized or not. 
One simply passes the beam of light through the polaroid sheet and rotates the 
polaroid in the plane normal to the beam. If the beam is linearly polarized, it will 
not be transmitted through the sheet when the axis of the polaroid is normal to 
the direction of polarization. The transmitted beam is only slightly attenuated 
when the axis of the polaroid sheet is rotated through 90°. If the transmitted in- 
tensity is independent of the orientation of the polaroid, the beam is not linearly 
polarized. 

To demonstrate the birefringence of cellophane, we use two sheets of polaroid, 
as shown in Fig. 33-3. The first gives us a linearly polarized beam which we pass 
through the cellophane and then through the second polaroid sheet, which serves 
to detect any effect the cellophane may have had on the polarized light passing 
through it. If we first set the axes of the two polaroid sheets perpendicular to each 
other and remove the cellophane, no light will be transmitted through the second 
polaroid. If we now introduce the cellophane between the two polaroid sheets, and 
rotate the sheet about the beam axis, we observe that in general the cellophane 
makes it possible for some light to pass through the second polaroid. However, 
there are two orientations of the cellophane sheet, at right angles to each other, 
which permit no light to pass through the second polaroid. These orientations in 
which linearly polarized light is transmitted through the cellophane with no 
effect on the direction of polarization must be the directions parallel and per- 
pendicular to the optic axis of the cellophane sheet. 

We suppose that the light passes through the cellophane with two different 
velocities in these two different orientations, but it is transmitted without changing 
the direction of polarization. When the cellophane is turned halfway between 
these two orientations, as shown in Fig. 33-3, we see that the light transmitted 
through the second polaroid is bright. 

It just happens that ordinary cellophane used in commercial packaging is 
very close to a half-wave thickness for most of the colors in white light. Such a 
sheet will turn the axis of linearly polarized light through 90° if the incident 
linearly polarized beam makes an angle of 45° with the optic axis, so that the beam 
emerging from the cellophane is then vibrating in the right direction to pass through 
the second polaroid sheet. 

If we use white light in our demonstration, the cellophane sheet will be of the 
proper half-wave thickness only for a particular component of the white light, 
and the transmitted beam will have the color of this component. The color trans- 
mitted depends on the thickness of the cellophane sheet, and we can vary the 
effective thickness of the cellophane by tilting it so that the light passes through the 
cellophane at an angle, consequently through a longer path in the cellophane. As 
the sheet is tilted the transmitted color changes. With cellophane of different 
thicknesses one can construct filters that will transmit different colors. These 
filters have the interesting property that they transmit one color when the two 
Polaroid sheets have their axes perpendicular, and the complementary color when 
the axes of the two polaroid sheets are parallel. 

Another interesting application of aligned molecules is quite practical. 
Certain plastics are composed of very long and complicated molecules all twisted 
together. When the plastic is solidified very carefully, the molecules are all twisted 
in a mass, so that there are as many aligned in one direction as another, and so 
the plastic is not particularly birefringent. Usually there are strains and stresses 
introduced when the material is solidified, so the material is not perfectly homo- 
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geneous. However, if we apply tension to a piece of this plastic material, it is as 
if we were pulling a whole tangle of strings, and there will be more strings preferen- 
tially aligned parallel to the tension than in any other direction. So when a stress 
is applied to certain plastics, they become birefringent, and one can see the effects 
of the birefringence by passing polarized light through the plastic. If we examine 
the transmitted light through a polaroid sheet, patterns of light and dark fringes will 
be observed (in color, if white light is used). The patterns move as stress is applied 
to the sample, and by counting the fringes and seeing where most of them are, one 
can determine what the stress is. Engineers use this phenomenon as a means of 
finding the stresses in odd-shaped pieces that are difficult to calculate. 

Another interesting example of a way of obtaining birefringence is by means 
of a liquid substance. Consider a liquid composed of long asymmetric molecules 
which carry a plus or minus average charge near the ends of the molecule, so that 
the molecule is an electric dipole. In the collisions in the liquid the molecules 
will ordinarily be randomly oriented, with as many molecules pointed in one direc- 
tion as in another. If we apply an electric field the molecules will tend to line up, 
and the moment they line up the liquid becomes birefringent. With two polaroid 
sheets and a transparent cell containing such a polar liquid, we can devise an 
arrangement with the property that light is transmitted only when the electric 
field is applied. So we have an electrical switch for light, which is called a Kerr 
cell. This effect, that an electric field can produce birefringence in certain liquids, 
is called the Kerr effect. 


33-4 Polarizers 

So far we have considered substances in which the refractive index is different 
for light polarized in different directions. Of very practical value are those crystals 
and other substances in which not only the index, but also the coefficient of ab- 
sorption, is different for light polarized in different directions. By the same argu- 
ments which supported the idea of birefringence, it is understandable that absorp- 
tion can vary with the direction in which the charges are forced to vibrate in an 
anisotropic substance. Tourmaline is an old, famous example and polaroid is 
another. Polaroid consists of a thin layer of small crystals of herapathite (a salt 
of iodine and quinine), all aligned with their axes parallel. These crystals absorb 
light when the oscillations are in one direction, and they do not absorb appreciably 
when the oscillations are in the other direction. 

Suppose that we send light into a polaroid sheet polarized linearly at an 
angle 8 to the passing direction. What intensity will come through? This incident 
light can be resolved into a component perpendicular to the pass direction which 
is proportional to sin 8 , and a component along the pass direction which is pro- 
portional to cos 8. The amplitude which comes out of the polaroid is only the 
cosine 8 part; the sin 8 component is absorbed. The amplitude which passes 
through the polaroid is smaller than the amplitude which entered, by a factor 
cos 8. The energy which passes through the polaroid, i.e., the intensity of the 
light, is proportional to the square of cos 8 . Cos 2 8 , then, is the intensity transmitted 
when the light enters polarized at an angle 8 to the pass direction. The absorbed 
intensity, of course, is sin 2 8 . 

An interesting paradox is presented by the following situation. We know 
that is is not possible to send a beam of light through two polaroid sheets with 
their axes crossed at right angles. But if we place a third polaroid sheet between 
the first two, with its pass axis at 45° to the crossed axes, some light is transmitted. 
We know that polaroid absorbs light, it does not create anything. Nevertheless, 
the addition of a third polaroid at 45° allows more light to get through. The 
analysis of this phenomenon is left as an exercise for the student. 

One of the most interesting examples of polarization is not in complicated 
crystals or difficult substances, but in one of the simplest and most familiar of 
situations — the reflection of light from a surface. Believe it or not, when light is 
reflected from a glass surface it may be polarized, and the physical explanation of 
this is very simple. It was discovered empirically by Brewster that light reflected 
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Fig. 33-4. Reflection of linearly po- 
larized light at Brewster’s angle. The 
polarization direction is indicated by 
dashed arrows; round dots indicate 
polarization normal to the paper. 



Fig. 33-5. A molecule with a shape 
that is not symmetric when reflected in a 
mirror. A beam of light, linearly polarized 
in the y-direction, falls on the molecule. 


from a surface is completely polarized if the reflected beam and the beam refracted 
into the material form a right angle. The situation is illustrated in Fig. 33-4. If 
the incident beam is polarized in the plane of incidence, there will be no reflection 
at all. Only if the incident beam is polarized normal to the plane of incidence will 
it be reflected. The reason is very easy to understand. In the reflecting material 
the light is polarized transversely, and we know that it is the motion of the charges 
in the material which generates the emergent beam, which we call the reflected 
beam. The source of this so-called reflected light is not simply that the incident 
beam is reflected; our deeper understanding of this phenomenon tells us that the 
incident beam drives an oscillation of the charges in the material, which in turn 
generates the reflected beam. From Fig. 33-4 it is clear that only oscillations normal 
to the paper can radiate in the direction of reflection, and consequently the reflected 
beam will be polarized normal to the plane of incidence. If the incident beam is 
polarized in the plane of incidence, there will be no reflected light. 

This phenomenon is readily demonstrated by reflecting a linearly polarized 
beam from a flat piece of glass. If the glass is turned to present different angles of 
incidence to the polarized beam, sharp attenuation of the reflected intensity is 
observed when the angle of incidence passes through Brewster’s angle. This 
attenuation is observed only if the plane of polarization lies in the plane of incidence. 
If the plane of polarization is normal to the plane of incidence, the usual reflected 
intensity is observed at all angles. 


33-5 Optical activity 

Another most remarkable effect of polarization is observed in materials 
composed of molecules which do not have reflection symmetry: molecules shaped 
something like a corkscrew, or like a gloved hand, or any shape which, if viewed 
through a mirror, would be reversed in the same way that a left-hand glove reflects 
as a right-hand glove. Suppose all of the molecules in the substance are the same, 
i.e., none is a mirror image of any other. Such a substance may show an interesting 
effect called optical activity, whereby as linearly polarized light passes through 
the substance, the direction of polarization rotates about the beam axis. 

To understand the phenomenon of optical activity requires some calculation, 
but we can see qualitatively how the effect might come about, without actually 
carrying out the calculations. Consider an asymmetric molecule in the shape of 
a spiral, as shown in Fig. 33-5. Molecules need not actually be shaped like a 
corkscrew in order to exhibit optical activity, but this is a simple shape which we 
shall take as a typical example of those that do not have reflection symmetry. 
When a light beam linearly polarized along the y-direction falls on this molecule, 
the electric field will drive charges up and down the helix, thereby generating a 
current in the y-direction and radiating an electric field E y polarized in the y-direc- 
tion. However, if the electrons are constrained to move along the spiral, they 
must also move in the x-direction as they are driven up and down. When a current 
is flowing up the spiral, it is also flowing into the paper at 2 = z x and out of the 
paper at z = z x + A, if A is the diameter of our molecular spiral. One might 
suppose that the current in the x-direction would produce no net radiation, since 
the currents are in opposite directions on opposite sides of the spiral. However, 
if we consider the x-components of the electric field arriving at z = z 2 , we see 
that the field radiated by the current at z = z x + A and the field radiated from 
z — Z\ arrive at z 2 separated in time by the amount A/c, and thus separated in 
phase by 7r + c oA/c. Since the phase difference is not exactly 7 r, the two fields 
do not cancel exactly, and we are left with a small x-component in the electric field 
generated by the motion of the electrons in the molecule, whereas the driving 
electric field had only a y-component. This small x-component, added to the large 
y-component, produces a resultant field that is tilted slightly with respect to the 
y-axis, the original direction of polarization. As the light moves through the 
material, the direction of polarization rotates about the beam axis. By drawing a 
few examples and considering the currents that will be set in motion by an incident 
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electric field, one can convince himself that the existence of optical activity and the 
sign of the rotation are independent of the orientation of the molecules. 

Corn syrup is a common substance which possesses optical activity. The 
phenomenon is easily demonstrated with a polaroid sheet to produce a linearly 
polarized beam, a transmission cell containing corn syrup, and a second polaroid 
sheet to detect the rotation of the direction of polarization as the light passes 
through the corn syrup. 


33-6 The intensity of reflected light 

Let us now consider quantitatively the reflection coefficient as a function of 
angle. Figure 33-6(a) shows a beam of light striking a glass surface, where it is 
partly reflected and partly refracted into the glass. Let us suppose that the incident 
beam, of unit amplitude, is linearly polarized normal to the plane of the paper. 
We will call the amplitude of the reflected wave b, and the amplitude of the re- 
fracted wave a . The refracted and reflected waves will, of course, be linearly 
polarized, and the electric field vectors of the incident, reflected, and refracted 
waves are all parallel to each other. Figure 33-6(b) shows the same situation, but 
now we suppose that the incident wave, of unit amplitude, is polarized in the plane 
of the paper. Now let us call the amplitude of the reflected and refracted wave 
B and A , respectively. 

We wish to calculate how strong the reflection is in the two situations illus- 
trated in Fig. 33-6(a) and 33— 6(b). We already know that when the angle between 
the reflected beam and refracted beam is a right angle, there will be no reflected 
wave in Fig. 33-6(b), but let us see if we cannot get a quantitative answer — an 
exact formula for B and b as a function of the angle of incidence, /. 

The principle that we must understand is as follows. The currents that are 
generated in the glass produce two waves. First, they produce the reflected wave. 
Moreover, we know that if there were no currents generated in the glass, the in- 
cident wave would continue straight into the glass. Remember that all the sources 
in the world make the net field. The source of the incident light beam produces a 
field of unit amplitude which would move into the glass along the dotted line in 
the figure. This field is not observed, and therefore the currents generated in the 
glass must produce a field of amplitude — 1 which moves along the dotted line. 
Using this fact, we will calculate the amplitude of the refracted waves, a and A . 

In Fig. 33-6(a) we see that the field of amplitude b is radiated by the motion 
of charges inside the glass which are responding to a field a inside the glass, and 
that therefore b is proportional to a. We might suppose that since our two figures 
are exactly the same, except for the direction of polarization, the ratio B/ A would 
be the same as the ratio b/a . This is not quite true, however, because in Fig. 33-6(b) 
the polarization directions are not all parallel to each other, as they are in Fig. 
33-6(a). It is only the component of A which is perpendicular to B , A cos (z + r), 
which is effective in producing B . The correct expression for the proportionality 
is then 

- = (33.1) 

a A cos (j + r) 


Now we use a trick. We know that in both (a) and (b) of Fig. 33-6 the electric 
field in the glass must produce oscillations of the charges which generate a field of 
amplitude — 1, polarized parallel to the incident beam, and moving in the direction 
of the dotted line. But we see from part (b) of the figure that only the component 
of A that is normal to the dashed line has the right polarization to produce this 
field, whereas in Fig. 33-6(a) the full amplitude a is effective, since the polarization 
of wave a is parallel to the polarization of the wave of amplitude — 1. Therefore 
we can write 

A cos ~ = zl , (33.2) 

a —1 


since the two amplitudes on the left side of Eq. (33.2) each produce the wave of 
amplitude —1. 



Fig. 33-6. An incident wave of unit 
amplitude is reflected and refracted at a 
glass surface. In (a) the incident wave is 
linearly polarized normal to the plane of 
the paper. In (b) the incident wave is 
linearly polarized in the direction shown 
by the dotted electric vector. 
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(33.3) 


Dividing Eq. (33.1) by Eq. (33.2), we obtain 

B _ cos (z + r ) 
b cos (z — r ) 9 

a result which we can check against what we already know. If we set (z + r) = 
90°, Eq. (33.3) gives B = 0, as Brewster says it should be, so our results so far 
are at least not obviously wrong. 

We have assumed unit amplitudes for the incident waves, so that \B\ 2 /\ 2 is 
the reflection coefficient for waves polarized in the plane of incidence, and |6| 2 /1 2 
is the reflection coefficient for waves polarized normal to the plane of incidence. 
The ratio of these two reflection coefficients is determined by Eq. (33.3). 

Now we perform a miracle, and compute not just the ratio, but each coefficient 
|i?| 2 and \b\ 2 individually! We know from the conservation of energy that the energy 
in the refracted wave must be equal to the incident energy minus the energy in 
the reflected wave, 1 — \B\ 2 in one case, 1 — \b\ 2 in the other. Furthermore, the 
energy which passes into the glass in Fig. 33-6(b) is to the energy which passes 
into the glass in Fig. 33-6(a) as the ratio of the squares of the refracted amplitudes, 
\A\ 2 /\a\ 2 . One might ask whether we really know how to compute the energy 
inside the glass, because, after all, there are energies of motion of the atoms in 
addition to the energy in the electric field. But it is obvious that all of the various 
contributions to the total energy will be proportional to the square of the amplitude 
of the electric field. Therefore we can write 


1 - \B\ 2 _ \A\ 2 
1 - \b \ 2 M 2 


(33.4) 


We now substitute Eq. (33.2) to eliminate A/a from the expression above, 
and express B in terms of b by means of Eq. (33.3): 


i - \b\ 


2 cos 2 (z + r) 
cos 2 (z — /•)__ 


1 


1 — \b\ 2 cos 2 (z — r ) ’ (33.5) 

This equation contains only one unknown amplitude, b . Solving for \b \ 2 , we obtain 


and, with the aid of (33.3), 


\b\ 2 


\B\ 2 


sin 2 (z — r ) 
sin 2 (z + r) 

tan 2 (z — r ) 
tan 2 (z + /•) 


(33.6) 

(33.7) 


So we have found the reflection coefficient \b \ 2 for an incident wave polarized 
perpendicular to the plane of incidence, and also the reflection coefficient | B 2 for 
an incident wave polarized in the plane of incidence! 

It is possible to go on with arguments of this nature and deduce that b is real. 
To prove this, one must consider a case where light is coming from both sides of 
the glass surface at the same time, a situation not easy to arrange experimentally, 
but fun to analyze theoretically. If we analyze this general case, we can prove that 
b must be real, and therefore, in fact* that b = ± sin (z — r)/sin (z + /•). It is 
even possible to determine the sign by considering the case of a very, very thin 
layer in which there is reflection from the front and from the back surfaces, and 
calculating how much light is reflected. We know how much light should be 
reflected by a thin layer, because we know how much current is generated, and we 
have even worked out the fields produced by such currents. 

One can show by these arguments that 


sin (z — r) B — ~ tan ^ ^ 

sin (z + r ) 5 tan (z + r) 


(33.8) 


These expressions for the reflection coefficients as a function of the angles of 
incidence and refraction are called Fresnel’s reflection formulas. 

If we consider the limit as the angles z and r go to zero, we find, for the case of 
normal incidence, that B 2 ^ b 2 « (z — r) 2 /{i + r) 2 for both polarizations, 
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since the sines are practically equal to the angles, as are also the tangents. But we 
know that sin //sin r = n, and when the angles are small, i/r ~ n. It is thus easy 
to show that the coefficient of reflection for normal incidence is 


B 2 = b 2 


in - l ) 2 
in + l ) 2 * 


It is interesting to find out how much light is reflected at normal incidence 
from the surface of water, for example. For water, n is 4/3, so that the reflection 
coefficient is (1/7) 2 ~ 2%. At normal incidence, only two percent of the light 
is reflected from the surface of water. 


33-7 Anomalous refraction 


The last polarization effect we shall consider was actually one of the first 
to be discovered: anomalous refraction. Sailors visiting Iceland brought back to 
Europe crystals of Iceland spar (CaC0 3 ) which had the amusing property of mak- 
ing anything seen through the crystal appear doubled, i.e., as two images. This 
came to the attention of Huygens, and played an important role in the discovery 
of polarization. As is often the case, the phenomena which are discovered first are 
the hardest, ultimately, to explain. It is only after we understand a physical concept 
thoroughly that we can carefully select those phenomena which most clearly and 
simply demonstrate the concept. 

Anomalous refraction is a particular case of the same birefringence that we 
considered earlier. Anomalous refraction comes about when the optic axis, the 
long axis of our asymmetric molecules, is not parallel to the surface of the crystal. 
In Fig. 33-7 are drawn two pieces of birefringent material, with the optic axis as 
shown. In the upper figure, the incident beam falling on the material is linearly 
polarized in a direction perpendicular to the optic axis of the material. When this 
beam strikes the surface of the material, each point on the surface acts as a source 
of a wave which travels into the crystal with velocity v ± , the velocity of light in 
the crystal when the plane of polarization is normal to the optic axis. The wave- 
front is just the envelope or locus of all these little spherical waves, and this wave- 
front moves straight through the crystal and out the other side. This is just the 
ordinary behavior we would expect, and this ray is called the ordinary ray . 

In the lower figure the linearly polarized light falling on the crystal has its 
direction of polarization turned through 90°, so that the optic axis lies in the plane 
of polarization. When we now consider the little waves originating at any point 
on the surface of the crystal, we see that they do not spread out as spherical waves. 
Light travelling along the optic axis travels with velocity v ± because the polariza- 
tion is perpendicular to the optic axis, whereas the light travelling perpendicular 
to the optic axis travels with velocity ^ because the polarization is parallel to the 
optic axis. In a birefringent material ^ Vj_, and in the figure < v ± . A more 
complete analysis will show that the waves spread out on the surface of an ellipsoid, 
with the optic axis as major axis of the ellipsoid. The envelope of all these elliptical 
waves is the wavefront which proceeds through the crystal in the direction shown. 
Again, at the back surface the beam will be deflected just as it was at the front 
surface, so that the light emerges parallel to the incident beam, but displaced from 
it. Clearly, this beam does not follow Snell’s law, but goes in an extraordinary 
direction. It is therefore called the extraordinary ray. 

When an unpolarized beam strikes an anomalously refracting crystal, it is 
separated into an ordinary ray, which travels straight through in the normal man- 
ner, and an extraordinary ray which is displaced as it passes through the crystal. 
These two emergent rays are linearly polarized at right angles to each other. That 
this is true can be readily demonstrated with a sheet of polaroid to analyze the 
polarization of the emergent rays. We can also demonstrate that our interpretation 
of this phenomenon is correct by sending linearly polarized light into the crystal. 
By properly orienting the direction of polarization of the incident beam, we can 
make this light go straight through without splitting, or we can make it go through 
without splitting but with a displacement. 



Fig. 33-7. The upper diagram shows 
the path of the ordinary ray through a 
doubly refracting crystal. The extraor- 
dinary ray is shown in the lower dia- 
gram. The optic axis lies in the plane of 
the paper. 
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Fig. 33-8. Two oppositely rotating 
vectors of equal amplitude add to produce 
a vector in a fixed direction, but with an 
oscillating amplitude. 


y 



Fig. 33—9. A charge moving in a 
circle in response to circularly polarized 
light. 


We have represented all the various polarization cases in Figs. 33-1 and 33-2 
as superpositions of two special polarization cases, namely x and y in various 
amounts and phases. Other pairs could equally well have been used. Polarization 
along any two perpendicular axes x', / inclined to * and y would serve as well 
[for example, any polarization can be made up of superpositions of cases (a) and 
(e) of Fig. 33-2], It is interesting, however, that this idea can be extended to other 
cases also. For example, any linear polarization can be made up by superposing 
suitable amounts at suitable phases of right and left circular polarizations [cases 
(c) and (g) of Fig. 33-2], since two equal vectors rotating in opposite directions 
add to give a single vector oscillating in a straight line (Fig. 33-8). If the phase of 
one is shifted relative to the other, the line is inclined. Thus all the pictures of 
Fig. 33-1 could be labeled “the superposition of equal amounts of right and left 
circularly polarized light at various relative phases.” As the left slips behind the 
right in phase, the direction of the linear polarization changes. Therefore optically 
active materials are, in a sense, birefringent. Their properties can be described by 
saying that they have different indexes for right- and left-hand circularly polarized 
light. Superposition of right and left circularly polarized light of different intensi- 
ties produces elliptically polarized light. 

Circularly polarized light has another interesting property — it carries angular 
momentum (about the direction of propagation). To illustrate this, suppose that 
such light falls on an atom represented by a harmonic oscillator that can be dis- 
placed equally well in any direction in the plane xy. Then the x-displacement of 
the electron will respond to the E x component of the field, while the ^-component 
responds, equally, to the equal E v component of the field but 90° behind in phase. 
That is, the responding electron goes around in a circle, with angular velocity m, 
in response to the rotating electric field of the light (Fig. 33-9). Depending on 
the damping characteristics of the response of the oscillator, the direction of the 
displacement a of the electron, and the direction of the force q e E on it need not be 
the same but they rotate around together. The E may have a component at right 
angles to a, so work is done on the system and a torque t is exerted. The work done 
per second is rw. Over a period of time T the energy absorbed is tuT, while tT is 
the angular momentum delivered to the matter absorbing the energy. We see 
therefore that a beam of right circularly polarized light containing a total energy 
S carries an angular momentum ( with vector directed along the direction of prop- 
agation) S/m. For when this beam is absorbed that angular momentum is de- 
livered to the absorber. Left-hand circular light carries angular momentum of the 
opposite sign, — S/m. 
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34 


Relativistic Effects in Radiation 


34-1 Moving sources 

In the present chapter we shall describe a number of miscellaneous effects in 
connection with radiation, and then we shall be finished with the classical theory 
of light propagation. In our analysis of light, we have gone rather far and into 
considerable detail. The only phenomena of any consequence associated with 
electromagnetic radiation that we have not discussed is what happens if radiowaves 
are contained in a box with reflecting walls, the size of the box being comparable 
to a wavelength, or are transmitted down a long tube. The phenomena of so-called 
cavity resonators and waveguides we shall discuss later; we shall first use another 
physical example — sound — and then we shall return to this subject. Except for 
this, the present chapter is our last consideration of the classical theory of light. 

We can summarize all the effects that we shall now discuss by remarking that 
they have to do with the effects of moving sources. We no longer assume that the 
source is localized, with all its motion being at a relatively low speed near a fixed 
point. 

We recall that the fundamental laws of electrodynamics say that, at large 
distances from a moving charge, the electric field is given by the formula 


q d 2 e,f 

Aire§c 2 dt 2 


(34.1) 


The second derivative of the unit vector e^ which points in the apparent direction 
of the charge, is the determining feature of the electric field. This unit vector does 
not point toward the present position of the charge, of course, but rather in the 
direction that the charge would seem to be, if the information travels only at the 
finite speed c from the charge to the observer. 

Associated with the electric field is a magnetic field, always at right angles 
to the electric field and at right angles to the apparent direction of the source, 
given by the formula 

B = -e R , X E/c. (34.2) 


Until now we have considered only the case in which motions are nonrela- 
tivistic in speed, so that there is no appreciable motion in the direction of the source 
to be considered. Now we shall be more general and study the case where the mo- 
tion is at an arbitrary velocity, and see what different effects may be expected in 
those circumstances. We shall let the motion be at an arbitrary speed, but of course 
we shall still assume that the detector is very far from the source. 

We already know from our discussion in Chapter 28 that the only things 
that count in d 2 tR>/dt 2 are the changes in the direction of e#'. Let the coor- 
dinates of the charge be (x, y, z), with z measured along the direction of observa- 
tion (Fig. 34-1). At a given moment in time, say the moment r, the three compo- 
nents of the position are x(t), y(r), and z(t). The distance R is very nearly equal 
to R(r) = R 0 + z(r). Now the direction of the vector depends mainly on 
* and y 9 but hardly at all upon z: the transverse components of the unit vector are 
x/R and y/R, and when we differentiate these components we get things like 
R 2 in the denominator: 

d(x/R) _ dx/dt dz x 

dt “ “ R~ dt R 2 ‘ 
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Fig. 34-1. The path of a moving 
charge. The true position at the time 
r is at T, but the retarded position is at A. 
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So, when we are far enough away the only terms we have to worry about are the 
variations of x and y. Thus we take out the factor R 0 and get 


E x = 
Ey = 


Q 


d 2 x r 
4 tt€oC 2 Ro dt 2 

q d 2 / 

47T€oC 2 Rq dt 2 


(34.3) 


where R 0 is the distance, more or less, to q; let us take it as the distance OP to the 
origin of the coordinates ( x , y y z). Thus the electric field is a constant multiplied 
by a very simple thing, the second derivatives of the x - and ^-coordinates. (We 
could put it more mathematically by calling x and y the transverse components of 
the position vector r of the charge, but this would not add to the clarity.) 

Of course, we realize that the coordinates must be measured at the retarded 
time. Here we find that z(r) does affect the retardation. What time is the retarded 
time? If the time of observation is called t (the time at P) then the time r to which 
this corresponds at A is not the time t, but is delayed by the total distance that the 
light has to go, divided by the speed of light. In the first approximation, this delay 
is R 0 /c , a constant (an uninteresting feature), but in the next approximation we 
must include the effects of the position in the z-direction at the time r, because 
if q is a little farther back, there is a little more retardation. This is an effect that 
we have neglected before, and it is the only change needed in order to make our 
results valid for all speeds. 

What we must now do is to choose a certain value of t and calculate the value 
of r from it, and thus find out where x and y are at that r. These are then the 
retarded x and y, which we call x r and y f , whose second derivatives determine the 
field. Thus r is determined by 


and 


t = t + 


Rq | z(t) 
c c 


X'(t) = x(t), /(0 - y(T\ 


(34.4) 


Now these are complicated equations, but it is easy enough to make a geometrical 
picture to describe their solution. This picture will give us a good qualitative feeling 
for how things work, but it still takes a lot of detailed mathematics to deduce 
the precise results of a complicated problem. 


x'(t) 


z 
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Fig. 34-2. A geometrical solution 
Eq. (34.5) to find x'(f). 


of 


34-2 Finding the “apparent" motion 

The above equation has an interesting simplification. If we disregard the un- 
interesting constant delay Rjc, which just means that we must change the origin 
of / by a constant, then it says that 

Ct = CT + z(T ), x' = x(r), y = j<t). (34.5) 

Now we need to find x f and y f as functions of t , not r, and we can do this in the 
following way: Eq. (34.5) says that we should take the actual motion and add a 
constant (the speed of light) times r. What that turns out to mean is shown in 
Fig. 34-2. We take the actual motion of the charge (shown at left) and imagine 
that as it is going around it is being swept away from the point P at the speed c 
(there are no contractions from relativity or anything like that; this is just a mathe- 
matical addition of the cr). In this way we get a new motion, in which the line- 
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of-sight coordinate is ct , as shown at the right. (The figure shows the result for a 
rather complicated motion in a plane, but of course the motion may not be in 
one plane — it may be even more complicated than motion in a plane.) The point 
is that the horizontal (i.e., line-of-sight) distance now is no longer the old z, but 
is z + cr, and therefore is ct. Thus we have found a picture of the curve, x f 
(and /) against t\ All we have to do to find the field is to look at the acceleration 
of this curve, i.e., to differentiate it twice. So the final answer is: in order to find 
the electric field for a moving charge, take the motion of the charge and translate 
it back at the speed c to “open it out”; then the curve, so drawn, is a curve of the 
x f and / positions of the function of t. The acceleration of this curve gives the 
electric field as a function of t . Or, if we wish, we can now imagine that this whole 
“rigid” curve moves forward at the speed c through the plane of sight, so that the 
point of intersection with the plane of sight has the coordinates x f and /. The 
acceleration of this point makes the electric field. This solution is just as exact as 
the formula we started with — it is simply a geometrical representation. 


Fig. 34-3. The x'(t) curve for a 
particle moving at constant speed v = 

O. 94c # a circle. 

If the motion is relatively slow, for instance if we have an oscillator just going 
up and down slowly, then when we shoot that motion away at the speed of light, 
we would get, of course, a simple cosine curve, and that gives a formula we have 
been looking at for a long time: it gives the field produced by an oscillating charge. 
A more interesting example is an electron moving rapidly, very nearly at the 
speed of light, in a circle. If we look in the plane of the circle, the retarded x f (t) 
appears as shown in Fig. 34-3. What is this curve? If we imagine a radius vector 
from the center of the circle to the charge, and if we extend this radial line a little 
bit past the charge, just a shade if it is going fast, then we come to a point on the 
line that goes at the speed of light. Therefore, when we translate the motion back 
at the speed of light, that corresponds to having a wheel with a charge on it rolling 
backward (without slipping) at the speed c; thus we find a curve which is very 
close to a cycloid — it is called a hypocycloid. If the charge is going very nearly at 
the speed of light, the “cusps” are very sharp indeed; if it went at exactly the speed 
of light, they would be actual cusps, infinitely sharp. “Infinitely sharp” is inter- 
esting; it means that near a cusp the second derivative is enormous. Once in each 
cycle we get a sharp pulse of electric field. This is not at all what we would get from 
a nonrelativistic motion, where each time the charge goes around there is an 
oscillation which is of about the same “strength” all the time. Instead, there are 
very sharp pulses of electric field spaced at time intervals 1 /T 0 apart, where T 0 
is the period of revolution. These strong electric fields are emitted in a narrow cone 
in the direction of motion of the charge. When the charge is moving away from 

P, there is very little curvature and there is very little radiated field in the 
direction of P . 

34-3 Synchrotron radiation 

We have very fast electrons moving in circular paths in the synchrotron; they 
are travelling at very nearly the speed c, and it is possible to see the above radiation 
as actual lightl Let us discuss this in more detail. 

In the synchrotron we have electrons which go around in circles in a uniform 
magnetic field. First, let us see why they go in circles. From Eq. (12.10), we know 
that the force on a particle in a magnetic field is given by 




F = q\ X B, 


(34.6) 
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Fig. 34-4. A charged particle moves 
in a circular (or helical) path in a uniform 
magnetic field. 


and it is at right angles both to the field and to the velocity. As usual, the force is 
equal to the rate of change of momentum with time. If the field is directed upward 
out of the paper, the momentum of the particle and the force on it are as shown 
in Fig. 34-4. Since the force is at right angles to the velocity, the kinetic energy, 
and therefore the speed, remains constant. All the magnetic field does is to change 
the direction of motion. In a short time At, the momentum vector changes at right 
angles to itself by an amount Ap — FA t, and therefore p turns through an angle 
AO = A p/p = qvBAt/p, since \F\ = qvB. But in this same time the particle has 
gone a distance As ~ v At. Evidently, the two lines AB and CD will intersect at a 
point O such that OA = OC = R, where As = R AB. Combining this with the 
previous expressions, we find R AB/ At = Rco — v = qvBR/p, from which we find 

P = qBR (34.7) 

and 

a> - qvB/p . (34.8) 

Since this same argument can be applied during the next instant, the next, and so 
on, we conclude that the particle must be moving in a circle of radius R, with angu- 
lar velocity co. 

The result that the momentum of the particle is equal to a charge times the 
radius times the magnetic field is a very important law that is used a great deal. 
It is important for practical purposes because if we have elementary particles which 
all have the same charge and we observe them in a magnetic field, we can measure 
the radii of curvature of their orbits and, knowing the magnetic field, thus deter- 
mine the momenta of the particles. If we multiply both sides of Eq. (34.7) by c, 
and express q in terms of the electronic charge, we can measure the momentum in 
units of the electron volt . In those units our formula is 


/>c(ev) = 3 X l0 8 (q/q e )BR, (34.9) 

where B, R , and the speed of light are all expressed in the mks system, the latter 
being 3 X 10 8 , numerically. 

The mks unit of magnetic field is called a weber per square meter . There is an 
older unit which is still in common use, called a gauss . One weber/m — 2 is equal 
to 10 4 gauss. To give an idea of how big magnetic fields are, the strongest magnetic 
field that one can usually make in iron is about 1.5 X 10 4 gauss; beyond that, the 
advantage of using iron disappears. Today, electromagnets wound with super- 
conducting wire are able to produce steady fields of over 10 5 gauss strength — that 
is, 10 mks units. The field of the earth is a few tenths of a gauss at the equator. 

Returning to Eq. (34.9), we could imagine the synchrotron running at a billion 
electron volts, so pc would be 10 9 for a billion electron volts. (We shall come back 
to the energy in just a moment.) Then, if we had a B corresponding to, say, 10,000 
gauss, which is a good substantial field, one mks unit, then we see that R would 
have to be 3.3 meters. The actual radius of the Caltech synchrotron is 3.7 meters, 
the field is a little bigger, and the energy is 1.5 billion, but it is the same idea. So 
now we have a feeling for why the synchrotron has the size it has. 

We have calculated the momentum, but we know that the total energy, 
including the rest energy, is given by W = yjp^c 2 + m 2 c 4 , and for an electron 
the rest energy corresponding to me 2 is 0.511 X 10 6 ev, so when pc is 10 9 ev we 
can neglect me 2 , and so for all practical purposes W — pc when the speeds are 
relativistic. It is practically the same to say the energy of an electron is a billion 
electron volts as to say the momentum times c is a billion electron volts. If IF — 
10 ev, it is easy to show that the speed differs from the speed of light by but one 
part in eight million! 

We turn now to the radiation emitted by such a particle. A particle moving 
on a circle of radius 3.3 meters, or 20 meters circumference, goes around once in 
roughly the time it takes light to go 20 meters. So the wavelength that should be 
emitted by such a particle would be 20 meters — in the shortwave radio region. 
But because of the piling up effect that we have been discussing (Fig. 34-3), and 
because the distance by which we must extend the radius to reach the speed c is 
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only one part in eight million of the radius, the cusps of the hypocycloid are 
enormously sharp compared with the distance between them. The acceleration, 
which involves a second derivative with respect to time, gets twice the “compression 
factor” of 8 X 10 (i because the time scale is reduced by eight million twice in the 
neighborhood of the cusp. Thus we might expect the effective wavelength to be 
much shorter, to the extent of 64 times 10 12 smaller than 20 meters, and that 
corresponds to the x-ray region. (Actually, the cusp itself is not the entire 
determining factor; one must also include a certain region about the cusp. This 
changes the factor to the 3/2 power instead of the square, but still leaves us above 
the optical region.) Thus, even though a slowly moving electron would have 
radiated 20-meter radiowaves, the relativistic effect cuts down the wavelength so 
much that we can see it! Clearly, the light should be polarized , with the electric 
field perpendicular to the uniform magnetic field. 

To further appreciate what we would observe, suppose that we were to take 
such light (to simplify things, because these pulses are so far apart in time, we 
shall just take one pulse) and direct it onto a diffraction grating, which is a lot of 
scattering wires. After this pulse comes away from the grating, what do we see? 
(We should see red light, blue light, and so on, if we see any light at all.) What 
do we see? The pulse strikes the grating head-on, and all the oscillators in the 
grating, together, are violently moved up and then back down again, just once. 
They then produce effects in various directions, as shown in Fig. 34-5. But the 
point P is closer to one end of the grating than to the other, so at this point the 
electric field arrives first from wire A, next from B , and so on; finally, the pulse 
from the last wire arrives. In short, the sum of the reflections from all the successive 
wires is as shown in Fig. 34-6(a); it is an electric field which is a series of pulses, 
and it is very like a sine wave whose wavelength is the distance between the pulses, 
just as it would be for monochromatic light striking the grating! So, we get colored 
light all right. But, by the same argument, will we not get light from any kind of a 
“pulse”? No. Suppose that the curve were much smoother; then we would add 
all the scattered waves together, separated by a small time between them (Fig. 
34— 6b). Then we see that the field would not shake at all, it would be a very smooth 
curve, because each pulse does not vary much in the time interval between pulses. 

The electromagnetic radiation emitted by relativistic charged particles cir- 
culating in a magnetic field is called synchrotron radiation . It is so named for obvi- 
ous reasons, but it is not limited specifically to synchrotrons, or even to earthbound 
laboratories. It is exciting and interesting that it also occurs in nature! 



Fig. 34-5. The light which strikes a 
grating as a single, sharp pulse is scat- 
tered in various directions as different 
colors. 


(a) 


(b) 


Fig. 34-6. The total electric field due 
to a series of (a) sharp pulses and (b) 
smooth pulses. 



Fig. 34-7. The crab nebula as seen in all colors (no filter). 
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(b) 


Fig. 34-8. The crab nebula as seen 
through a blue filter and a Polaroid, 
(a) Electric vector vertical, (b) Electric 
vector horizontal. 



(a) (b) 


Fig. 34-9. A fast electron passing 
near a nucleus radiates energy in the 
direction of its motion. 


34-4 Cosmic synchrotron radiation 

In the year 1054 the Chinese and Japanese civilizations were among the most 
advanced in the world; they were conscious of the external universe, and they 
recorded, most remarkably, an explosive bright star in that year. (It is amazing 
that none of the European monks, writing all the books of the middle ages, even 
bothered to write that a star exploded in the sky, but they did not.) Today we may 
take a picture of that star, and what we see is shown in Fig. 34-7. On the outside 
is a big mass of red filaments, which is produced by the atoms of the thin gas 
“ringing” at their natural frequencies; this makes a bright line spectrum with 
different frequencies in it. The red happens in this case to be due to nitrogen. On 
the other hand, in the central region is a mysterious, fuzzy patch of light in a 
continuous distribution of frequency, i.e., there are no special frequencies associated 
with particular atoms. Yet this is not dust “lit up” by nearby stars, which is one 
way by which one can get a continuous spectrum. We can see stars through it, so 
it is transparent, but it is emitting light. 

In Fig. 34-8 we look at the same object, using light in a region of the spectrum 
which has no bright spectral line, so that we see only the central region. But in this 
case, also, polarizers have been put on the telescope, and the two views correspond 
to two orientations 90° apart. We see that the pictures are different! That is to say, 
the light is polarized. The reason, presumably, is that there is a local magnetic field, 
and many very energetic electrons are going around in that magnetic field. 

We have just illustrated how the electrons could go around the field in a circle. 
We can add to this, of course, any uniform motion in the direction of the field, since 
the force, qy X B, has no component in this direction and, as we have already 
remarked, the synchrotron radiation is evidently polarized in a direction at right 
angles to the projection of the magnetic field onto the plane of sight. 

Putting these two facts together, we see that in a region where one picture is 
bright and the other one is black, the light must have its electric field completely 
polarized in one direction. This means that there is a magnetic field at right angles 
to this direction, while in other regions, where there is a strong emission in the other 
picture, the magnetic field must be the other way. If we look carefully at Fig. 34-8, 
we may notice that there is, roughly speaking, a general set of “lines” that go one 
way in one picture and at right angles to this in the other. The pictures show a kind 
of fibrous structure. Presumably, the magnetic field lines will tend to extend rela- 
tively long distances in their own direction, and so, presumably, there are long 
regions of magnetic field with all the electrons spiralling one way, while in another 
region the field is the other way and the electrons are also spiralling that way. 

What keeps the electron energy so high for so long a time? After all, it is 900 
years since the explosion — how can they keep going so fast? How they maintain 
their energy and how this whole thing keeps going is still not thoroughly understood. 


34-5 Bremsstrahlung 

We shall nqxt remark briefly on one other interesting effect of a very fast- 
moving particle that radiates energy. The idea is very similar to the one we have 
just discussed. Suppose that there are charged particles in a piece of matter and 
a very fast electron, say, comes by (Fig. 34-9). Then, because of the electric field 
around the atomic nucleus the electron is pulled, accelerated, so that the curve 
of its motion has a slight kink or bend in it. If the electron is travelling at very 
nearly the speed of light, what is the electric field produced in the direction cl 
Remember our rule: we take the actual motion, translate it backwards at speed c, 
and that gives us a curve whose curvature measures the electric field. It was coming 
toward us at the speed y, so we get a backward motion, with the whole picture 
compressed into a smaller distance in proportion as c - v is smaller than c . So, 
if 1 — v/c « 1, there is a very sharp and rapid curvature at B', and when we take 
the second derivative of that we get a very high field in the direction of the motion. 
So when very energetic electrons move through matter they spit radiation in a for- 
ward direction. This is called bremsstrahlung. As a matter of fact, the synchrotron 
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is used, not so much to make high-energy electrons (actually if we could get them 
out of the machine more conveniently we would not say this) as to make very 
energetic photons— gamma rays— by passing the energetic electrons through a solid 
tungsten “target,” and letting them radiate photons from this bremsstrahlung effect. 


34-6 The Doppler effect 

Now we go on to consider some other examples of the effects of moving sources. 
Let us suppose that the source is a stationary atom which is oscillating at one of 
its natural frequencies, co 0 . Then we know that the frequency of the light we would 
observe is w 0 . But now let us take another example, in which we have a similar 
oscillator oscillating with a frequency m,, and at the same time the whole atom, 
the whole oscillator, is moving along in a direction toward the observer at velocity 
v. Then the actual motion in space, of course, is as shown in Fig. 34-10(a). Now 
we play our usual game, we add cr; that is to say, we translate the whole curve 
backward and we find then that it oscillates as in Fig. 34-10(b). In a given amount 
of time r, when the oscillator would have gone a distance vt, on the x'vs.ct diagram 
it goes a distance (c - v)t. So all the oscillations of frequency w r in the time At are 
now found in the interval Ar = (1 - v/c) Ar; they are squashed together, and 
as this curve comes by us at speed c, we will see light of a higher frequency, higher 
by just the compression factor (1 — v/c). Thus we observe 

= (34.10) 

We can, of course, analyze this situation in various other ways. Suppose that 
the atom were emitting, instead of sine waves, a series of pulses, pip, pip, pip, pip, 
at a certain frequency «i. At what frequency would they be received by us? The 
first one that arrives has a certain delay, but the next one is delayed less because in 
the meantime the atom moves closer to the receiver. Therefore, the time between 
the “pips” is decreased by the motion. If we analyze the geometry of the situation, 
we find that the frequency of the pips is increased by the factor 1/(1 — v/c). 

Is co = « 0 /(l — r/c), then, the frequency that would be observed if we took 
an ordinary atom, which had a natural frequency m 0 , and moved it toward the 
receiver at speed vl No; as we well know, the natural frequency wj of a moving 
atom is not the same as that measured when it is standing still, because of the 
relativistic dilation in the rate of passage of time. Thus if w 0 were the true natural 
frequency, then the modified natural frequency «i would be 

(Oi = coo Vl - v 2 /c 2 . (34.11) 

Therefore the observed frequency w is 



Fig. 34-10. The x — z and x' — t 
curves of a moving oscillator. 


_ cop Vl — v 2 /c 2 
1 — v/c 


(34.12) 


The shift in frequency observed in the above situation is called the Doppler 
effect : if something moves toward us the light it emits appears more violet, and if 
it moves away it appears more red. 

We shall now give two more derivations of this same interesting and important 
result. Suppose, now, that the source is standing still and is emitting waves at 
frequency co 0 , while the observer is moving with speed v toward the source. After 
a certain period of time t the observer will have moved to a new position, a distance 
vt from where he was at t = 0. How many radians of phase will he have seen go 
by? A certain number, c o 0 t, went past any fixed point, and in addition the observer 
has swept past some more by his own motion, namely a number vtko (the number 
of radians per meter times the distance). So the total number of radians in the 
time t , or the observed frequency, would be coi = co 0 + k$v. We have made 
this analysis from the point of view of a man at rest; we would like to know how 
it would look to the man who is moving. Here we have to worry again about the 
difference in clock rate for the two observers, and this time that means that we 
have to divide by \/l — v 2 fc l . So if fc 0 is the wave number, the number of radians 
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per meter in the direction of motion, and coo is the frequency, then the observed 
frequency for a moving man is 


= M o + k 0 v 
VI — v*/c 2 


(34.13) 


For the case of light, we know that k 0 = ccq/c. So, in this particular problem, 
the equation would read 


a fa)o(l + v/c) 
VI — v 2 /c 2 


(34.14) 


which looks completely unlike formula (34.12)! Is the frequency that we would 
observe if we move toward a source different than the frequency that we would 
see if the source moved toward us? Of course not! The theory of relativity says 
that these two must be exactly equal. If we Were expert enough mathematicians we 
would probably recognize that these two mathematical expressions are exactly 
equal! In fact, the necessary equality of the two expressions is one of the ways by 
which some people like to demonstrate that relativity requires a time dilation, 
because if we did not put those square-root factors in, they would no longer be 
equal. 

Since we know about relativity, let us analyze it in still a third way, which 
may appear a little more general. (It is really the same thing, since it makes no 
difference how we do it!) According to the relativity theory there is a relationship 
between position and time as observed by one man and position and time as seen 
by another who is moving relative to him. We wrote down those relationships 
long ago (Chapter 16). This is the Lorentz transformation and its inverse: 


, X + vt 

X = , 

X f — V f 

x — 

Vl — v 2 /c 2 

Vl — v 2 /c 2 

t , _ t + vx/c 2 

t - vx'/c 2 

Vl — v 2 /c 2 

Vl — v 2 /c 2 


Ifwe were standing still on the ground, the form of a wave would be cos (c ot — kx); 
all the nodes and maxima and minima would follow this form. But what would a 
man in motion, observing the same physical wave, see? Where the field is zero, the 
positions of all the nodes are the same (when the field is zero , everyone measures 
the field as zero); that is a relativistic invariant. So the form is the same for the 
other man too, except that we must transform it into his frame of reference: 


cos (cat — kx) ~ cos 


t r - vx f /c 2 _ k x f — v f 
Vl ~~ v 2 /c 2 Vl — v 2 /c 2 


If we regroup the terms inside the brackets, we get 


cos (a )t — kx) = cos 
= cos [ 


03 + kv 


t f - 


k + vo)/c 2 


V 1 — v 2 /c 2 vT — v 2 /c 2 

V t' - k' 


*]. (34.16) 


This is again a wave, a cosine wave, in which there is a certain frequency a 
constant multiplying t’, and some other constant, k', multiplying x'. We call k' the 
wave number, or the number of waves per meter, for the other man. Therefore 
the other man will see a new frequency and a new wave number given by 


co + kv 
VI — v 2 /c 2 
k + uv/c 2 
VI — v 2 /c 2 


(34.17) 

(34.18) 


Ifwe look at (34.17), we see that it is the same formula (34.13), that we obtained 
by a more physical argument. 
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34-7 The «, k four-vector 

The relationships indicated in Eqs. (34.17) and (34.18) are very interesting, 
because these say that the new frequency co' is a combination of the old frequency 
co and the old wave number k, and that the new wave number is a combination of 
the old wave number and frequency. Now the wave number is the rate of change 
of phase with distance, and the frequency is the rate of change of phase with time, 
and in these expressions we see a close analogy with the Lorentz transformation of 
the position and time: if co is thought of as being like /, and k is thought of as being 
like x divided by c 2 , then the new co' will be like /', and the new k! will be like 
x'/c 2 . That is to say, under the Lorentz transformation co and k transform the same 
way as do t and x . They constitute what we call a four-vector ; when a quantity has 
four components transforming like time and space, it is a four-vector. Everything 
seems all right, then, except for one little thing: we said that a four-vector has to 
have four components', where are the other two components? We have seen that 
co and k are like time and space in one space direction, but not in all directions, 
and so we must next study the problem of the propagation of light in three space 
dimensions, not just in one direction, as we have been doing up until now. 

Suppose that we have a coordinate system, x, y, z, and a wave which is travel- 
ling along and whose wavefronts are as shown in Fig. 34-11. The wavelength of 
the wave is X, but the direction of motion of the wave does not happen to be in 
the direction of one of the axes. What is the formula for such a wave? The 
answer is clearly cos (c ot — ks ), where k = 2 tt/\ and s is the distance along the 
direction of motion of the wave — the component of the spatial position in the 
direction of motion. Let us put it this way: if r is the vector position of a point 
in space, then s is r • e k , where e k is a unit vector in the direction of motion. 
That is, s is just r cos (r, e*), the component of distance in the direction of motion. 
Therefore our wave is cos (c ot — ke k * r). 

Now it turns out to be very convenient to define a vector k, which is called 
the wave vector , which has a magnitude equal to the wave number, 27r/X, and is 
pointed in the direction of propagation of the waves: 


k = 27rejk/X = ke k . (34.19) 

Using this vector, our wave can be written as cos (cot — k • r), or as cos (cot — 
k x x — k y y — k z z). What is the significance of a component of k , say k x l 
Clearly, k x is the rate of change of phase with respect to x. Referring to Fig. 
34-11, we see that the phase changes as we change x, just as if there were a wave 
along x, but of a longer wavelength . The “wavelength in the x-direction” is longer 
than a natural, true wavelength by the secant of the angle a between the actual 
direction of propagation and the x-axis: 



Fig. 34-1 1 . A plane wave travelling 
in an oblique direction. 


X x — X/COS a. 


(34.20) 


Therefore the rate of change of phase, which is proportional to the reciprocal of 
\ Xi is smaller by the factor cos a; that is just how k x would vary — it would be the 
magnitude of k, times the cosine of the angle between k and the x-axis! 

That, then, is the nature of the wave vector that we use to represent a wave 
in three dimensions. The four quantities co, k x , k v , k z transform in relativity as a 
four-vector, where co corresponds to the time, and k x , k y , k z correspond to the 
x-, y-, and z-components of the four-vector. 

In our previous discussion of special relativity (Chapter 17), we learned that 
there are ways of making relativistic dot products with four-vectors. If we use the 
position vector x„, where p stands for the four components (time and three space 
ones), and if we call the wave vector k^, where the index p again has four values, 
time and three space ones, then the dot product of x M and k ^ is written S'k M x M 
(see Chapter 17). This dot product is an invariant, independent of the coordinate 
system ; what is it equal to ? By the definition of this dot product in four dimensions, 
it is 


S' kfPCp = c ot — k x x — k v y — k z z . 


(34.21) 
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We know from our study of vectors that 27^ is invariant under the Lorentz 
transformation, since k „ is a four-vector. But this quantity is precisely what appears 
inside the cosine for a plane wave, and it ought to be invariant under a Lorentz 
transformation. We cannot have a formula with something that changes inside 
the cosine, since we know that the phase of the wave cannot change when we change 
the coordinate system. 


34-8 Aberration 
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(a) (b) 

Fig. 34-12. A distant source S is 
viewed by (a) a stationary telescope, and 
(b) a laterally moving telescope. 


In deriving Eqs. (34.17) and (34.18), we have taken a simple example where k 
happened to be in a direction of motion, but of course we can generalize it to other 
cases also. For example, suppose there is a source sending out light in a certain 
direction from the point of view of a man at rest, but we are moving along on the 
earth, say (Fig. 34-12). From which direction does the light appear to come? 
To find out, we will have to write down the four components of k M and apply the 
Lorentz transformation. The answer, however, can be found by the following 
argument: we have to point our telescope at an angle to see the light. Why? 
Because light is coming down at the speed c, and we are moving sidewise at the 
speed v, so the telescope has to be tilted forward so that as the light comes down it 
goes “straight” down the tube. It is very easy to see that the horizontal distance 
is vt when the vertical distance is ct, and therefore, if 6 f is the angle of tilt, tan d f = 
v/c. How nice! How nice, indeed— except for one little thing: 6' is not the angle 
at which we would have to set the telescope relative to the earth , because we made 
our analysis from the point of view of a “fixed” observer. When we said the hori- 
zontal distance is vt y the man on the earth would have found a different distance, 
since he measured with a “squashed” ruler. It turns out that, because of that con- 
traction effect, 


which is equivalent to 


tan 6 = 


v/c 

V 1 — v 2 /c 2 


sin 6 = v/c. 


(34.22) 

(34.23) 


It will be instructive for the student to derive this result, using the Lorentz trans- 
formation. 

This effect, that a telescope has to be tilted, is called aberration , and it has been 
observed. How can we observe it? Who can say where a given star should be? 
Suppose we do have to look in the wrong direction to see a star; how do we know 
it is the wrong direction? Because the earth goes around the sun. Today we have 
to point the telescope one way ; six months later we have to tilt the telescope the 
other way. That is how we can tell that there is such an effect. 



Fig. 34-13. The magnetic force on a 
charge which is driven by the electric 
field is in the direction of the light beam. 


34-9 The momentum of light 

Now we turn to a different topic. We have never, in all our discussion of the 
past few chapters, said anything about the effects of the magnetic field that is 
associated with light. Ordinarily, the effects of the magnetic field are very small, 
but there is one interesting and important effect which is a consequence of the 
magnetic field. Suppose that light is coming from a source and is acting on a 
charge and driving that charge up and down. We will suppose that the electric 
field is in the x-direction, so the motion of the charge is also in the ^-direction: it 
has a position x and a velocity v, as shown in Fig. 34-13. The magnetic field is at 
right angles to the electric field. Now as the electric field acts on the charge and 
moves it up and down, what does the magnetic field do? The magnetic field acts 
on the charge (say an electron) only when it is moving; but the electron is moving, 
it is driven by the electric field, so the two of them work together: While the thing 
is going up and down it has a velocity and there is a force on it, B times v times q ; 
but in which direction is this force? It is in the direction of the propagation of light. 
Therefore, when light is shining on a charge and it is oscillating in response to that 
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charge, there is a driving force in the direction of the light beam. This is called 
radiation pressure or light pressure. 

Let us determine how strong the radiation pressure is. Evidently it is F = qvB 
or, since everything is oscillating, it is the time average of this, (F). From (34.2) the 
strength of the magnetic field is the same as the strength of the electric field divided 
by c, so we need to find the average of the electric field, times the velocity, times the 
charge, times 1/c: (F) = q{vE)/c. But the charge q times the field E is the electric 
force on a charge, and the force on the charge times the velocity is the work dW/dt 
being done on the charge! Therefore the force, the “pushing momentum,” that is 
delivered per second by the light, is equal to 1/c times the energy absorbed from the 
light per second! That is a general rule, since we did not say how strong the oscilla- 
tor was, or whether some of the charges cancel out. In any circumstance where light 
is being absorbed , there is a pressure . The momentum that the light delivers is al- 
ways equal to the energy that is absorbed, divided by c: 


{F)= dWld i 


(34.24) 


That light carries energy we already know. We now understand that it also 
carries momentum , and further, that the momentum carried is always 1/c times 
the energy. 

When light is emitted from a source there is a recoil effect: the same thing in 
reverse. If an atom is emitting an energy W in some direction, then there is a 
recoil momentum p = W/c. If light is reflected normally from a mirror, we get 
twice the force. 

That is as far as we shall go using the classical theory of light. Of course we 
know that there is a quantum theory, and that in many respects light acts like a 
particle. The energy of a light-particle is a constant times the frequency: 


W - hv = ha> (34.25) 

We now appreciate that light also carries a momentum equal to the energy divided 
by c, so it is also true that these effective particles, these photons , carry a momentum 

p - W/c = ho)f c = hk. (34.26) 

The direction of the momentum is, of course, the direction of propagation of the 
light. So, to put it in vector form, 

W = ftw, p = hk. (34.27) 

We also know, of course, that the energy and momentum of a particle should form 
a four-vector. We have just discovered that a) and k form a four-vector. Therefore 
it is a good thing that (34.27) has the same constant in both cases; it means that the 
quantum theory and the theory of relativity are mutually consistent. 

Equation (34.27) can be written more elegantly as = hk ^ a relativistic 
equation, for a particle associated with a wave. Although we have discussed this 
only for photons, for which k (the magnitude of k) equals co/c and p = W/c, the 
relation is much more general. In quantum mechanics all particles, not only 
photons, exhibit wavelike properties, but the frequency and wave number of the 
waves is related to the energy and momentum of particles by (34.27) (called the 
deBroglie relations) even when p is not equal to W/c. 

In the last chapter we saw that a beam of right or left circularly polarized 
light also carries angular momentum in an amount proportional to the energy £ of 
the wave. In the quantum picture, a beam of circularly polarized light is regarded 
as a stream of photons, each carrying an angular momentum =*=h along the direc- 
tion of propagation. That is what becomes of polarization in the corpuscular point 
of view — the photons carry angular momentum like spinning rifle bullets. But this 
“bullet” picture is really as incomplete as the “wave” picture, and we shall have 
to discuss these ideas more fully in a later chapter on Quantum Behavior. 
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35 


Color Vision 


35-1 The human eye 


The phenomenon of colors depends partly on the physical world. We discuss 
the colors of soap films and so on as being produced by interference. But also, 
of course, it depends on the eye, or what happens behind the eye, in the brain. 
Physics characterizes the light that enters the eye, but after that, our sensations are 
the result of photochemical-neural processes and psychological responses. 

There are many interesting phenomena associated with vision which involve 
a mixture of physical phenomena and physiological processes, and the full appreci- 
ation of natural phenomena, as we see them, must go beyond physics in the usual 
sense. We make no apologies for making these excursions into other fields, because 
the separation of fields, as we have emphasized, is merely a human convenience, 
and an unnatural thing. Nature is not interested in our separations, and many of 
the interesting phenomena bridge the gaps between fields. 

In Chapter 3 we have already discussed the relation of physics to the other 
sciences in general terms, but now we are going to look in some detail at a specific 
field in which physics and other sciences are very, very closely interrelated. That 
area is vision. In particular, we shall discuss color vision. In the present chapter 
we shall discuss mainly the observable phenomena of human vision, and in the 
next chapter we shall consider the physiological aspects of vision, both in man and 
in other animals. 

It all begins with the eye; so, in order to understand what phenomena we see, 
some knowledge of the eye is required. In the next chapter we shall discuss in 
some detail how the various parts of the eye work, and how they are interconnected 
with the nervous system. For the present, we shall describe only briefly how the 
eye functions (Fig. 35-1). 

Light enters the eye through the cornea ; we have already discussed how it is 
bent and is imaged on a layer called the retina in the back of the eye, so that different 
parts of the retina receive light from different parts of the visual field outside. The 
retina is not absolutely uniform: there is a place, a spot, in the center of our field 
of view which we use when we are trying to see things very carefully, and at which we 
have the greatest acuity of vision ; it is called the fovea or macula. The side parts of 
the eye, as we can immediately appreciate from our experience in looking at things, 
are not as effective for seeing detail as is the center of the eye. There is also a spot 
in the retina where the nerves carrying all the information run out; that is a blind 
spot. There is no sensitive part of the retina here, and it is possible to demonstrate 
that if we close, say, the left eye and look straight at something, and then move a 
finger or another small object slowly out of the field of view it suddenly disappears 
somewhere. The only practical use of this fact that we know of is that some physiol- 
ogist became quite a favorite in the court of a king of France by pointing this out 
to him; in the boring sessions that he had with his courtiers, the king could amuse 
himself by “cutting off their heads” by looking at one and watching another’s 
head disappear. 

Figure 35-2 shows a magnified view of the inside of the retina in somewhat 
schematic form. In different parts of the retina there are different kinds of struc- 
tures. The objects that occur more densely near the periphery of the retina are 
called rods . Closer to the fovea, we find, besides these rod cells, also cone cells. 
We shall describe the structure of these cells later. As we get close to the fovea, the 
number of cones increases, and in the fovea itself there are in fact nothing but cone 
cells, packed very tightly, so tightly that the cone cells are much finer, or narrower 
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Fig. 35-1. The eye. 



Fig. 35-2. The structure of the retina. 
(Light enters from below.) 



here than anywhere else. So we must appreciate that we see with the cones right 
in the middle of the field of view, but as we go to the periphery we have the other 
cells, the rods. Now the interesting thing is that in the retina each of the cells which 
is sensitive to light is not connected by a fiber directly to the optic nerve, but is 
connected to many other cells, which are themselves connected to each other. 
There are several kinds of cells: there are cells that carry the information toward 
the optic nerve, but there are others that are mainly interconnected “horizontally.” 
There are essentially four kinds of cells, but we shall not go into these details now. 
The main thing we emphasize is that the light signal is already being “thought 
about.” That is to say, the information from the various cells does not immediately 
go to the brain, spot for spot, but in the retina a certain amount of the information 
has already been digested, by a combining of the information from several visual 
receptors. It is important to understand that some brain-function phenomena 
occur in the eye itself. 


35-2 Color depends on intensity 

One of the most striking phenomena of vision is the dark adaptation of the 
eye. If we go into the dark from a brightly lighted room, we cannot see very well 
for a while, but gradually things become more and more apparent, and eventually 
we can see something where we could see nothing before. If the intensity of the 
light is very low, the things that we see have no color. It is known that this dark- 
adapted vision is amost entirely due to the rods, while the vision in bright light is 
due to the cones. As a result, there are a number of phenomena that we can easily 
appreciate because of this transfer of function from the cones and rods together, 
to just the rods. 

There are many situations in which, if the light intensity were stronger, we 
could see color, and we would find these things quite beautiful. One example is 
that through a telescope we nearly always see “black and white” images of faint 
nebulae, but W. C. Miller of the Mt. Wilson and Palomar Observatories had the 
patience to make color pictures of some of these objects. Nobody has ever really 
seen these colors with the eye, but they are not artificial colors, it is merely that the 
light intensity is not strong enough for the cones in our eye to see them. Among 
the more spectacular such objects are the ring nebula and the Crab nebula. The 
former shows a beautiful blue inner part, with a bright red outer halo, and the latter 
shows a general bluish haze permeated by bright red-orange filaments. 

In the bright light, apparently, the rods are at very low sensitivity but, in the 
dark, as time goes on they pick up their ability to see light. The variations in light 
intensity for which one can adapt is over a million to one. Nature does not do 
all this with just one kind of cell, but she passes her job from bright-light-seeing 
cells, the color-seeing cells, the cones, to low-intensity, dark-adapted cells, the rods. 
Among the interesting consequences of this shift is, first, that there is no color, and 
second, that there is a difference in the relative brightness of differently colored 
objects. It turns out that the rods see better toward the blue than the cones do, 
and the cones can see, for example, deep red light, while the rods find that absolutely 
impossible to see. So red light is black so far as the rods are concerned. Thus two 
pieces of colored paper, say blue and red, in which the red might be even brighter 
than the blue in good light, will, in the dark, appear completely reversed. It is a 
very striking effect. If we are in the dark and can find a magazine or something that 
has colors and, before we know for sure what the colors are, we judge the lighter 
and darker areas, and if we then carry the magazine into the light, we may see this 
very remarkable shift between which was the brightest color and which was not. 
The phenomenon is called the Purkinje effect . 

In Fig. 35-3, the dashed curve represents the sensitivity of the eye in the dark, 
i.e., using the rods, while the solid curve represents it in the light. We see that the 
peak sensitivity of the rods is in the green region and that of the cones is more in 
the yellow region. If there is a red-colored page (red is about 650 mju) we can see 
it if it is brightly lighted, but in the dark it is almost invisible. 
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Fig. 35-3. The spectral sensitivity of 
the eye. Dashed curve, rods; solid curve, 
cones. 



Another effect of the fact that rods take over in the dark, and that there are 
no rods in the fovea, is that when we look straight at something in the dark, our 
vision is not quite as acute as when we look to one side. A faint star or nebula can 
sometimes be seen better by looking a little to one side than directly at it, because 
we do not have sensitive rods in the middle of the fovea. 

Another interesting effect of the fact that the number of cones decreases as 
we go farther to the side of the field of view is that even in a bright light color 
disappears as the object goes far to one side. The way to test that is to look in 
some particular fixed direction, let a friend walk in from one side with colored 
cards, and try to decide what color they are before they are right in front of you. 
One finds that he can see that the cards are there long before he can determine the 
color. When doing this, it is advisable to come in from the side opposite the blind 
spot, because it is otherwise rather confusing to almost see the color, then not 
see anything, then to see the color again. 

Another interesting phenomenon is that the periphery of the retina is very 
sensitive to motion. Although we cannot see very well from the corner of our eye, 
if a little bug moves and we do not expect anything to be moving over there, 
we are immediately sensitive to it. We are all “wired up” to look for something 
jiggling to the side of the field. 

35-3 Measuring the color sensation 

Now we go to the cone vision, to the brighter vision, and we come to the 
question which is rhost characteristic of cone vision, and that is color. As we 
know, white light can be split by a prism into a whole spectrum of wavelengths 
which appear to us to have different colors; that is what colors are, of course: 
appearances. Any source of light can be analyzed by a grating or a prism, and 
one can determine the spectral distribution, i.e., the “amount” of each wavelength. 
A certain light may have a lot of blue, considerable red, very little yellow, and so 
on. That is all very precise in the sense of physics, but the question is, what color 
will it appear to be? It is evident that the different colors depend somehow upon 
the spectral distribution of the light, but the problem is to find what characteristics 
of the spectral distribution produce the various sensations. For example, what do 
we have to do to get a green color? We all know that we can simply take a piece 
of the spectrum which is green. But is that the only way to get green, or orange, 
or any other color? 

Is there more than one spectral distribution which produces the same apparent 
visual effect? The answer is, definitely yes. There is a very limited number of visual 
effects, in fact just a three-dimensional manifold of them, as we shall shortly see, 
but there is an infinite number of different curves that we can draw for the light 
that comes from different sources. Now the question we have to discuss is, under 
what conditions do different distributions of light appear as exactly the same color 
to the eye? 
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The most powerful psycho-physical technique in color judgment is to use 
the eye as a null instrument . That is, we do not try to define what consititutes a 
green sensation, or to measure in what circumstances we get a green sensation, 
because it turns out that this is extremely complicated. Instead, we study the 
conditions under which two stimuli are indistinguishable . Then we do not 
have to decide whether two people see the same sensation in different cir- 
cumstances, but only whether, if for one person two sensations are the same, they 
are also the saifie for another. We do not have to decide whether, when one sees 
something green, what it feels like inside is the same as what it feels like inside 
someone else when he sees something green; we do not know anything about that. 

To illustrate the possibilities, we may use a series of four projector lamps which 
have filters on them, and whose brightnesses are continuously adjustable over a 
wide range: one has a red filter and makes a spot of red light on the screen, the 
next one has a green filter and makes a green spot, the third one has a blue filter, 
and the fourth one is a white circle with a black spot in the middle of it. Now if 
we turn on some red light, and next to it put some green, we see that in the area 
of overlap it produces a sensation which is not what we call reddish green, but a 
new color, yellow in this particular case. By changing the proportions of the red 
and the green, we can go through various shades of orange and so forth. If we 
have set it for a certain yellow, we can also obtain that same yellow, not by mixing 
these two colors but by mixing some other ones, perhaps a yellow filter with white 
light, or something like that, to get the same sensation. In other words, it is possible 
to make various colors in more than one way by mixing the lights from various 
filters. 

What we have just discovered may be expressed analytically as follows. A 
particular yellow, for example, can be represented by a certain symbol Y , which is 
the “sum” of certain amounts of red-filtered light (7?) and green-filtered light (G). 
By using two numbers, say r and g , to describe how bright the (R) and (G) are, we 
can write a formula for this yellow: 

Y = rR + gG. (35.1) 

The question is, can we make all the different colors by adding together two or 
three lights of different, fixed colors? Let us see what can be done in that connec- 
tion. We certainly cannot get all the different colors by mixing only red and green, 
because, for instance, blue never appears in such a mixture. However, by putting 
in some blue the central region, where all three spots overlap, may be made to 
appear to be a fairly nice white. By mixing the various colors and looking at this 
central region, we find that we can get a considerable range of colors in that region 
by changing the proportions, and so it is not impossible that all the colors can be 
made by mixing these three colored lights. We shall discuss to what extent this is 
true; it is in fact essentially correct, and we shall shortly see how to define the 
proposition better. 

In order to illustrate our point, we move the spots on the screen so that they 
all fall on top of each other, and then we try to match a particular color which 
appears in the annular ring made by the fourth lamp. What we once thought was 
“white” coming from the fourth lamp now appears yellowish. We may try to match 
that by adjusting the red and green and blue as best we can by a kind of trial and 
error, and we find that we can approach rather closely this particular shade of 
“cream” color. So it is not hard to believe that we can make all colors. We shall 
try to make yellow in a moment, but before we do that, there is one color that might 
be very hard to make. People who give lectures on color make all the “bright” 
colors, but they never make brown , and it is hard to recall ever having seen brown 
light. As a matter of fact, this color is never used for any stage effect, one never 
sees a spotlight with brown light; so we think it might be impossible to make brown. 
In order to find out whether it is possible to make brown, we point out that brown 
light is merely something that we are not used to seeing without its background. 
As a matter of fact, we can make it by mixing some red and yellow. To prove 
that we are looking at brown light, we merely increase the brightness of the annular 
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background against which we see the very same light, and we see that that is, in 
fact, what we call brown! Brown is always a dark color next to a lighter back- 
ground. We can easily change the character of the brown. For example, if we take 
some green out we get a reddish brown, apparently a chocolatey reddish brown, 
and if we put more green into it, in proportion, we get that horrible color which 
all the uniforms of the Army are made of, but the light from that color is not so 
horrible by itself ; it is of yellowish green, but seen against a light background. 

Now we put a yellow filter in front of the fourth light and try to match that. 
(The intensity must of course be within the range of the various lamps; we cannot 
match something which is too bright, because we do not have enough power in 
the lamp.) But we can match the yellow; we use a green and red mixture, and put 
in a touch of blue to make it even more perfect. Perhaps we are ready to believe 
that, under good conditions, we can make a perfect match of any given color. 

Now let us discuss the laws of color mixture. In the first place, we found that 
different spectral distributions can produce the same color; next, we saw that 
“any” color can be made by adding together three special colors, red, blue, and 
green. The most interesting feature of color mixing is this: if we have a certain 
light, which we may call X , and if it appears indistinguishable from Y , to the eye 
(it may be a different spectral distribution, but it appears indistinguishable), we 
call these colors “equal,” in the sense that the eye sees them as equal, and we write 

X = Y. (35.2) 

Here is one of the great laws of color: if two spectral distributions are indistinguish- 
able, and we add to each one a certain light, say Z (if we write X + Z, this means 
that we shine both lights on the same patch), and then we take Y and add the same 
amount of the same other light, Z, the new mixtures are also indistinguishable: 

X + Z = Y + Z. (35.3) 

We have just matched our yellow; if we now shine pink light on the whole thing, 
it will still match. So adding any other light to the matched lights leaves a match. 
In other words, we can summarize all these color phenomena by saying that once 
we have a match between two colored lights, seen next to each other in the same 
circumstances, then this match will remain, and one light can be substituted for 
the other light in any other color mixing situation. In fact, it turns out, and it is 
very important and interesting, that this matching of the color of lights is not 
dependent upon the characteristics of the eye at the moment of observation : we 
know that if we look for a long time at a bright red surface, or a bright red light, 
and then look at a white paper, it looks greenish, and other colors are also dis- 
torted by our having looked so long at the bright red. If we now have a match 
between, say, two yellows, and we look at them and make them match, then we 
look at a bright red surface for a long time, and then turn back to the yellow, it 
may not look yellow any more; I do not know what color it will look, but it will 
not look yellow. Nevertheless the yellows will still look matched , and so, as the 
eye adapts to various levels of intensity, the color match still works, with the 
obvious exception of when we go into the region where the intensity of the light 
gets so low that we have shifted from cones to rods; then the color match is no 
longer a color match, because we are using a different system. 

The second principle of color mixing of lights is this: any color at all can be 
made from three different colors , in our case, red, green, and blue lights. By suitably 
mixing the three together we can make anything at all, as we demonstrated with 
our two examples. Further, these laws are very interesting mathematically. For 
those who are interested in the mathematics of the thing, it turns out as follows. 
Suppose that we take our three colors, which were red, green, and blue, but label 
them A , B , and C, and call them our primary colors. Then any color could be 
made by certain amounts of these three: say an amount a of color A , an amount 
b of color B , and an amount c of color C makes X: 


X = aA + bB + cC. 


(35.4) 
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Now suppose another color Y is made from the same three colors: 


Y = a' A + b’B + c'C. (35.5) 

Then it turns out that the mixture of the two lights (it is one of the consequences 
of the laws that we have already mentioned) is obtained by taking the sum of the 
components of Xand Y : 

Z=X+Y = (a + a f )A + (b + b f )B + (c + c')C. (35.6) 

It is just like the mathematics of the addition of vectors, where (< a , b, c) are the 
components of one vector, and (a\ b\ c f ) are those of another vector, and the 
new light Z is then the “sum” of the vectors. This subject has always appealed to 
physicists and mathematicians. In fact, Schrodinger wrote a wonderful paper on 
color vision in which he developed this theory of vector analysis as applied to the 
mixing of colors. 

Now a question is, what are the correct primary colors to use? There is no 
such thing as “the” correct primary colors for the mixing of lights. There may be, 
for practical purposes, three paints that are more useful than others for getting a 
greater variety of mixed pigments, but we are not discussing that matter now. 
Any three differently colored lights whatsoever* can always be mixed in the correct 
proportion to produce any color whatsoever. Can we demonstrate this fantastic 
fact? Instead of using red, green, and blue, let us use red, blue, and yellow in our 
projector. Can we use red, blue, and yellow to make, say, green? 

By mixing these three colors in various proportions, we get quite an array of 
different colors, ranging over quite a spectrum. But as a matter of fact, after a lot 
of trial and error, we find that nothing ever looks like green. The question is, can 
we make green? The answer is yes. How? By projecting some red onto the green , 
then we can make a match with a certain mixture of yellow and blue! So we have 
matched them, except that we had to cheat by putting the red on the other side. 
But since we have some mathematical sophistication, we can appreciate that what we 
really showed was not that X could always be made, say, of red, blue, and yellow, 
but by putting the red on the other side we found that red plus X could be made 
out of blue and yellow. Putting it on the other side of the equation, we can interpret 
that as a negative amount , so if we will allow that the coefficients in equations like 
(35.4) can be both positive and negative, and if we interpret negative amounts to 
mean that we have to add those to the other side , then any color can be matched by 
any three, and there is no such thing as “the” fundamental primaries. 

We may ask whether there are three colors that come only with positive 
amounts for all mixings. The answer is no. Every set of three primaries requires 
negative amounts for some colors, and therefore there is no unique way to define 
a primary. In elementary books they are said to be red, green, and blue, but that 
is merely because with these a wider range of colors is available without minus 
signs for some of the combinations. 

35-4 The chromaticity diagram 

Now let us discuss the combination of colors on a mathematical level as a 
geometrical proposition. If any one color is represented by Eq. (35.4), we can plot 
it as a vector in space by plotting along three axes the amounts a , Z>, and c , and 
then a certain color is a point. If another color is a', b', c ', that color is located 
somewhere else. The sum of the two, as we know, is the color which comes from 
adding these as vectors. We can simplify this diagram and represent everything 
on a plane by the following observation: if we had a certain color light, and merely 
doubled a and b and c, that is, if we make them all stronger in the same ratio, it is 
the same color, but brighter. So if we agree to reduce everything to the same light 
intensity , then we can project everything onto a plane, and this has been done in 
Fig. 35-4. It follows that any color obtained by mixing a given two in some pro- 


* Except, of course, if one of the three can be matched by mixing the other two. 
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portion will lie somewhere on a line drawn between the two points. For instance, 
a fifty-fifty mixture would appear halfway between them, and 1/4 of one and 3/4 
of the other would appear 1/4 of the way from one point to the other, and so on. 
If we use a blue and a green and a red, as primaries, we see that all the colors that 
we can make with positive coefficients are inside the dotted triangle, which contains 
almost all of the colors that we can ever see, because all the colors that we can ever 
see are enclosed in the oddly shaped area bounded by the curve. Where did this 
area come from? Once somebody made a very careful match of all the colors that 
we can see against three special ones. But we do not have to check all colors that 
we can see, we only have to check the pure spectral colors, the lines of the spectrum. 
Any light can be considered as a sum of various positive amounts of various pure 
spectral colors — pure from the physical standpoint. A given light will have a cer- 
tain amount of red, yellow, blue, and so on — spectral colors. So if we know how 
much of each of our three chosen primaries is needed to make each of these pure 
components, we can calculate how much of each is needed to make our given color. 
So, if we find out what the color coefficients of all the spectral colors are for any 
given three primary colors, then we can work out the whole color mixing table. 

An example of such experimental results for mixing three lights together is 
given in Fig. 35-5. This figure shows the amount of each of three different particular 
primaries, red, green and blue, which is required to make each of the spectral 
colors. Red is at the left end of the spectrum, yellow is next, and so on, all the way 
to blue. Notice that at some points minus signs are necessary. It is from such 
data that it is possible to locate the position of all of the colors on a chart, where 
the x- and the y-coordinates are related to the amounts of the different primaries 
that are used. That is the way that the curved boundary line has been found. It is 
the locus of the pure spectral colors. Now any other color can be made by adding 
spectral lines, of course, and so we find that anything that can be produced by 
connecting one part of this curve to another is a color that is available in nature. 
The straight line connects the extreme violet end of the spectrum with the extreme 
red end. It is the locus of the purples. Inside the boundary are colors that can be 
made with lights, and outside it are colors that cannot be made with lights, and 
nobody has ever seen them (except, possibly, in after-images!). 

35-5 The mechanism of color vision 

Now the next aspect of the matter is the question, why do colors behave in this 
way? The simplest theory, proposed by Young and Helmholtz, supposes that in 
the eye there are three different pigments which receive the light and that these 
have different absorption spectra, so that one pigment absorbs strongly, say, in 
the red, another absorbs strongly in the blue, another absorbs in the green. Then 
when we shine a light on them we will get different amounts of absorptions in the 
three regions, and these three pieces of information are somehow maneuvered in 
the brain or in the eye, or somewhere, to decide what the color is. It is easy to 
demonstrate that all of the rules of color mixing would be a consequence of this 
proposition. There has been considerable debate about the thing because the next 
problem, of course, is to find the absorption characteristics of each of the three 
pigments. It turns out, unfortunately, that because we can transform the color 
coordinates in any manner we want to, we can only find all kinds of linear 
combinations of absorption curves by the color-mixing experiments, but not the 
curves for the individual pigments. People have tried in various ways to obtain a 
specific curve which does describe some particular physical property of the eye. 
One such curve is called a brightness curve , demonstrated in Fig. 35-3. In this 
figure are two curves, one for eyes in the dark, the other for eyes in the light; 
the latter is the cone brightness curve. This is measured by finding what is the 
smallest amount of colored light we need in order to be able to just see it. This 
measures how sensitive the eye is in different spectral regions. There is another 
very interesting way to measure this. If we take two colors and make them 
appear in an area, by flickering back and forth from one to the other, we see 
a flicker if the frequency is too low. However, as the frequency increases, 
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Fig. 35-4. The standard chromaticity 
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Fig. 35-5. The color coefficients of 
pure spectral colors in terms of a certain 
set of standard primary colors. 



the flicker will ultimately disappear at a certain frequency that depends on the 
brightness of the light, let us say at 16 repetitions per second. Now if we adjust 
the brightness or the intensity of one color against the other, there comes an 
intensity where the flicker at 16 cycles disappears. To get flicker with the brightness 
so adjusted, we have to go to a much lower frequency in order to see a flicker of the 
color. So, we get what we call a flicker of the brightness at a higher frequency and, 
at a lower frequency, a flicker of the color. It is possible to match two colors for 
“equal brightness” by this flicker technique. The results are almost, but not exactly, 
the same as those obtained by measuring the threshold sensitivity of the eye for 
seeing weak light by the cones. Most workers use the flicker system as a definition 
of the brightness curve. 

Now, if there are three color-sensitive pigments in the eye, the problem is to 
determine the shape of the absorption spectrum of each one. How? We know 
there are people who are color blind — eight percent of the male population, and 
one-half of one percent of the female population. Most of the people who are 
color blind or abnormal in color vision have a different degree of sensitivity than 
others to a variation of color, but they still need three colors to match. However, 
there are some who are called dichromats , for whom any color can be matched 
using only two primary colors. The obvious suggestion, then, is to say that they are 
missing one of the three pigments. If we can find three kinds of color-blind dichro- 
mats who have different color-mixing rules, one kind should be missing the red , 
another the green , and another the blue pigmentation. By measuring all these types 
we can determine the three curves! It turns out that there are three types of dichro- 
matic color blindness; there are two common types and a third very rare type, 
and from these three it has been possible to deduce the pigment absorption spectra. 




Fig. 35-6. Loci of colors confused by 
deuteranopes. 


Fig. 35-7. Loci of colors confused 
by protanopes. 


Figure 35-6 shows the color mixing of a particular type of color-blind person 
called a deuteranope. For him, the loci of constant colors are not points, but 
certain lines, along each of which the color appears to him to be the same. If the 
theory that he is missing one of the three pieces of information is right, all these 
lines should intersect at a point. If we carefully measure on this graph, they do 
intersect perfectly. Obviously, therefore, this has been made by a mathematician 
and does not represent real data! As a matter of fact, if we look at the latest paper 
with real data, it turns out that in the graph of Fig. 35-6, the point of focus of all 
the lines is not exactly at the right place. Using the lines in the above figure, we 
cannot find reasonable spectra; we need negative and positive absorptions in 
different regions. But using the new data of Yustova, it turns out that each of the 
absorption curves is everywhere positive. 
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Figure 35-7 shows a different kind of color blindness, that of the protanope, 
which has a focus near the red end of the boundary curve. Yustova gets approxi- 
mately the same position in this case. Using the three different kinds of color 
blindness, the three pigment response curves have finally been deter min er! and 
are shown in Fig. 35-8. Finally? Perhaps. There is a question as to whether the 
three-pigment idea is right, whether color blindness results from lack of one 
pigment, and even whether the color-mix data on color blindness are right. Differ- 
ent workers get different results. This field is still very much under development. 


35-6 Physiochemistry of color vision 

Now, what about checking these curves against actual pigments in the eye? 
The pigments that can be obtained from a retina consist mainly of a pigment called 
visual purple. The most remarkable features of this are, first, that it is in the eye 
of almost every vertebrate animal, and second, that its response curve fits beauti- 
fully with the sensitivity of the eye, as seen in Fig. 35-9, in which are plotted on the 
same scale the absorption of visual purple and the sensitivity of the dark-adapted 
eye. This pigment is evidently the pigment that we see with in the dark: visual 
purple is the pigment for the rods, and it has nothing to do with color vision. 
This fact was discovered in 1877. Even today it can be said that the color pigments 
of the cones have never been obtained in a test tube. In 1958 it could be said 
that the color pigments had never been seen at all. But since that time, two of 
them have been detected by Rushton by a very simple and beautiful technique. 

The trouble is, presumably, that since the eye is so weakly sensitive to bright 
light compared with light of low intensity, it needs a lot of visual purple to see with, 
but not much of the color pigments for seeing colors. Rushton’s idea is to leave 
the pigment in the eye, and measure it anyway. What he does is this. There is an 
instrument called an opthalmoscope for sending light into the eye through the lens 
and then focusing the light that comes back out. With it one can measure how 
much is reflected. So one measures the reflection coefficient of light which has gone 
twice through the pigment (reflected by a back layer in the eyeball, and coming 
out through the pigment of the cone again). Nature is not always so beautifully 
designed. The cones are interestingly designed so that the light that comes into the 
cone bounces around and works its way down into the little sensitive points at the 
apex. The light goes right down into the sensitive point, bounces at the bottom and 
comes back out again, having traversed a considerable amount of the color-vision 
pigment; also, by looking at the fovea, where there are no rods, one is not confused 
by visual purple. But the color of the retina has been seen a long time ago: it is a 
sort of orangey pink; then there are all the blood vessels, and the color of the 
material at the back, and so on. How do we know when we are looking at the 
pigment? Answer: First we take a color-blind person, who has fewer pigments and 
for whom it is therefore easier to make the analysis. Second, the various pigments, 
like visual purple, have an intensity change when they are bleached by light; when 
we shine light on them they change their concentration. So, while looking at the 
absorption spectrum of the eye, Rushton put another beam in the whole eye, which 
changes the concentration of the pigment, and he measured the change in the 
spectrum, and the difference, of course, has nothing to do with the amount of 
blood or the color of the reflecting layers, and so on, but only the pigment, and in 
this manner Rushton obtained a curve for the pigment of the protanope eye, which 
is given in Fig. 35-10. 

The second curve in Fig. 35-10 is a curve obtained with a normal eye. This 
was obtained by taking a normal eye and, having already determined what one 
pigment was, bleaching the other one in the red where the first one is insensitive. 
Red light has no effect on the protanope eye, but does in the normal eye, and thus 
one can obtain the curve for the missing pigment. The shape of one curve fits 
beautifully with Yustova’s green curve, but the red curve is a little bit displaced. 
So perhaps we are getting on the right track. Or perhaps not— the latest work 
with deuteranopes does not show any definite pigment missing. 
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Fig. 35-9. The sensitivity curve of 
the dark-adapted eye, compared with 
the absorption curve of visual purple. 
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Fig. 35-10. Absorption spectrum of 
the color pigment of a protanope color- 
blind eye (squares) and a normal eye 
(dots). 


Color is not a question of the physics of the light itself. Color is a sensation , 
and the sensation for different colors is different in different circumstances. For 
instance, if we have a pink light, made by superimposing crossing beams of white 
light and red light (all we can make with white and red is pink, obviously), we may 
show that white light may appear blue. If we place an object in the beams, it casts 
two shadows — one illuminated by the white light alone and the other by the red. 
For most people the “white” shadow of an object looks blue, but if we keep ex- 
panding this shadow until it covers the entire screen, we see that it suddenly appears 
white, not blue! We can get other effects of the same nature by mixing red, yellow, 
and white light. Red, yellow, and white light can produce only orangey yellows, 
and so on. So if we mix such lights roughly equally, we get only orange light. 
Nevertheless, by casting different kinds of shadows in the light, with various 
overlaps of colors, one gets quite a series of beautiful colors which are not in the 
light themselves (that is only orange), but in our sensations. We clearly see many 
different colors that are quite unlike the “physical” ones in the beam. It is very 
important to appreciate that a retina is already “thinking” about the light; it is 
comparing what it sees in one region with what it sees in another, although not 
consciously. What we know of how it does that is the subject of the next chapter. 
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36 


Mechanisms of Seeing 


36-1 The sensation of color 


In discussing the sense of sight, we have to realize that (outside of a gallery 
of modern art!) one does not see random spots of color or spots of light. When 
we look at an object we see a man or a things in other words, the brain interprets 
what we see. How it does that, no one knows, and it does it, of course, at a very 
high level. Although we evidently do learn to recognize what a man looks like 
after much experience, there are a number of features of vision which are more 
elementary but which also involve combining information from different parts of 
what we see. To help us understand how we make an interpretation of an entire 
image, it is worth while to study the earliest stages of the putting together of in- 
formation from the different retinal cells. In the present chapter we shall concen- 
trate mainly on that aspect of vision, although we shall also mention a number of 
side issues as we go along. 

An example of the fact that we have an accumulation, at a very elementary 
level, of information from several parts of the eye at the same time, beyond our 
voluntary control or ability to learn, was that blue shadow which was produced 
by white light whe;i both white and red were shining on the same screen. This 
effect at least involves the knowledge that the background of the screen is pink, 
even though, When we are looking at the blue shadow, it is only “white” light 
coming into a particular spot in the eye; somewhere, pieces of information have 
been put together. The more complete and familiar the context is, the more the eye 
will make corrections for peculiarities. In fact, Land has shown that if we mix that 
apparent blue and the red in various proportions, by using two photographic 
transparencies with absorption in front of the red and the white in different pro- 
portions, it can be made to represent a real scene, with real objects, rather faithfully. 
In this case we get a lot of intermediate apparent colors too, analogous to what we 
would get by mixing red and blue-green ; it seems to be an almost complete set of 
colors, but if we look very hard at them, they are not so very good. Even so, it is 
surprising how much can be obtained from just red and white. The more the scene 
looks like a real situation, the more one is able to compensate for the fact that all the 
light is actually nothing but pink! 

Another example is the appearance of “colors” in a black-and-white rotating 
disc, whose black and white areas are as shown in Fig. 36-1. When the disc is 
rotated, the variations of light and dark at any one radius are exactly the same; it 
is only the background that is different for the two kinds of “stripes.” Yet one of 
the “rings” appears colored with one color and the other with another* No one 
yet understands the reason fot those colors, but it is clear that information is being 
put together at a very elementary level, in the eye itself, most likely. 

Almost all present-day theories of color vision agree that the color-mixing data 
indicate that there are only three pigments in the cones of the eye, and that it is 
the spectral absorption in those three pigments that fundamentally produces 
the color sense. But the total sensation that is associated with the absorption 
characteristics of the three pigments acting together is not necessarily the sum of 
the individual sensations. We all agree that yellow Simply does not seem to be 
reddish green ; in fact it might be a tremendous surprise to most people to discover 
that light is, in fact, a mixture of colors, because presumably the sensation of light 
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Fig. 36-1 . When a disc like the 
above is spun, colors appear in only one 
of the two darker “rings." If the spin 
direction is reversed, the colors appear 
in the other ring. 


* The colors depend on speed of rotation, on the brightness of illumination, and to 
some extent on who looks at them and how intently he stares at them. 
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is due to some other process than a simple mixture like a chord in music, where 
the three notes are there at the same time and if we listen hard we can hear them 
individually. We cannot look hard and see the red and the green. 

The earliest theories of vision said that there are three pigments and three 
kinds of cones, each kind containing one pigment; that a nerve runs from each 
cone to the brain, so that the three pieces of information are carried to the brain ; 
and then in the brain, anything can happen. This, of course, is an incomplete idea: 
it does no good to discover that the information is carried along the optic nerve 
to the brain, because we have not even started to solve the problem. We must 
ask more basic questions: Does it make any difference where the information is put 
together? Is it important that it be carried right up into the brain in the optic 
nerve, or could the retina do some analysis first? We have seen a picture of the 
retina as an extremely complicated thing with lots of interconnections (Fig. 35-2) 
and it might make some analyses. 

As a matter of fact, people who study anatomy and the development of the 
eye have shown that the retina is, in fact, the brain : in the development of the em- 
bryo, a piece of the brain comes out in front, and long fibers grow back, con- 
necting the eyes to the brain. The retina is organized in just the way the brain is 
organized and, as someone has beautifully put it, “The brain has developed a way 
to look out upon the world.” The eye is a piece of brain that is touching light, so 
to speak, on the outside. So it is not at all unlikely that some analysis of the color 
has already been made in the retina. 

This gives us a very interesting opportunity. None of the other senses involves 
such a large amount of calculation, so to speak, before the signal gets into a nerve 
that one can make measurements on. The calculations for all the rest of the senses 
usually happen in the brain itself, where it is very difficult to get at specific places 
to make measurements, because there are so many interconnections. Here, with 
the visual sense, we have the light, three layers of cells making calculations, and 
the results of the calculations being transmitted through the optic nerve. So 
we have the first chance to observe physiologically how, perhaps, the first layers 
of the brain work in their first steps. It is thus of double interest, not simply 
interesting for vision, but interesting to the whole problem of physiology. 

The fact that there are three pigments does not mean that there must be three 
kinds of sensations. One of the other theories of color vision has it that there are 
really opposing color schemes (Fig. 36-2). That is, one of the nerve fibers carries 
a lot of impulses if there is yellow being seen, and less than usual for blue. Another 
nerve fiber carries green and red information in the same way, and another, white 
and black. In other words, in this theory someone has already started to make a 
guess as to the system of wiring, the method of calculation. 

The problems we are trying to solve by guessing at these first calculations are 
questions about the apparent colors that are seen on a pink background, what hap- 
pens when the eye is adapted to different colors, and also the so-called psychological 
phenomena. The psychological phenomena are of the nature, for instance, that 
white does not “feel” like red and yellow and blue, and this theory was advanced 
because the psychologists say that there are four apparent pure colors: “There are 
four stimuli which have a remarkable capacity to evoke psychologically simple 
blue, yellow, green, and red hues respectively. Unlike sienna, magenta, purple, or 
most of the discriminable colors, these simple hues are unmixed in the sense that 
none partakes of the nature of the other ; specifically, blue is not yellowish, reddish, 
or greenish, and so on; these are psychologically primary hues.” That is a psycho- 
logical fact, so-called. To find out from what evidence this psychological fact was 
deduced, we must search very hard indeed through all the literature. In the modern 
literature all we find on the subject are repeats of the same statement, or of one by 
a German psychologist, who uses as one of his authorities Leonardo da Vinci, who, 
of course, we all know was a great artist. He says, “Leonardo thought there were 
five colors.” Then, looking still further, we find, in a still older book, the evidence 
for the subject. The book says something like this: “Purple is reddish-blue, orange 
is reddish-yellow, but can red be seen as purplish-orange? Are not red and yellow 
more unitary than purple or orange? The average person, asked to state which 
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colors are unitary, names red, yellow, and blue, these three, and some observers 
add a fourth, green. Psychologists are accustomed to accept the four as salient 
hues.” So that is the situation in the psychological analysis of this matter: if every- 
body says there are three, and somebody says there are four, and they want it to 
be four, it will be four. That shows the difficulty with psychological researches. 
It is clear that we have such feelings, but it is very difficult to obtain much informa- 
tion about them. 

So the other direction to go is the physiological direction, to find out experi- 
mentally what actually happens in the brain, the eye, the retina, or wherever, and 
perhaps to discover that some combinations of impulses from various cells move 
along certain nerve fibers. Incidentally, primary pigments do not have to be in 
separate cells; one could have cells in which are mixtures of the various pigments, 
cells with the red and the green pigments, cells with all three (the information of all 
three is then white information), and so on. There are many ways of hooking the 
system up, and we have to find out which way nature has used. It would be 
hoped, ultimately, that when we understand the physiological connections we will 
have a little bit of understanding of some of those aspects of the psychology, so we 
look in that direction. 


36-2 The physiology of the eye 

We begin by talking not only about color vision, but about vision in general, 
just to remind ourselves about the interconnections in the retina, shown in Fig. 
35-2. The retina is really like the surface of the brain. Although the actual picture 
through a microscope is a little more complicated looking than this somewhat 
schematized drawing, by careful analysis one can see all these interconnections. 
There is no question that one part of the surface of the retina is connected to other 
parts, and that the information that comes out on the long axons, which produce 
the optic nerve, are combinations of information from many cells. There are three 
layers of cells in the succession of function: there are retinal cells, which are the 
ones that the light affects, an intermediate cell which takes information from a 
single or a few retinal cells and gives it out again to several cells in a third layer of 
cells and carries it to the brain. There are all kinds of cross connections between 
cells in the layers. 

We now turn to some aspects of the structure and performance of the eye 
(see Fig. 35-1). The focusing of the light is accomplished mainly by the cornea, 
by the fact that it has a curved surface which “bends” the light. This is why we 
cannot see clearly under water, because we then do not have enough difference 
between the index of the cornea, which is 1.37, and that of the water, which is 1.33. 
Behind the cornea is water, practically, with an index of 1.33, and behind that is a 
lens which has a very interesting structure: it is a series of layers, like an onion, 
except that it is all transparent, and it has an index of 1.40 in the middle and 1.38 
at the outside. (It would be nice if we could make optical glass in which we could 
adjust the index throughout, for then we would not have to curve it as much as 
we do when we have a uniform index.) Furthermore, the shape of the cornea is not 
that of a sphere. A spherical lens has a certain amount of spherical aberration. 
The cornea is “flatter” at the outside than is a sphere, in just such a manner that 
the spherical aberration is less for the cornea than it would be if we put a spherical 
lens in there! The light is focused by the cornea-lens system onto the retina. As 
we look at things that are closer and farther away, the lens tightens and loosens 
and changes the focus to adjust for the different distances. To adjust for the total 
amount of light there is the iris, which is what we call the color of the eye, brown 
or blue, depending on who it is; as the amount of light increases and decreases, 
the iris moves in and out. 

Let us now look at the neural machinery for controlling the accommodation 
of the lens, the motion of the eye, the muscles which turn the eye in the socket, and 
the iris, shown schematically in Fig. 36-3. Of all the information that comes out 
of the optic nerve A , the great majority is divided into one of two bundles (which 
we will talk about later) and thence to the brain. But there are a few fibers, of 



Fig. 36-3. The neural interconnec- 
tions for the mechanical operation of 
the eyes. 
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Fig. 36-4. The neural connections 
from the eyes to the visual cortex. 


interest to us now, which do not run directly to the visual cortex of the brain 
where we “see” the images, but instead go into the mid-brain H. These are the 
fibers which measure the average light and make adjustment for the iris; or, if the 
image looks foggy, they try to correct the lens; or, if there is a double image, they 
try to adjust the eye for binocular vision. At any rate, they go through the mid- 
brain and feed back into the eye. At K are the muscles which run the accommoda- 
tion of the lens, and at L another one that runs into the iris. The iris has two 
muscle systems. One is a circular muscle L which, when it is excited, pulls in and 
closes down the iris; it acts very rapidly and the nerves are directly connected from 
the brain through short axons into the iris. The opposite muscles are radial muscles, 
so that, when the things get dark and the circular muscle relaxes, these radial 
muscles pull out. Here we have, as in many places in the body, a pair of muscles 
which work in opposite directions, and in almost every such case the nerve systems 
which control the two are very delicately adjusted, so that when signals are sent in 
to tighten one, signals are automatically sent in to loosen the other. The iris is a 
peculiar exception: the nerves which make the iris contract are the ones we have 
already described, but the nerves which make the iris expand come out from no one 
knows exactly where, go down into the spinal cord back of the chest, into the 
thoracic sections, out of the spinal cord, up through the neck ganglia, and all the 
way around and back up into the head in order to run the other end of the iris. 
In fact, the signal goes through a completely different nervous system, not the 
central nervous system at all, but the sympathetic nervous system, so it is a very 
strange way of making things go. 

We have already emphasized another strange thing about the eye, that the 
light-sensitive cells are on the wrong side, so that the light has to go through several 
layers of other cells before it gets to the receptors — it is built inside out ! So some 
of the features are wonderful and some are apparently stupid. 

Figure 36-4 shows the connections of the eye to the part of the brain which is 
most directly concerned with the visual process. The optic nerve fibers run into a 
certain area just beyond D, called the lateral geniculate, whereupon they run out 
to a section of the brain called the visual cortex. Notice that some of the fibers from 
each eye are sent over to the other side of the brain, so the picture formed is in- 
complete. The optic nerves from the left side of the right eye run across the optic 
chiasma B, while the ones on the left side of the left eye come around and go this 
same way. So the left side of the brain receives all the information which comes 
from the left side of the eyeball of each eye, i.e., on the right side of the visual 
field, while the right side of the brain sees the left side of the visual field. This is 
the manner in which the information from each of the two eyes is put together 
in order to tell how far away things are. This is the system of binocular vision. 

The connections between the retina and the visual cortex are interesting. If a 
spot in the retina is excised or destroyed in any way, then the whole fiber will die, 
and we can thereby find out where it is connected. It turns out that, essentially, 
the connections are one to one — for each spot in the retina there is one spot in 
the visual cortex — and spots that are very close together in the retina are very 
close together in the visual cortex. So the visual cortex still represents the spatial 
arrangement of the rods and cones, but of course much distorted. Things which 
are in the center of the field, which occupy a very small part of the retina, are 
expanded over many, many cells in the visual cortex. It is clear that it is useful to 
have things which are originally close together, still close together. The most 
remarkable aspect of the matter, however, is the following. The place where one 
would think it would be most important to have things close together would be 
right in the middle of the visual field. Believe it or not, the up-and-down line in 
our visual field as we look at something is of such a nature that the information 
from all the points on the right side of that line is going into the left side of the brain, 
and information from the points on the left side is going into the right side of the 
brain, and the way this area is made, there is a cut right down through the middle, 
so that the things that are very close together right in the middle are very far apart 
in the brain! Somehow, the information has to go from one side of the brain to 
the other through some other channels, which is quite surprising. 
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The question of how this network ever gets “wired” together is very interesting. 
The problem of how much is already wired and how much is learned is an old one. 
It used to be thought long ago that perhaps it does not have to be wired carefully 
at all, it is only just roughly interconnected, and then, by experience, the young 
child learns that when a thing is “up there” it produces some sensation in the brain. 
(Doctors always tell us what the young child “feels,” but how do they know what 
a child feels at the age of one?) The child, at the age of one, supposedly sees that 
an object is “up there,” gets a certain sensation, and learns to reach “there,” 
because when he reaches “here,” it does not work. That approach probably is not 
correct, because we already see that in many cases there are these special detailed 
interconnections. More illuminating are some most remarkable experiments done 
with a salamander. (Incidentally, with the salamander there is a direct crossover 
connection, without the optic chiasma, because the eyes are on each side of the head 
and have no common area. Salamanders do not have binocular vision.) The experi- 
ment is this. We can cut the optic nerve in a salamander and the nerve will grow out 
from the eyes again. Thousands and thousands of cell fibers will thus re-establish 
themselves. Now, in the optic nerve the fibers do not stay adjacent to each other — it 
is like a great, sloppily made telephone cable, all the fibers twisting and turning, 
but when it gets to the brain they are all sorted out again. When we cut the optic 
nerve of the salamander, the interesting question is, will it ever get straightened out? 
The answer is remarkable: yes. If we cut the optic nerve of the salamander and it 
grows back, the salamander has good visual acuity again. However, if we cut the 
optic nerve and turn the eye upside down and let it grow back again, it has good 
visual acuity all right, but it has a terrible error : when the salamander sees a fly 
“up here,” it jumps at it “down there,” and it never learns. Therefore there is some 
mysterious way by which the thousands and thousands of fibers find their right 
places in the brain. 

This problem of how much is wired in, and how much is not, is an important 
problem in the theory of the development of creatures. The answer is not known, 
but is being studied intensively. 

The same experiment in the case of a goldfish shows that there is a terrible 
knot, like a great scar or complication, in the optic nerve where we cut it, but in 
spite of all this the fibers grow back to their right places in the brain. 

In order to do this, as they grow into the old channels of the optic nerve they 
must make several decisions about the direction in which they should grow. How 
do they do this? There seem to be chemical clues that different fibers respond to 
differently. Think of the enormous number of growing fibers, each of which is an 
individual differing in some way from its neighbors; in responding to whatever 
the chemical clues are, it responds in a unique enough way to find its proper place 
for ultimate connection in the brain! This is an interesting — a fantastic — thing. 
It is one of the great recently discovered phenomena of biology and is undoubtedly 
connected to many older unsolved problems of growth, organization, and develop- 
ment of organisms, and particularly of embryos. 

One other interesting phenomenon has to do with the motion of the eye. The 
eyes must be moved in order to make the two images coincide in different cir- 
cumstances. These motions are of different kinds: one is to follow something, 
which requires that both eyes must go in the same direction, right or left, and the 
other is to point them toward the same place at various distances away, which 
requires that they must move oppositely. The nerves going into the muscles of the 
eye are already wired up for just such purposes. There is one set of nerves which 
will pull the muscles on the inside of one eye and the outside of the other, and relax 
the opposite muscles, so that the two eyes move together. There is another center 
where an excitation will cause the eyes to move in toward each other from parallel. 
Either eye can be turned out to the corner if the other eye moves toward the nose, 
but it is impossible consciously or unconsciously to turn both eyes out at the same 
time, not because there are no muscles , but because there is no way to send a signal 
to turn both eyes out, unless we have had an accident or there is something the 
matter, for instance if a nerve has been cut. Although the muscles of one eye can 
certainly steer that eye about, not even a Yogi is able to move both eyes out freely 
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Fig. 36-5. Electron micrograph of a 
rod cell. 
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Fig. 36-6. The structure of retinene. 


under voluntary control, because there does not seem to be any way to do it. We 
are already wired to a certain extent. This is an important point, because most of 
the earlier books on anatomy and psychology, and so on, do not appreciate or do 
not emphasize the fact that we are so completely wired already — they say that 
everything is just learned. 

36-3 The rod cells 

Let us now examine in more detail what happens in the rod cells. Figure 36-5 
shows an electron micrograph of the middle of a rod cell (the rod cell keeps going 
up out of the field). There are layer after layer of plane structures, shown magnified 
at the right, which contain the substance rhodopsin (visual purple), the dye, 
or pigment, which produces the effects of vision in the rods. The rhodopsin, which 
is the pigment, is a big protein which contains a special group called retinene, 
which can be taken off the protein, and which is, undoubtedly, the main cause of 
the absorption of light. We do not understand the reason for the planes, but it is 
very likely that there is some reason for holding all the rhodopsin molecules parallel. 
The chemistry of the thing has been worked out to a large extent, but there might 
be some physics to it. It may be that all of the molecules are arranged in some 
kind of a row so that when one is excited an electron which is generated, say, may 
run all the way down to some place at the end to get the signal out, or something. 
This subject is very important, and has not been worked out. It is a field in which both 
biochemistry and solid state physics, or something like it, will ultimately be used. 

This kind of a structure, with layers, appears in other circumstances where 
light is important, for example in the chloroplast in plants, where the light causes 
photosynthesis. If we magnify those, we find the same thing with almost the same 
kind of layers, but there we have chlorophyll, of course, instead of retinene. The 
chemical form of retinene is shown in Fig. 36-6. It has a series of alternate double 
bonds along the side chain, which is characteristic of nearly all strongly absorbing 
organic substances, like chlorophyll, blood, and so on. This substance is impossible 
for human beings to manufacture in their own cells — we have to eat it. So we eat it 
in the form of a special substance, which is exactly the same as retinene except 
that there is a hydrogen tied on the right end ; it is called vitamin A, and if we do 
not eat enough of it, we do not get a supply of retinene, and the eye becomes what 
we call night blind , because there is then not enough pigment in the rhodopsin 
to see with the rods at night. 

The reason why such a series of double bonds absorbs light very strongly is 
also known. We may just give a hint: The alternating series of double bonds is 
called a conjugated double bond ; a double bond means that there is an extra electron 
there, and this extra electron is easily shifted to the right or left. When light strikes 
this molecule, the electron of each double bond is shifted over by one step. All 
the electrons in the whole chain shift, like a string of dominoes falling over, and 
though each one moves only a little distance (we would expect that, in a single 
atom, we could move the electron only a little distance), the net effect is the same 
as though the one at the end was moved over to the other end! It is the same as 
though one electron went the whole distance back and forth, and so, in this manner, 
we get a much stronger absorption under the influence of the electric field, than if 
we could only move the electron a distance which is associated with one atom. 
So, since it is easy to move the electrons back and forth, retinene absorbs light very 
strongly; that is the machinery of the physical-chemical end of it. 


36-4 The compound (insect) eye 

Let us now return to biology. The human eye is not the only kind of eye. In 
the vertebrates, almost all eyes are essentially like the human eye. However, in 
the lower animals there are many other kinds of eyes: eye spots, various eye cups, 
and other less sensitive things, which we have no time to discuss. But there is one 
other highly developed eye among the invertebrates, the compound eye of the insect. 
(Most insects having large compound eyes also have various additional simpler 
eyes as well.) A bee is an insect whose vision has been studied very carefully. It is 
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easy to study the properties of the vision of bees because they are attracted to honey, 
and we can make experiments in which we identify the honey by putting it on blue 
paper or red paper, and see which one they come to. By this method some very 
interesting things have been discovered about the vision of the bee. 

In the first place, in trying to measure how acutely bees could see the color 
difference between two pieces of “white” paper, some researchers found they were 
not very good, and others found they were fantastically good. Even if the two 
pieces of white paper were almost exactly the same, the bees could still tell the 
difference. The experimenters used zinc white for one piece of paper and lead 
white for the other, and although these look exactly the same to us, the bee could 
easily distinguish them, because they reflect a different amount in the ultraviolet. 
In this way it was discovered that the bee’s eye is sensitive over a wider range of 
the spectrum than is our own. Our eye works from 7000 angstroms to 4000 ang- 
stroms, from red to violet, but the bee’s can see down to 3000 angstroms into the 
ultraviolet! This makes for a number of different interesting effects. In the first 
place, bees can distinguish between many flowers which to us look alike. Of course, 
we must realize that the colors of flowers are not designed for our eyes, but for the 
bee; they are signals to attract the bees to a specific flower. We all know that there 
are many “white” flowers. Apparently white is not very interesting to the bees, 
because it turns out that all of the white flowers have different proportions of 
reflection in the ultraviolet ; they do not reflect one hundred percent of the ultra- 
violet as would a true white. All the light is not coming back, the ultraviolet is 
missing, and that is a color, just as, for us, if the blue is missing, it comes out 
yellow. So, all the flowers are colored for the bees. However, we also know that 
red cannot be seen by bees. Thus we might expect that all red flowers should look 
black to the bee. Not so! A careful study of red flowers shows, first, that even with 
our own eye we can see that a great majority of red flowers have a bluish tinge be- 
cause they are mainly reflecting an additional amount in the blue, which is the part 
that the bee sees. Furthermore, experiments also show that flowers vary in their 
reflection of the ultraviolet over different parts of the petals, and so on. So if we 
could see the flowers as bees see them they would be even more beautiful and varied! 

It has been shown, however, that there are a few red flowers which do not re- 
flect in the blue or in the ultraviolet, and would , therefore, appear black to the bee! 
This was of quite some concern to the people who worry about this matter, because 
black does not seem like an interesting color, since it is hard to tell from a dirty 
old shadow. It actually turned out that these flowers were not visited by bees, these 
are the flowers that are visited by hummingbirds , and hummingbirds can see the red! 

Another interesting aspect of the vision of the bee is that bees can apparently 
tell the direction of the sun by looking at a patch of blue sky, without seeing the 
sun itself. We cannot easily do this. If we look out the window at the sky and see 
that it is blue, in which direction is the sun? The bee can tell, because the bee is 
quite sensitive to the polarization of light, and the scattered light of the sky is 
polarized.* There is still some debate about how this sensitivity operates. Whether 
it is because the reflections of the light are different in different circumstances, or 
the bee’s eye is directly sensitive, is not yet known.f 

It is also said that the bee can notice flicker up to 200 oscillations per second, 
while we see it only up to 20. The motions of bees in the hives are very quick; 
the feet move and the wings vibrate, but it is very hard for us to see these 
motions with our eye. However, if we could see more rapidly we would be able 
to see the motion. It is probably very important to the bee that its eye has such 
a rapid response. 


* The human eye also has a slight sensitivity to the polarization of light, and one can 
learn to tell the direction of the sun! The phenomenon that is involved here is called 
Haidinger's brush; it is a faint, yellowish hourglass-like pattern seen at the center of the 
visual field when one looks at a broad, featureless expanse using polarizing glasses. It 
can also be seen in the blue sky without polarizing glasses if one rotates his head back and 
forth about the axis of vision. 

| Evidence obtained since this lecture was given indicates that the eye is directly sensi- 
tive. 
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Fig. 36-7. The structure of an om- 
matidium (a single cell of a compound 
eye). 


Now let us discuss the visual acuity we could expect from the bee. The eye of 
a bee is a compound eye, and it is made of a large number of special cells called 
ommatidia , which are arranged conically on the surface of a sphere (roughly) on 
the outside of the b&e’s head. Figure 36-7 shows a picture of one such ommatidium. 
At the top there is a transparent area, a kind of “lens,” but actually it is more like 
a filter or light pipe to make the light come down along the narrow fiber, which is 
where the absorption presumably occurs. Out of the other end of it comes the 
nerve fiber. The central fiber is surrounded on its sides by six cells which, in fact, 
have secreted the fiber. That is enough description for our purposes; the point is 
that it is a conical thing and many can fit next to each other all over the surface of 
the eye of the bee. 

Now let us discuss the resolution of the eye of the bee. If we draw lines (Fig. 
36-8) to represent the ommatidia on the surface, which we suppose is a sphere of 
radius r, we may actually calculate how wide each ommatidium is by using our 
brains, and assuming that evolution is as clever as we are! If we have a very large 
ommatidium we do not have much resolution. That is, one cell gets a piece of 
information from one direction, and the adjacent cell gets a piece of information 
from another direction, and so on, and the bee cannot see things in between very 
well. So the uncertainty of visual acuity in the eye will surely correspond to an 
angle, the angle of the end of the ommatidium relative to the center of curvature of 
the eye. (The eye cells, of course, exist only at the surface of the sphere; inside 
that is the head of the bee.) This angle, from one ommatidium to the next, is, of 
course, the diameter of the ommatidia divided by the radius of the eye surface: 

A0 g = 8/r. (36.1) 



So, we may say, “The finer we make the 5, the more the visual acuity. So why 
doesn’t the bee just use very, very fine ommatidia?” Answer : We know enough 
d physics to realize that if we are trying to get light down into a narrow slot, we 

>/ cannot see accurately in a given direction because of the diffraction effect. The 

light that comes from several directions can enter and, due to diffraction, we will 
\ get light coming in at angle A d d such that 

AB d = \/8. (36.2) 


Fig. 36-8. Schematic view of pack- 
ing of ommatidia in the eye of a bee. 



Now we see that if we make the 8 too small, then each ommatidium does not 
look in only one direction, because of diffraction! If we make them too big, each 
one sees in a definite direction, but there are not enough of them to get a good view 
of the scene. So we adjust the distance d in order to make minimal the total effect 
of these two. If we add the two together, and find the place where the sum has a 
minimum (Fig. 36-9), we find that 

</(A0 g + A9 j) = 0 = I _ * ( ( 36 . 3 ) 

d8 r 8 l 

which gives us a distance 

5 = sf\r. (36.4) 


Fig. 36-9. The optimum size for an 
ommatidium is 5 m . 


If we guess that r is about 3 millimeters, take the light that the bee sees as 4000 
angstroms, and put the two together and take the square root, we find 

8 = (3 X 10“ 3 X 4 X l(T 7 ) 1/2 m 
= 3.5 X 10” 5 m = 3 5 fx. (36.5) 


The book says the diameter is 30 ju, so that is rather good agreement! So, apparently, 
it really works, and we can understand what determines the size of the bee’s eye! 
It is also easy to put the above number back in and find out how good the bee’s eye 
actually is in angular resolution; it is very poor relative to our own. We can see 
things that are thirty times smaller in apparent size than the bee; the bee has a 
rather fuzzy out-of-focus image relative to what we can see. Nevertheless it is all 
right, and it is the best they can do. We might ask why the bees do not develop 

36-8 



a good eye like our own, with a lens and so on. There are several interesting rea- 
sons. In the first place, the bee is too small; if it had an eye like ours, but on his 
scale, the opening would be about 30 p in size and diffraction would be so impor- 
tant that it would not be able to see very well anyway. The eye is not good if it is 
too small. Secondly, if it were as big as the bee’s head, then the eye would occupy 
the whole head of the bee. The beauty of the compound eye is that it takes up no 
space, it is just a very thin layer on the surface of the bee. So when we argue that 
they should have done it our way, we must remember that they had their own 
problems! 


36-5 Other eyes 

Besides the bees, many other animals can see color. Fish, butterflies, birds, 
and reptiles can see color, but it is believed that most mammals cannot. The 
primates can see color. The birds certainly see color, and that accounts for the 
colors of birds. There would be no point in having such brilliantly colored males 
if the females could not notice it! That is, the evolution of the sexual “whatever 
it is” that the birds have is a result of the female being able to see color. So next 
time we look at a peacock and think of what a brilliant display of gorgeous color 
it is, and how delicate all the colors are, and what a wonderful aesthetic sense it 
takes to appreciate all that, we should not compliment the peacock, but should 
compliment the visual acuity and aesthetic sense of the pea hen, because that is 
what has generated the beautiful scene! 

All invertebrates have poorly developed eyes or compound eyes, but all the 
vertebrates have eyes very similar to our own, with one exception. If we consider 
the highest form of animal, we usually say, “Here we are!,” but if we take a less 
prejudiced point of view and restrict ourselves to the invertebrates, so that we 
cannot include ourselves, and ask what is the highest invertebrate animal, most 
zoologists agree that the octopus is the highest animal! It is very interesting that, 
besides the development of its brain and its reactions and so on, which are rather 
good for an invertebrate, it has also developed, independently, a different eye. It 
is not a compound eye or an eye spot — it has a cornea, it has lids, it has an iris, 
it has a lens, it has two regions of water, it has a retina behind. It is essentially the 
same as the eye of the vertebrates! It is a remarkable example of a coincidence in 
evolution where nature has twice discovered the same solution to a problem, with 
one slight improvement. In the octopus it also turns out, amazingly, that the retina 
is a piece of the brain that has come out in the same way in its embryonic develop- 
ment as is true for vertebrates, but the interesting thing which is different is that the 
cells which are sensitive to light are on the inside , and the cells which do the cal- 
culation are in back of them, rather than “inside out,” as in our eye. So we see, 
at least, that there is no good reason for its being inside out. The other time nature 
tried it, she got it straightened out! (See Fig. 36-10.) The biggest eyes in the world 
are those of the giant squid ; they have been found up to 15 inches in diameter! 



Fig. 36-10. The eye of an octopus. 


36-6 Neurology of vision 

One of the main points of our subject is the interconnection of information 
from one part of the eye to the other. Let us consider the compound eye of the 
horseshoe crab, on which considerable experimentation has been done. First of 
all, we must appreciate what kind of information can come along nerves. A nerve 
carries a kind of disturbance which has an electrical effect that is easy to detect, a 
kind of wavelike disturbance which runs down the nerve and produces an effect 
at the other end: a long piece of the nerve cell, called the axon, carries the infor- 
mation along, and a certain kind of impulse, called a “spike,” goes along if it is 
excited at one end. When one spike goes down the nerve, another cannot immedi- 
ately follow. All the spikes are of the same size, so it is not that we get higher 
spikes when the thing is more strongly excited, but that we get more spikes per 
second. The size of the spike is determined by the fiber. It is important to appreci- 
ate this in order to see what happens next. 
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Fig. 36 11. The compound eye of the horseshoe crab, (a) Normal view, (b) Cross section. 

Figures 36-7, 1 1, 12, 13 reprinted with permission from Goldsmith, Sensory Communications, W. A. Rosenblith, ed. Copyright 196 1, Massachusetts Institute of Technology. 
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Figure 36-1 1(a) shows the compound eye of the horseshoe crab; it is not very 
much of an eye, it has only about a thousand ommatidia. Figure 36-1 1(b) is a 
cross section through the system; one can see the ommatidia, with the nerve fibers 
that run out of them and go into the brain. But note that even in a horseshoe crab 
there are little interconnections. They are much less elaborate than in the human 
eye, and it gives us a chance to study a simpler example. 

Let us now look at the experiments which have been done by putting fine 
electrodes into the optic nerve of the horseshoe crab, and shining light on only one 
of the ommatidia, which is easy to do with lenses. If we turn a light on at some 
instant t 0i and measure the electrical pulses that come out, we find that there is a 
slight delay and then a rapid series of discharges which gradually slow down to a 
uniform rate, as shown in Fig. 36-12(a). When the light goes out, the discharge 
stops. Now it is very interesting that if, while our amplifier is connected to this 
same nerve fiber, we shine light on a different ommatidium nothing happens; 
no signal. 




Fig. 36-1 2. The response to light of 
the nerve fibers of the eye of the horse- 
shoe crab. 


Now we do another experiment: we shine the light on the original ommatidium 
and get the same response, but if we now turn light on another one nearby as well, 
the pulses are interrupted briefly and then run at a much lower rate (Fig. 36-1 2b). 
The rate of one is inhibited by the impulses which are coming out of the other! 
In other words, each nerve fiber carries the information from one ommatidium, 
but the amount that it carries is inhibited by the signals from the others. So, for 
example, if the whole eye is more or less uniformly illuminated, the information 
coming from any one ommatidium will be relatively weak, because it is inhibited 
by so many. In fact the inhibition is additive— if we shine light on several nearby 
ommatidia the inhibition is very great. The inhibition is greater when the om- 
matidia are closer, and if the ommatidia are far enough away from one another, 
inhibition is practically zero. So it is additive and depends on the distance; here is 
a first example of information from different parts of the eye being combined in 
the eye itself. We can see, perhaps, if we think about it awhile, that this is a device 
to enhance contrast at the edges of objects, because if a part of the scene is light 
and a part is black, then the ommatidia in the lighted area give impulses that are 
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inhibited by all the other light in the neighborhood, so it is relatively weak. On 
the other hand, an ommatidium at the boundary which is given a “white” impulse 
is also inhibited by others in the neighborhood, but there are not as many of them, 
since some are black; the net signal is therefore stronger. The result would be a 
curve, something like that of Fig. 36-13. The crab will see an enhancement of 
the contour. 

The fact that there is an enhancement of contours has long been known; in 
fact it is a remarkable thing that has been commented on by psychologists many 
times. In order to draw an object, we have only to draw its outline. How used 
we are to looking at pictures that have only the outline! What is the outline? The 
outline is only the edge difference between light and dark or one color and another. 
It is not something definite. It is not, believe it or not, that every object has a line 
around it! There is no such line. It is only in our own psychological makeup that 
there is a line; we are beginning to understand the reasons why the “line” is enough 
of a clue to get the whole thing. Presumably our own eye works in some similar 
manner — much more complicated, but similar. 

Finally, we shall briefly describe the more elaborate work, the beautiful, 
advanced work that has been done on the frog. Doing a corresponding experiment 
on a frog, by putting very fine, beautifully built needlelike probes into the optic 
nerve of a frog, one can obtain the signals that are going along one particular 
axon and, just as in the case of the horseshoe crab, we find that the information 
does not depend on just one spot in the eye, but is a sum of information over 
several spots. 

The most recent picture of the operation of the frog’s eye is the following. 
One can find four different kinds of optic nerve fibers, in the sense that there are 
four different kinds of responses. These experiments were not done by shining 
on-and-off impulses of light, because that is not what a frog sees. A frog just sits 
there and his eyes never move, unless the lily pad is flopping back and forth, and 
in that case his eyes wobble just right so that the image stays put. He does not 
turn his eyes. If anything moves in his field of vision, like a little bug (he has to 
be able to see something small moving in the fixed background), it turns out that 
there are four different kinds of fibers which discharge, whose properties are sum- 
marized in Table 36-1. Sustained edge detection, nonerasable, means that if we 
bring an object with an edge into the field of view of the frog, then there are a lot 
of impulses in this particular fiber while the object is moving, but they die down to 
a sustained impulse that continues as long as the edge is there, even if it is standing 
still. If we turn out the light, the impulses stop. If we turn it on again while the 
edge is still in view, they start again. They are not erasable. Another kind of fiber 
is very similar, except that if the edge is straight it does not work. It must be a 
convex edge with dark behind it! How complicated must be the system of inter- 
connections in the retina of the eye of the frog in order for it to understand that a 
convex surface has moved in! Furthermore, although this fiber does sustain some- 
what, it does not sustain as long as the other, and if we turn out the light and turn 
it on again it does not build up again. It depends on the moving in of the convex 


— Response of Onmotldlum 

Illumination 


Fig. 36-13. The net response of 
horseshoe crab ommatidia near a sharp 
change in illumination. 


Table 36-1 

Types of response in optic nerve fibers of a frog 


Type 

Speed 

Angular 

field 

1 . Sustained edge detection (nonerasable) 

0.2-0. 5 m/sec 

1° 

2. Convex edge detection (erasable) 

0.5 m/sec 

o 

m 

1 

o 

<N 

3. Changing contrast detection 

1-2 m/sec 

7°-10° 

4. Dimming detection 

Up to \ m/sec 

Up to 15° 

5. Darkness detection 

? 

Very large 
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surface. The eye sees it move in and remembers that it is there, but if we merely 
turn out the light for a moment, it simply forgets it and no longer sees it. 

Another example is change-in-contrast detection. If there is an edge moving 
in or out there are pulses, but if the thing stands still there are no pulses at all. 

Then there is a dimming detector. If the light intensity is going down it 
creates pulses, but if it stays down or stays up, the impulse stops; it only works 
while the light is dimming. 

Then, finally, there are a few fibers which are dark detectors — a most amazing 
thing — they fire all the time! If we increase the light, they fire less rapidly, but all 
the time. If we decrease the light, they fire more rapidly, all the time. In the dark 
they fire like mad, perpetually saying, “It is dark! It is dark! It is dark!” 

Now these responses seem to be rather complicated to classify, and we might 
wonder whether perhaps the experiments are being misinterpreted. But it is very 
interesting that these same classes are very clearly separated in the anatomy of the 
frog! By other measurements, after these responses had been classified (< afterwards , 
that is what is important about this), it was discovered that the speed of the signals 
on the different fibers was not the same, so here was another, independent way to 
check which kind of a fiber we have found! 

Another interesting question is from how big an area is one particular fiber 
making its calculations? The answer is different for the different classes. 

Figure 36-14 shows the surface of the so-called tectum of a frog, where the 
nerves come into the brain from the optic nerve. All the nerve fibers coming in 
from the optic nerve make connections in various layers of the tectum. This layered 
structure is analogous to the retina; that is partly why we know that the brain and 
retina are very similar. Now, by taking an electrode and moving it down in suc- 
cession through the layers, we can find out which kinds of optic nerves end where, 
and the beautiful and wonderful result is that the different kinds of fibers end in 
different layers! The first ones end in number 1 type, the second in number 2, 
the threes and fives end in the same place, and deepest of all is number four. 
(What a coincidence, they got the numbers almost in the right order! No, that is 
why they numbered them that way, the first paper had the numbers in a different 
order!) 

We may briefly summarize what we have just learned this way: There are three 
pigments, presumably. There may be many different kinds of receptor cells con- 
taining the three pigments in different proportions, but there are many cross 
connections which may permit additions and subtractions through addition and 
reinforcement in the nervous system. So before we really understand color vision, 
we will have to understand the final sensation. This subject is still an open one, but 
these researches with microelectrodes and so on will perhaps ultimately give us 
more information on how we see color. 
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37 


Quantum Behavior 


37-1 Atomic mechanics 

In the last few chapters we have treated the essential ideas necessary for an 
understanding of most of the important phenomena of light — or electromagnetic 
radiation in general. (We have left a few special topics for next year. Specifically, 
the theory of the index of dense materials and total internal reflection.) What we 
have dealt with is called the “classical theory” of electric waves, which turns out 
to be a completely adequate description of nature for a large number of effects. 
We have not had to worry yet about the fact that light energy comes in lumps or 
“photons.” 

We would like to take up as our next subject the problem of the behavior of 
relatively large pieces of matter — their mechanical and thermal properties, for 
instance. In discussing these, we will find that the “classical” (or older) theory 
fails almost immediately, because matter is really made up of atomic-sized par- 
ticles. Still, we will deal only with the classical part, because that is the only part 
that we can understand using the classical mechanics we have been learning. But 
we shall not be very successful. We shall find that in the case of matter, unlike the 
case of light, we shall be in difficulty relatively soon. We could, of course, con- 
tinuously skirt away from the atomic effects, but we shall instead interpose here a 
short excursion in which we will describe the basic ideas of the quantum properties 
of matter, i.e., the quantum ideas of atomic physics, so that you will have some 
feeling for what it is we are leaving out. For we will have to leave out some im- 
portant subjects that we cannot avoid coming close to. 

So we will give now the introduction to the subject of quantum mechanics, 
but will not be able actually to get into the subject until much later. 

“Quantum mechanics” is the description of the behavior of matter in all its 
details and, in particular, of the happenings on an atomic scale. Things on a very 
small scale behave like nothing that you have any direct experience about. They 
do not behave like waves, they do not behave like particles, they do not behave 
like clouds, or billiard balls, or weights on springs, or like anything that you 
have ever seen. 

Newton thought that light was made up of particles, but then it was discovered, 
as we have seen here, that it behaves like a wave. Later, however (in the beginning 
of the twentieth century) it was found that light did indeed sometimes behave like 
a particle. Historically, the electron, for example, was thought to behave like a 
particle, and then it was found that in many respects it behaved like a wave. So it 
really behaves like neither. Now we have given up. We say: “It is like neither .” 

There is one lucky break, however — electrons behave just like light. The 
quantum behavior of atomic objects (electrons, protons, neutrons, photons, and 
so on) is the same for all, they are all “particle waves,” or whatever you want to 
call them. So what we learn about the properties of electrons (which we shall use 
for our examples) will apply also to all “particles,” including photons of light. 

The gradual accumulation of information about atomic and small-scale be- 
havior during the first quarter of this century, which gave some indications about 
how small things do behave, produced an increasing confusion which was finally 
resolved in 1926 and 1927 by Schrodinger, Heisenberg, and Born. They finally 
obtained a consistent description of the behavior of matter on a small scale. We 
take up the main features of that description in this chapter. 

Because atomic behavior is so unlike ordinary experience, it is very difficult 
to get used to and it appears peculiar and mysterious to everyone, both to the 
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novice and to the experienced physicist. Even the experts do not understand it 
the way they would like to, and it is perfectly reasonable that they should not, 
because all of direct, human experience and of human intuition applies to large 
objects. We know how large objects will act, but things on a small scale just do 
not act that way. So we have to learn about them in a sort of abstract or imagi- 
native fashion and not by connection with our direct experience. 

In this chapter we shall tackle immediately the basic element of the mysterious 
behavior in its most strange form. We choose to examine a phenomenon which is 
impossible, absolutely impossible, to explain in any classical way, and which has 
in it the heart of quantum mechanics. In reality, it contains the only mystery. 
We cannot explain the mystery in the sense of “explaining” how it works. We will 
tell you how it works. In telling you how it works we will have told you about the 
basic peculiarities of all quantum mechanics. 

37-2 An experiment with bullets 

To try to understand the quantum behavior of electrons, we shall compare 
and contrast their behavior, in a particular experimental setup, with the more 
familiar behavior of particles like bullets, and with the behavior of waves like 
water waves. We consider first the behavior of bullets in the experimental setup 
shown diagrammatically in Fig. 37-1. We have a machine gun that shoots a stream 
of bullets. It is not a very good gun, in that it sprays the bullets (randomly) over a 
fairly large angular spread, as indicated in the figure. In front of the gun we have 
a wall (made of armor plate) that has in it two holes just about big enough to let a 
bullet through. Beyond the wall is a backstop (say a thick wall of wood) which will 
“absorb” the bullets when they hit it. In front of the wall we have an object which 
we shall call a “detector” of bullets. It might be a box containing sand. Any bullet 
that enters the detector will be stopped and accumulated. When we wish, we can 
empty the box and count the number of bullets that have been caught. The 
detector can be moved back and forth (in what we will call the x-direction). With 
this apparatus, we can find out experimentally the answer to the question: “What 
is the probability that a bullet which passes through the holes in the wall will 
arrive at the backstop at the distance x from the center?” First, you should 
realize that we should talk about probability, because we cannot say definitely 
where any particular bullet will go. A bullet which happens to hit one of the holes 
may bounce off the edges of the hole, and may end up anywhere at all. By “prob- 
ability” we mean the chance that the bullet will arrive at the detector, which we can 
measure by counting the number which arrive at the detector in a certain time and 
then taking the ratio of this number to the total number that hit the backstop during 
that time. Or, if we assume that the gun always shoots at the same rate during the 
measurements, the probability we want is just proportional to the number that 
reach the detector in some standard time interval. 

For our present purposes we would like to imagine a somewhat idealized 
experiment in which the bullets are not real bullets, but are indestructible bullets — 
they cannot break in half. In our experiment we find that bullets always arrive in 
lumps, and when we find something in the detector, it is always one whole bullet. 
If the rate at which the machine gun fires is made very low, we find that at any given 
moment either nothing arrives, or one and only one — exactly one — bullet arrives 
at the backstop. Also, the size of the lump certainly does not depend on the rate 
of firing of the gun. We shall say: “Bullets always arrive in identical lumps.” What 
we measure with our detector is the probability of arrival of a lump. And we meas- 
ure the probability as a function of x. The result of such measurements with this 
apparatus (we have not yet done the experiment, so we are really imagining the 
result) are plotted in the graph drawn in part (c) of Fig. 37-1. In the graph we plot 
the probability to the right and x vertically, so that the x-scale fits the diagram of 
the apparatus. We call the probability P 12 because the bullets may have come 
either through hole 1 or through hole 2. You will not be surprised that P i2 is 
large near the middle of the graph but gets small if x is very large. You may 
wonder, however, why P l2 has its maximum value at x = 0. We can understand 
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this fact if we do our experiment again after covering up hole 2, and once more 
while covering up hole 1. When hole 2 is covered, bullets can pass only through 
hole 1, and we get the curve marked P x in part (b) of the figure. As you would 
expect, the maximum of Pi occurs at the value of x which is on a straight line with 
the gun and hole 1. When hole 1 is closed, we get the symmetric curve P 2 drawn 
in the figure. P 2 is the probability distribution for bullets that pass through hole 
2. Comparing parts (b) and (c) of Fig. 37-1, we find the important result that 

P 12 = Pi + P 2 * (37.1) 


The probabilities just add together. The effect with both holes open is the sum of 
the effects with each hole open alone. We shall call this result an observation of 
“no interference ,” for a reason that you will see later. So much for bullets. They 
come in lumps, and their probability of arrival shows no interference. 


Fig. 37-2. Interference experiment 
with water waves. 
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37-3 An experiment with waves 

Now we wish to consider an experiment with water waves. The apparatus is 
shown diagrammatically in Fig. 37-2. We have a shallow trough of water. A small 
object labeled the “wave source” is jiggled up and down by a motor and makes 
circular waves. To the right of the source we have again a wall with two holes, 
and beyond that is a second wall, which, to keep things simple, is an “absorber,” 
so that there is no reflection of the waves that arrive there. This can be done by 
building a gradual sand “beach.” In front of the beach we place a detector which 
can be moved back and forth in the x-direction, as before. The detector is now a 
device which measures the “intensity” of the wave motion. You can imagine a 
gadget which measures the height of the wave motion, but whose scale is calibrated 
in proportion to the square of the actual height, so that the reading is proportional 
to the intensity of the wave. Our detector reads, then, in proportion to the energy 
being carried by the wave — or rather, the rate at which energy is carried to the 
detector. 

With our wave apparatus, the first thing to notice is that the intensity can 
have any size. If the source just moves a very small amount, then there is just a 
little bit of wave motion at the detector. When there is more motion at the source, 
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there is more intensity at the detector. The intensity of the wave can have any 
value at all. We would not say that there was any “lumpiness” in the wave intensity. 

Now let us measure the wave intensity for various values of x (keeping the 
wave source operating always in the same way). We get the interestipg-looking 
mrve marked I x 2 in part (c) of the figure. 

We have already worked out how such patterns can come about when we 
studied the interference of electric waves. In this case we would observe that the 
original wave is diffracted at the holes, and new circular waves spread out from each 
hole. If we cover one hole at a time and measure the intensity distribution at the 
absorber we find the rather simple intensity curves shown in part (b) of the figure. 
1 1 is the intensity of the wave from hole 1 (which we find by measuring when hole 
2 is blocked off) and / 2 is the intensity of the wave from hole 2 (seen when hole 
1 is blocked). 

The intensity I x 2 observed when both holes are open is certainly not the sum 
of / 1 and / 2 . We say that there is “interference” of the two waves. At some 
places (where the curve I x 2 has its maxima) the waves are “in phase” and the wave 
peaks add together to give a large amplitude and, therefore, a large intensity. We 
say that the two waves are “interfering constructively” at such places. There will 
be such constructive interference wherever the distance from the detector to one 
hole is a whole number of wavelengths larger (or shorter) than the distance from 
the detector to the other hole. 

At those places where the two waves arrive at the detector with a phase differ- 
ence of 7r (where they are “out of phase”) the resulting wave motion at the detector 
will be the difference of the two amplitudes. The waves “interfere destructively,” 
and we get a low value for the wave intensity. We expect such low values wherever 
the distance between hole 1 and the detector is different from the distance between 
hole 2 and the detector by an odd number of half-wavelengths. The low values of 
1 1 2 in Fig. 37-2 correspond to the places where the two waves interfere destructively. 

You will remember that the quantitative relationship between I u / 2 , and / 12 
can be expressed in the following way: The instantaneous height of the water wave 
at the detector for the wave from hole 1 can be written as (the real part of) h x e iw \ 
where the “amplitude” h x is, in general, a complex number. The intensity is 
proportional to the mean squared height or, when we use the complex numbers, 
to |£i| 2 . Similarly, for hole 2 the height is h 2 e lo>t and the intensity is proportional 
to \h 2 \ 2 . When both holes are open, the wave heights add to give the height 
(hi + h 2 )e l<,lt and the intensity \h x + h 2 1 2 . Omitting the constant of proportion- 
ality for our present purposes, the proper relations for interfering waves are 

h = \hi\ 2 , h = \h 2 1 2 , h 2 = \hx + h 2 1 2 . (37.2) 

You will notice that the result is quite different from that obtained with bullets 
(Eq. 37.1). If we expand \h x + h 2 \ 2 we see that 

\hx + h 2 1 2 = |6i| 2 + \h 2 \ 2 + 2|^ 1 ||^ a | cos 5, (37.3) 

where 5 is the phase difference between h x and h 2 . In terms of the intensities, we 
could write 

I 12 ~ I\ I 2 Is/hh cos 5. (3 7.4) 

The last term in (37.4) is the “interference term.” So much for water waves. The 
intensity can have any value, and it shows inteference. 

37-4 An experiment with electrons 

Now we imagine a similar experiment with electrons. It is shown diagram- 
matically in Fig. 37^3. We make an electron gun which consists of a tungsten wire 
heated by an electric current and surrounded by a metal box with a hole in it. If 
the wire is $t a negative voltage with respect to the box, electrons emitted by the 
wire will be accelerated toward the walls and some will pass through the hole. 
All the electrons which come out of the gun will have (nearly) the same energy. 
In front of the gun is again a wall (just a thin metal plate) with two holes in it. 
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Beyond the wall is another plate which will serve as a “backstop.'” In front of the 
backstop we place a movable detector. The detector might be a geiger counter or, 
perhaps better, an electron multiplier, which is connected to a loudspeaker. 

We should say right away that you should not try to set up this experiment 
(as you could have done with the two we have already described). This experiment 
has never been done in just this way. The trouble is that the apparatus would have 
to be made on an impossibly small scale to show the effects we are interested in. 
We are doing a “thought experiment,” which we have chosen because it is easy to 
think about. We know the results that would be obtained because there are many 
experiments that have been done, in which the scale and the proportions have 
been chosen to show the effects we shall describe. 

The first thing we notice with our electron experiment is that we hear sharp 
“clicks” from the detector (that is, from the loudspeaker). And all “clicks” are 
the same. There are no “half-clicks.” 

We would also notice that the “clicks” come very erratically. Something like: 

click click-click . . . click click .... click-click click . . . , 

etc., just as you have, no doubt, heard a geiger counter operating. If we count 
the clicks which arrive in a sufficiently long time — say for many minutes and 
then count again for another equal period, we find that the two numbers are very 
nearly the same. So we can speak of the average rate at which the clicks are heard 
(so-and-so-many clicks per minute on the average). 

As we move the detector around, the rate at which the clicks appear is faster 
or slower, but the size (loudness) of each click is always the same. If we lower the 
temperature of the wire in the gun the rate of clicking slows down, but still each 
click sounds the same. We would notice also that if we put two separate detectors 
at the backstop, one or the other would click, but never both at once. (Except that 
once in a while, if there were two clicks very close together in time, our ear might 
not sense the separation.) We conclude, therefore, that whatever arrives at the 
backstop arrives in “lumps.” All the “lumps” are the same size: only whole 
“lumps” arrive, and they arrive one at a time at the backstop. We shall say: 
“Electrons always arrive in identical lumps.” 

Just as for our experiment with bullets, we can now proceed to find experi- 
mentally the answer to the question: “What is the relative probability that an 
electron ‘lump’ will arrive at the backstop at various distances x from the center?” 
As before, we obtain the relative probability by observing the rate of clicks, holding 
the operation of the gun constant. The probability that lumps will arrive at a 
particular x is proportional to the average rate of clicks at that x. 

The result of our experiment is the interesting curve marked P 12 in part (c) 
of Fig. 37-3. Yes! That is the way electrons go. 


37-5 The interference of electron waves 

Now let us try to analyze the curve of Fig. 37-3 to see whether we can under- 
stand the behavior of the electrons. The first thing we would say is that since they 
come in lumps, each lump, which we may as well call an electron, has come either 
through hole 1 or through hole 2. Let us write this in the form of a “Proposition : 
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Proposition A: Each electron either goes through hole 1 or it goes through 
hole 2. 

Assuming Proposition A, all electrons that arrive at the backstop can be di- 
vided into two classes: (1) those that come through hole 1, and (2) those that come 
through hole 2. So our observed curve must be the sum of the effects of the elec- 
trons which come through hole 1 and the electrons which come through hole 2. 
Let us check this idea by experiment. First, we will make a measurement for those 
electrons that come through hole 1. We block off hole 2 and make our counts of 
the clicks from the detector. From the clicking rate, we get P x . The result of the 
measurement is shown by the curve marked P x in part (b) of Fig. 37-3. The result 
seems quite reasonable. In a similar way, we measure P 2} the probability distribu- 
tion for the electrons that come through hole 2. The result of this measurement 
is also drawn in the figure. 

The result P X2 obtained with both holes open is clearly not the sum of P x and 

the probabilities for each hole alone. In analogy with our water-wave experi- 
ment, we say: “There is interference.” 

For electrons: P X2 ^ P x + P 2 . (37.5) 

How can such an interference come about? Perhaps we should say: “Well, 
that means, presumably, that it is not true that the lumps go either through hole 
1 or hole 2, because if they did, the probabilities should add. Perhaps they go in a 
more complicated way. They split in half and . . .” But no! They cannot, they 
always arrive in lumps . . . “Well, perhaps some of them go through 1, and then 
they go around through 2, and then around a few more times, or by some other 
complicated path . . . then by closing hole 2, we changed the chance that an elec- 
tron that started out through hole 1 would finally get to the backstop ...” But 
notice! There are some points at which very few electrons arrive when both holes 
are open, but which receive many electrons if we close one hole, so closing one 
hole increased the number from the other. Notice, however, that at the center 
of the pattern, P x 2 is more than twice as large as Pi + P 2 . It is as though closing 
one hole decreased the number of electrons which come through the other hole. 
It seems hard to explain both effects by proposing that the electrons travel in 
complicated paths. 

It is all quite mysterious. And the more you look at it the more mysterious 
it seems. Many ideas have been concocted to try to explain the curve for Pi 2 in 
terms of individual electrons going around in complicated ways through the holes. 
None of them has succeeded. None of them can get the right curve for P x 2 in 
terms of Pi and P 2 . 

Yet, surprisingly enough, the mathematics for relating P x and P 2 to P X2 is 
extremely simple. For P X2 is just like the curve I 12 of Fig. 37-2, and that was 
simple. What is going on at the backstop can be described by two complex numbers 
that we can call <t> x and 4> 2 (they are functions of x, of course). The absolute square 
of <f> i gives the effect with only hole 1 open. That is, P x = \(j> x \ 2 . The effect with 
only hole 2 open is given by $ 2 in the same way. That is, P 2 = \4> 2 \ 2 . And the 
combined effect of the two holes is just P X2 = |^i + 0 2 | 2 . The mathematics 
is the same as that we had for the water waves! (It is hard to see how one could 
get such a simple result from a complicated game of electrons going back and forth 
through the plate on some strange trajectory.) 

We conclude the following: The electrons arrive in lumps, like particles, and 
the probability of arrival of these lumps is distributed like the distribution of 
intensity of a wave. It is in this sense that an electron behaves “sometimes like a 
particle and sometimes like a wave.” 

Incidentally, when we were dealing with classical waves we defined the in- 
tensity as the mean over time of the square of the wave amplitude, and we used 
complex numbers as a mathematical trick to simplify the analysis. But in quantum 
mechanics it turns out that the amplitudes must be represented by complex num- 
bers. The real parts alone will not do. That is a technical point, for the moment, 
because the formulas look just the same. 

37-6 



Since the probability of arrival through both holes is given so simply, although 
it is not equal to (Pi + P 2 ), that is really all there is to say. But there are a large 
number of subtleties involved in the fact that nature does work this way. We 
would like to illustrate some of these subtleties for you now. First, since the num- 
ber that arrives at a particular point is not equal to the number that arrives through 
1 plus the number that arrives through 2, as we would have concluded from 
Proposition A, undoubtedly we should conclude that Proposition A is false. It is 
not true that the electrons go either through hole 1 or hole 2. But that conclusion 
can be tested by another experiment. 

37-6 Watching the electrons 

We shall now try the following experiment. To our electron apparatus we 
add a very strong light source, placed behind the wall and between the two holes, 
as shown in Fig. 37-4. We know that electric charges scatter light. So when an 
electron passes, however it does pass, on its way to the detector, it will scatter some 
light to our eye, and we can see where the electron goes. If, for instance, an electron 
were to take the path via hole 2 that is sketched in Fig. 37-4, we should see a flash 
of light coming from the vicinity of the place marked A in the figure. If an electron 
passes through hole 1 we would expect to see a flash from the vicinity of the upper 
hole. If it should happen that we get light from both places at the same time, 
because the electron divides in half ... Let us just do the experiment! 
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Fig. 37-4. A different electron ex- 
periment. 


(a) 


Here is what we see: every time that we hear a “click” from our electron de- 
tector (at the backstop), we also see a flash of light either near hole 1 or near hole 
2, but never both at once! And we observe the same result no matter where we put 
the detector. From this observation we conclude that when we look at the electrons 
we find that the electrons go either through one hole or the other. Experimentally, 
Proposition A is necessarily true. 

What, then, is wrong with our argument against Proposition A? Why isn't 
p 12 just equal to P\ + P 2 ? Back to experiment! Let us keep track of the electrons 
and find out what they are doing. For each position (x-location) of the detector 
we will count the electrons that arrive and also keep track of which hole they went 
through, by watching for the flashes. We can keep track of things this way: 
whenever we hear a “click” we will put a count in Column 1 if we see the flash near 
hole 1, and if we see the flash near hole 2, we will record a count in Column 2. 
Every electron which arrives is recorded in one of two classes: those which come 
through 1 and those which come through 2. From the number recorded in Column 
1 we get the probability P[ that an electron will arrive at the detector via hole 1 ; 
and from the number recorded in Column 2 we get P 2i the probability that an 
electron will arrive at the detector via hole 2. If we now repeat such a measurement 
for many values of x, we get the curves for P[ and P f 2 shown in part (b) of Fig. 37-4. 

Well, that is not too surprising! We get for P[ something quite similar to 
what we got before for Pi by blocking off hole 2; and P 2 is similar to what we got 
by blocking hole 1. So there is not any complicated business like going through 
both holes. When we watch them, the electrons come through just as we would 
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expect them to come through. Whether the holes are closed or open, those which 
we see come through hole 1 are distributed in the same way whether hole 2 is open 
or closed. 

But wait! What do we have now for the total probability, the probability that 
an electron will arrive at the detector by any route? We already have that informa- 
tion. We just pretend that we never looked at the light flashes, and we lump to- 
gether the detector clicks which we have separated into the two columns. We 
must just add the numbers. For the probability that an electron will arrive at the 
backstop by passing through either hole, we do find P[ 2 = P\ + P 2 . That is, 
although we succeeded in watching which hole our electrons come through, we 
no longer get the old interference curve P 12 , but a new one, P[ 2 , showing no 
interference! If we turn out the light P x 2 is restored. 

We must conclude that when we look at the electrons the distribution of them 
on the screen is different than when we do not look. Perhaps it is turning on our 
light source that disturbs things? It must be that the electrons are very delicate, 
and the light, when it scatters off the electrons, gives them a jolt that changes their 
motion. We know that the electric field of the light acting on a charge will exert 
a force on it. So perhaps we should expect the motion to be changed. Anyway, 
the light exerts a big influence on the electrons. By trying to “watch” the electrons 
we have changed their motions. That is, the jolt given to the electron when the 
photon is scattered by it is such as to change the electron’s motion enough so that 
if it might have gone to where P 12 was at a maximum it will instead land where 
P 12 was a minimum; that is why we no longer see the wavy interference effects. 

You may be thinking: “Don’t use such a bright source! Turn the brightness 
down! The light waves will then be weaker and will not disturb the electrons so 
much. Surely, by making the light dimmer and dimmer, eventually the wave 
will be weak enough that it will have a negligible effect.” O.K. Let’s try it. The 
first thing we observe is that the flashes of light scattered from the electrons as 
they pass by does not get weaker. It is always the same-sized flash. The only „hing 
that happens as the light is made dimmer is that sometimes we hear a “click” 
from the detector but see no flash at all. The electron has gone by without being 
“seen.” What we are observing is that light also acts like electrons, we knew that 
it was “wavy,” but now we find that it is also “lumpy.” It always arrives — or is 
scattered— in lumps that we call “photons.” As we turn down the intensity of 
the light source we do not change the size of the photons, only the rate at which 
they are emitted. That explains why, when our source is dim, some electrons get 
by without being seen. There did not happen to be a photon around at the time 
the electron went through. 

This is all a little discouraging. If it is true that whenever we “see” the electron 
we see the same-sized flash, then those electrons we see are always the disturbed 
ones. Let us try the experiment with a dim light anyway. Now whenever we hear 
a click in the detector we will keep a count in three columns: in Column (1) those 
electrons seen by hole 1, in Column (2) those electrons seen by hole 2, and in 
Column (3) those electrons not seen at all. When we work up our data (computing 
the probabilities) we find these results: Those “seen by hole 1” have a distribution 
like P[ \ those “seen by hole 2” have a distribution like P 2 (so that those “seen by 
either hole 1 or 2 have a distribution like P[ 2 ); and those “not seen at all” have a 
“wavy” distribution just like P 12 of Fig. 37-3! If the electrons are not seen , we 
have interference ! 

That is understandable. When we do not see the electron, no photon disturbs 
it, and when we do see it, a photon has disturbed it. There is always the same 
amount of disturbance because the light photons all produce the same-sized effects 
and the effect of the photons being scattered is enough to smear out any inter- 
ference effect. 

Is there not some way we can see the electrons without disturbing them? 
We learned in an earlier chapter that the momentum carried by a “photon” 
is inversely proportional to its wavelength ( p = h/\). Certainly the jolt given 
to the electron when the photon is scattered toward our eye depends on the 
momentum that photon carries. Aha! If we want to disturb the electrons only 
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slightly we should not have lowered the intensity of the light, we should have 
lowered its frequency (the same as increasing its wavelength). Let us use light of 
a redder color. We could even use infrared light, or radiowaves (like radar), and 
“see” where the electron went with the help of some equipment that can “see” 
light of these longer wavelengths. If we use “gentler” light perhaps we can avoid 
disturbing the electrons so much. 

Let us try the experiment with longer waves. We shall keep repeating our ex- 
periment, each time with light of a longer wavelength. At first, nothing seems to 
change. The results are the same. Then a terrible thing happens. You remember 
that when we discussed the microscope we pointed out that, due to the wave nature 
of the light, there is a limitation on how close two spots can be and still be seen 
as two separate spots. This distance is of the order of the wavelength of light. So 
now, when we make the wavelength longer than the distance between our holes, 
we see a big fuzzy flash when the light is scattered by the electrons. We can no 
longer tell which hole the electron went through! We just know it went somewhere! 
And it is just with light of this color that we find that the jolts given to the electron 
are small enough so that P[ 2 begins to look like P x2 — that we begin to get some 
interference effect. And it is only for wavelengths much longer than the separation 
of the two holes (when we have no chance at all of telling where the electron went) 
that the disturbance due to the light gets sufficiently small that we again get the 
curve Pi 2 shown in Fig. 37-3. 

In our experiment we find that it is impossible to arrange the light in such a 
way that one can tell which hole the electron went through, and at the same time 
not disturb the pattern. It was suggested by Heisenberg that the then new laws of 
nature could only be consistent if there were some basic limitation on our experi- 
mental capabilities not previously recognized. He proposed, as a general principle, 
his uncertainty principle , which we can state in terms of our experiment as follows: 
“It is impossible to design an apparatus to determine which hole the electron passes 
through, that will not at the same time disturb the electrons enough to destroy the 
interference pattern.” If an apparatus is capable of determining which hole the elec- 
tron goes through, it cannot be so delicate that it does not disturb the pattern in 
an essential way. No one has ever found (or even thought of) a way around the 
uncertainty principle. So we must assume that it describes a basic characteristic 
of nature. 

The complete theory of quantum mechanics which we now use to describe 
atoms and, in fact, all matter depends on the correctness of the uncertainty prin- 
ciple. Since quantum mechanics is such a successful theory, our belief in the 
uncertainty principle is reinforced. But if a way to “beat” the uncertainty principle 
were ever discovered, quantum mechanics would give inconsistent results and 
would have to be discarded as a valid theory of nature. 

“Well,” you say, “what about Proposition A? It is true, or is it not true, 
that the electron either goes through hole 1 or it goes through hole 2?” The only 
answer that can be given is that we have found from experiment that there is a 
certain special way that we have to think in order that we do not get into incon- 
sistencies. What we must say (to avoid making wrong predictions) is the following. 
If one looks at the holes or, more accurately, if one has a piece of apparatus which 
is capable of determining whether the electrons go through hole 1 or hole 2, then 
one can say that it goes either through hole 1 or hole 2. But , when one does not 
try to tell which way the electron goes, when there is nothing in the experiment to 
disturb the electrons, then one may not say that an electron goes either through 
hole 1 or hole 2. If one does say that, and starts to make any deductions from the 
statement, he will make errors in the analysis. This is the logical tightrope on 
which we must walk if we wish to describe nature successfully. 


If the motion of all matter — as well as electrons — must be described in terms 
of waves, what about the bullets in our first experiment? Why didn’t we see an 
interference pattern there? It turns out that for the bullets the wavelengths were so 
tiny that the interference patterns became very fine. So fine, in fact, that with any 
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detector of finite size one could not distinguish the separate maxima and minima. 
What we saw was only a kind of average, which is the classical curve. In Fig. 37-5 
we have tried to indicate schematically what happens with large-scale objects. 
Part (a) of the figure shows the probability distribution one might predict for 
bullets, using quantum mechanics. The rapid wiggles are supposed to represent 
the interference pattern one gets for waves of very short wavelength. Any physical 
detector, however, straddles several wiggles of the probability curve, so that the 
measurements show the smooth curve drawn in part (b) of the figure. 

37-7 First principles of quantum mechanics 

We will now write a summary of the main conclusions of our experiments. 
We will, however, put the results in a form which makes them true for a general 
class of such experiments. We can write our summary more simply if we first 
define an “ideal experiment” as one in which there are no uncertain external 
influences, i.e., no jiggling or other things going on that we cannot take into ac- 
count. We would be quite precise if we said: “An ideal experiment is one in which 
all of the initial and final conditions of the experiment are completely specified.” 
What we will call “an event” is, in general, just a specific set of initial and final 
conditions. (For example: “an electron leaves the gun, arrives at the detector, and 
nothing else happens.”) Now for our summary. 

Summary 

(1) The probability of an event in an ideal experiment is given by the square of 
the absolute value of a complex number <j> which is called the probability 
amplitude. 

P — probability, 

<t> = probability amplitude, (37.6) 

p = w. 

(2) When an event can occur in several alternative ways, the probability ampli- 
tude for the event is the sum of the probability amplitudes for each way 
considered separately. There is interference. 

(j) = <f>l <t> 2 , 

p = 1*1 + </> 2 | 2 . (37.7) 

(3) If an experiment is performed which is capable of determining whether one or 
another alternative is actually taken, the probability of the event is the sum 
of the probabilities for each alternative. The interference is lost. 

P = Pi + P 2 - (37.8) 

One might still like to ask: “How does it work? What is the machinery behind 
the law?” No one has found any machinery behind the law. No one can “explain” 
any more than we have just “explained.” No one will give you any deeper repre- 
sentation of the situation. We have no ideas about a more basic mechanism from 
which these results can be deduced. 

We would like to emphasize a very important difference between classical and 
quantum mechanics . We have been talking about the probability that an electron 
will arrive in a given circumstance. We have implied that in our experimental 
arrangement (or even in the best possible one) it would be impossible to predict 
exactly what would happen. We can only predict the odds! This would mean, if 
it were true, that physics has given up on the problem of trying to predict exactly 
what will happen in a definite circumstance. Yes! physics has given up. We do 
not know how to predict what would happen in a given circumstance , and we believe 
now that it is impossible, that the only thing that can be predicted is the prob- 
ability of different events. It must be recognized that this is a retrenchment in our 
earlier ideal of understanding nature. It may be a backward step, but no one 
has seen a way to avoid it. 
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(a) (b) 


Fig. 37-5. Interference pattern with 
bullets: (a) actual (schematic), (b) ob- 
served. 



We make now a few remarks on a suggestion that has sometimes been made 
to try to avoid the description we have given : “Perhaps the electron has some kind 
of internal works — some inner variables — that we do not yet know about. Perhaps 
that is why we cannot predict what will happen. If we could look more closely at 
the electron we could be able to tell where it would end up.” So far as we know, 
that is impossible. We would still be in difficulty. Suppose we were to assume that 
inside the electron there is some kind of machinery that determines where it is 
going to end up. That machine must also determine which hole it is going to go 
through on its way. But we must not forget that what is inside the electron should 
not be dependent on what we do, and in particular upon whether we open or close 
one of the holes. So if an electron, before it starts, has already made up its mind 
(a) which hole it is going to use, and (b) where it is going to land, we should find 
Pi for those electrons that have chosen hole 1, P 2 for those that have chosen hole 
2, and necessarily the sum P x + P 2 for those that arrive through the two holes. 
There seems to be no way around this. But we have verified experimentally that 
that is not the case. And no one has figured a way out of this puzzle. So at the 
present time we must limit ourselves to computing probabilities. We say “at the 
present time,” but we suspect very strongly that it is something that will be with 
us forever — that it is impossible to beat that puzzle — that this is the way nature 
really is. 


37-8 The uncertainty principle 

This is the way Heisenberg stated the uncertainty principle originally: If you 
make the measurement on any object, and you can determine the ^-component of 
its momentum with an uncertainty A p, you cannot, at the sane time, know its 
x-position more accurately than Ax = h/Ap . The uncertainties in the position 
and momentum at any instant must have their product greater than Planck’s 
constant. This is a special case of the uncertainty principle that was stated above 
more generally. The more general statement was that one cannot design equipment 
in any way to determine which of two alternatives is taken, without, at the same 
time, destroying the pattern of interference. 

Let us show for one particular case that the kind of relation given by Heisen- 
berg must be true in order to keep from getting into trouble. We imagine a modifi- 
cation of the experiment of Fig. 37-3, in which the wall with the holes consists of a 
plate mounted on rollers so that it can move freely up and down (in the x-direction), 
as shown in Fig. 37-6. By watching the motion of the plate carefully we can try to 
tell which hole an electron goes through. Imagine what happens when the detector 
is placed at x = 0. We would expect that an electron which passes through hole 1 
must be deflected downward by the plate to reach the detector. Since the vertical 
component of the electron momentum is changed, the plate must recoil with an 
equal momentum in the opposite direction. The plate will get an upward kick. 
If the electron goes through the lower hole, the plate should feel a downward kick. 
It is clear that for every position of the detector, the momentum received by the 
plate will have a different value for a traversal via hole 1 than for a traversal via 
hole 2. So! Without disturbing the electrons at all , but just by watching the plate , 
we can tell which path the electron used. 

Now in order to do this it is necessary to know what the momentum of the 
screen is, before the electron goes through. So when we measure the momentum 
after the electron goes by, we can figure out how much the plate’s momentum has 
changed. But remember, according to the uncertainty principle we cannot at the 
same time know the position of the plate with an arbitrary accuracy. But if we do 
not know exactly where the plate is we cannot say precisely where the two holes are. 
They will be in a different place for every electron that goes through. This means 
that the center of our interference pattern will have a different location for each 
electron. The wiggles of the interference pattern will be smeared out. We shall show 
quantitatively in the next chapter that if we determine the momentum of the plate 
sufficiently accurately to determine from the recoil measurement which hole was 
used, then the uncertainty in the x-position of the plate will, according to the un- 
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Fig. 37-6. An experiment in which 
the recoil of the wall is measured. 



certainty principle, be enough to shift the pattern observed at the detector up and 
down in the x-direction about the distance from a maximum to its nearest minimum. 
Such a random shift is just enough to smear out the pattern so that no interference 
is observed. 

The uncertainty principle “protects” quantum mechanics. Heisenberg recog- 
nized that if it were possible to measure the momentum and the position simultane- 
ously with a greater accuracy, the quantum mechanics would collapse. So he 
proposed that jt must be impossible. Then people sat down and tried to figure out 
ways of doing it, and nobody could figure out a way to measure the position and 
the momentum of anything— a screen, an electron, a billiard ball, anything— with 
an y greater accuracy. Quantum mechanics maintains its perilous but accurate 
existence. 
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3a 


The Relation of Wave and 
Particle Viewpoints 


38-1 Probability wave amplitudes 

In this chapter we shall discuss the relationship of the wave and particle 
viewpoints. We already know, from the last chapter, that neither the wave view- 
point nor the particle viewpoint is correct. Usually we have tried to present things 
accurately, or at least precisely enough that they will not have to be changed when 
we learn more — it may be extended, but it will not be changed! But when we try 
to talk about the wave picture or the particle picture, both are approximate, and 
both will change. Therefore what we learn in this chapter will not be accurate in a 
certain sense ; it is a kind of half-intuitive argument that will be made more precise 
later, but certain things will be changed a little bit when we interpret them correctly 
in quantum mechanics. The reason for doing such a thing, of course, is that we 
are not going to go directly into quantum mechanics, but we want to have at least 
some idea of the kinds of effects that we will find. Furthermore, all our experiences 
are with waves and with particles, and so it is rather handy to use the wave and 
particle ideas to get some understanding of what happens in given circumstances 
before we know the complete mathematics of the quantum-mechanical amplitudes. 
We shall try to illustrate the weakest places as we go along, but most of it is very 
nearly correct — it is just a matter of interpretation. 

First of all, we know that the new way of representing the world in quantum 
mechanics — the new framework — is to give an amplitude for every event that can 
occur, and if the event involves the reception of one particle then we can give the 
amplitude to find that one particle at different places and at different times. The 
probability of finding the particle is then proportional to the absolute square of 
the amplitude. In general, the amplitude to find a particle in different places at 
different times varies with position and time. 

In a special case the amplitude varies sinusoidally in space and time like 
e i(u>t-k-r) not f or g e ^ that these amplitudes are complex numbers, not real 
numbers) and involves a definite frequency co and wave number k. Then it turns 
out that this corresponds to a classical limiting situation where we would have 
believed that we have a particle whose energy E was known and is related to the 
frequency by 

E = (38.1) 
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momentum 
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and whose momentum p is also known and is related to the wave number by 

p = ftk. (38.2) 

This means that the idea of a particle is limited. The idea of a particle — its 
location, its momentum, etc. — which we use so much, is in certain ways unsatis- 
factory. For instance, if an amplitude to find a particle at different places is given 
by e i{ut ~ k ' r \ whose absolute square is a constant, that would mean that the prob- 
ability of finding a particle is the same at all points. That means we do not know 
where it is — it can be anywhere — there is a great uncertainty in its location. 

On the other hand, if the position of a particle is more or less well known and 
we can predict it fairly accurately, then the probability of finding it in different 
places must be confined to a certain region, whose length we call Ax. Outside this 
region, the probability is zero. Now this probability is the absolute square of an 
amplitude, and if the absolute square is zero, the amplitude is also zero, so that 
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Fig. 38-1. A wave packet of length 
Ax. 


we have a wave train whose length is Ax (Fig. 38-1), and the wavelength (the 
distance between nodes of the waves in the train) of that wave train is what corre- 
sponds to the particle momentum. 

Here we encounter a strange thing about waves ; a very simple thing which has 
nothing to do with quantum mechanics strictly. It is something that anybody 
who works with waves, even if he knows no quantum mechanics, knows: namely, 
we cannot define a unique wavelength for a short wave train. Such a wave train does 
not have a definite wavelength; there is an indefiniteness in the wave number that 
is related to the finite length of the train, and thus there is an indefiniteness in 
the momentum. 




Fig. 38-2. Diffraction of particles 
passing through a slit. 


38-2 Measurement of position and momentum 

Let us consider two examples of this idea — to see the reason why there is an 
uncertainty in the position and/or the momentum, if quantum mechanics is right. 
We have also seen before that if there were not such a thing — if it were possible to 
measure the position and the momentum of anything simultaneously — we would 
have a paradox; it is fortunate that we do not have such a paradox, and the fact 
that such an uncertainty comes naturally from the wave picture shows that every- 
thing is mutually consistent. 

Here is one example which shows the relationship between the position and 
the momentum in a circumstance that is easy to understand. Suppose we have a 
single slit, and particles are coming from very far away with a certain energy — so 
that they are all coming essentially horizontally (Fig. 38-2). We are going to 
concentrate on the vertical components of momentum. All of these particles have 
a certain horizontal momentum p 0 , say, in a classical sense. So, in the classical 
sense, the vertical momentum p y , before the particle goes through the hole, is 
definitely known. The particle is moving neither up nor down, because it came from 
a source that is far away — and so the vertical momentum is of course zero. But 
now let us suppose that it goes through a hole whose width is B. Then after it has 
come out through the hole, we know the position vertically — the y position — with 
considerable accuracy — namely ±B. That is, the uncertainty in position, Ay , is 
of order B . Now we might also want to say, since we know the momentum is 
absolutely horizontal, that A p y is zero; but that is wrong. We once knew the mo- 
mentum was horizontal, but we do not know it any more. Before the particles 
passed through the hole, we did not know their vertical positions. Now that we 
have found the vertical position by having the particle come through the hole, we 
have lost our information on the vertical momentum! Why? According to the 
wave theory, there is a spreading out, or diffraction, of the waves after they go 
through the slit, just as for light. Therefore there is a certain probability that 
particles coming out of the slit are not coming exactly straight. The pattern is 
spread out by the diffraction effect, and the angle of spread, which we can define 
as the angle of the first minimum, is a measure of the uncertainty in the final angle. 

How does the pattern become spread? To say it is spread means that there is 
some chance for the particle to be moving up or down, that is, to have a component 
of momentum up or down. We say chance and particle because we can detect this 
diffraction pattern with a particle counter, and when the counter receives the 
particle, say at C in Fig. 38-2, it receives the entire particle, so that, in a classical 
sense, the particle has a vertical momentum, in order to get from the slit up to C. 

To get a rough idea of the spread of the momentum, the vertical momentum 
p y has a spread which is equal to p 0 A8 , where p 0 is the horizontal momentum. 
And how big is Ad in the spread -out pattern? We know that the first minimum 
occurs at an angle Ad such that the waves from one edge of the slit have to travel 
one wavelength farther than the waves from the other side — we worked that out 
before (Chapter 30). Therefore Ad is A /B, and so A p y in this experiment is Pq\/B . 
Note that if we make B smaller and make a more accurate measurement of the 
position of the particle, the diffraction pattern gets wider. Remember, when we 
closed the slits on the experiment with the microwaves, we had more intensity 
farther out. So the narrower we make the slit, the wider the pattern gets, and the 
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more is the likelihood that we would find that the particle has sidewise momentum. 
Thus the uncertainty in the vertical momentum is inversely proportional to the 
uncertainty of y. In fact, we see that the product of the two is equal to p 0 \ . But X 
is the wavelength and p 0 is the momentum, and in accordance with quantum 
mechanics, the wavelength times the momentum is Planck’s constant h . So we 
obtain the rule that the uncertainties in the vertical momentum and in the vertical 
position have a product of the order h : 

Ay Ap y « h . (38.3) 

We cannot prepare a system in which we know the vertical position of a particle 
and can predict how it will move vertically with greater certainty than given by 
(38.3). That is, the uncertainty in the vertical momentum must exceed h/Ay, where 
Ay is the uncertainty in our knowledge of the position. 

Sometimes people say quantum mechanics is all wrong. When the particle 
arrived from the left, its vertical momentum was zero. And now that it has gone 
through the slit, its position is known. Both position and momentum seem to 
be known with arbitrary accuracy. It is quite true that we can receive a particle, 
and on reception determine what its position is and what its momentum would 
have had to have been to have gotten there. That is true, but that is not what the 
uncertainty relation (38.3) refers to. Equation (38.3) refers to the predictability 
of a situation, not remarks about the past . It does no good to say “I knew what 
the momentum was before it went through the slit, and now I know the position,” 
because now the momentum knowledge is lost. The fact that it went through the 
slit no longer permits us to predict the vertical momentum. We are talking about 
a predictive theory, not just measurements after the fact. So we must talk about 
what we can predict. 

Now let us take the thing the other way around. Let us take another example 
of the same phenomenon, a little more quantitatively. In the previous example 
we measured the momentum by a classical method. Namely, we considered the 
direction and the velocity and the angles, etc., so we got the momentum by classical 
analysis. But since momentum is related to wave number, there exists in nature 
still another way to measure the momentum of a particle — photon or otherwise — 
which has no classical analog, because it uses Eq. (38.2). We measure the wave- 
lengths of the waves. Let us try to measure momentum in this way. 

Suppose we have a grating with a large number of lines (Fig. 38-3), and send 
a beam of particles at the grating. We have often discussed this problem: if the 
particles have a definite momentum, then we get a very sharp pattern in a certain 
direction, because of the interference. And we have also talked about how accu- 
rately we can determine that momentum, that is to say, what the resolving power 
of such a grating is. Rather than derive it again, we refer to Chapter 30, where 
we found that the relative uncertainty in the wavelength that can be measured 
with a given grating is 1 /Nm, where N is the number of lines on the grating and 
m is the order of the diffraction pattern. That is, 

AX/X = l/Nm. (38.4) 

Now formula (38.4) can be rewritten as 

AX/X 2 - 1 /Nm\ = 1 m (38.5) 

where L is the distance shown in Fig. 38-3. This distance is the difference between 
the total distance that the particle or wave or whatever it is has to travel if it is 
reflected from the bottom of the grating, and the distance that it has to travel if 
it is reflected from the top of the grating. That is, the waves which form the diffrac- 
tion pattern are waves which come from different parts of the grating. The first 
ones that arrive come from the bottom end of the grating, from the beginning of 
the wave train, and the rest of them come from later parts of the wave train, coming 
from different parts of the grating, until the last one finally arrives, and that involves 
a point in the wave train a distance L behind the first point. So in order that we 
shall have a sharp line in our spectrum corresponding to a definite momentum, 
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Fig. 38-3. Determination of momen- 
tum by using a diffraction grating. 



with an uncertainty given by (38.4), we have to have a wave train of at least length 
L. If the wave train is too short we are not using the entire grating. The waves 
which form the spectrum are being reflected from only a very short sector of the 
grating if the wave train is too short, and the grating will not work right — we will 
find a big angular spread. In order to get a narrower one, we need to use the whole 
grating, so that at least at some moment the whole wave train is scattering simul- 
taneously from all parts of the grating. Thus the wave train must be of length L 
in order to have an uncertainty in the wavelength less than that given by (38.5). 
Incidentally, 

A X/X 2 = A(l/X) - Ak/2w. (38.6) 

Therefore 

Ak = 2tt/L, (38.7) 

where L is the length of the wave train. 

This means that if we have a wave train whose length is less than L, the un- 
certainty in the wave number must exceed 2i r/L. Or the uncertainty in a wave 
number times the length of the wave train— we will call that for a moment Ax— 
exceeds 2ir. We call it Ax because that is the uncertainty in the location of the 
particle. If the wave train exists only in a finite length, then that is where we could 
find the particle, within an uncertainty Ax. Now this property of waves, that the 
length of the wave train times the uncertainty of the wave number associated with 
it is at least 27T, is a property that is known to everyone who studies them. It has 
nothing to do with quantum mechanics. It is simply that if we have a finite train, 
we cannot count the waves in it very precisely. Let us try another way to see the 
reason for that. 

Suppose that we have a finite train of length L ; then because of the way it has 
to decrease at the ends, as in Fig. 38-1, the number of waves in the length L is 
uncertain by something like =±= 1. But the number of waves in L is kL/2ir. Thus k 
is uncertain, and we again get the result (38.7), a property merely of waves. The 
same thing works whether the waves are in space and k is the number of radians 
per centimeter and L is the length of the train, or the waves are in time and a> is 
the number of oscillations per second and T is the “length” in time that the wave 
train comes in. That is, if we have a wave train lasting only for a certain finite 
time T, then the uncertainty in the frequency is given by 


Aco = 2 7t/T. 
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Fig. 38-4. Scattering of waves by 
crystal planes. 


We have tried to emphasize that these are properties of waves alone, and they are 
well known, for example, in the theory of sound. 

The point is that in quantum mechanics we interpret the wave number as 
being a measure of the momentum of a particle, with the rule that p = hk , so 
that relation (38.7) tells us that Ap « h/Ax. This, then, is a limitation of the classi- 
cal idea of momentum. (Naturally, it has to be limited in some ways if we are 
going to represent particles by waves!) It is nice that we have found a rule that 
gives us some idea of when there is a failure of classical ideas. 

38-3 Crystal diffraction 

Next let us consider the reflection of particle waves from a crystal. A crystal 
is a thick thing which has a whole lot of similar atoms — we will include some com- 
plications later — in a nice array. The question is how to set the array so that we 
get a strong reflected maximum in a given direction for a given beam of, say, light 
(x-rays), electrons, neutrons, or anything else. In order to obtain a strong reflection, 
the scattering from all of the atoms must be in phase. There cannot be equal num- 
bers in phase and out of phase, or the waves will cancel out. The way to arrange 
things is to find the regions of constant phase, as we have already explained; 
they are planes which make equal angles with the initial and final directions (Fig. 
38— 4). 

If we consider two parallel planes, as in Fig. 38-4, the waves scattered from the 
two planes will be in phase provided the difference in distance travelled by a wave- 
38-4 



front is an integral number of wavelengths. This difference can be seen to be 
2d sin 6 , where d is the perpendicular distance between the planes. Thus the 
condition for coherent reflection is 

Id sin 6 — rik (n — 1,2,.. .). (38.9) 

If, for example, the crystal is such that the atoms happen to lie on planes obey- 
ing condition (38.9) with n = 1, then there will be a strong reflection. If, on the 
other hand, there are other atoms of the same nature (equal in density) halfway 
between, then the intermediate planes will also scatter equally strongly and will 
interfere with the others and produce no effect. So d in (38.9) must refer to ad- 
jacent planes; we cannot take a plane five layers farther back and use this formula! 

As a matter of interest, actual crystals are not usually as simple as a single 
kind of atom repeated in a certain way. Instead, if we make a two-dimensional 
analog, they are much like wallpaper, in which there is some kind of figure which 
repeats all over the wallpaper. By “figure” we mean, in the case of atoms, some 
arrangement — calcium and a carbon and three oxygens, etc., for calcium carbonate, 
and so on — which may involve a relatively large number of atoms. But whatever 
it is, the figure is repeated in a pattern. This basic figure is called a unit cell 

The basic pattern of repetition defines what we call the lattice type ; the lattice 
type can be immediately determined by looking at the reflections and seeing what 
their symmetry is. In other words, where we find any reflections at all determines 
the lattice type, but in order to determine what is in each of the elements of the 
lattice one must take into account the intensity of the scattering at the various 
directions. Which directions scatter depends on the type of lattice, but how strongly 
each scatters is determined by what is inside each unit cell, and in that way the 
structure of crystals is worked out. 

Two photographs of x-ray diffraction patterns are shown in Figs. 38-5 and 
38-6; they illustrate scattering from rock salt and myoglobin, respectively. 

Incidentally, an interesting thing happens if the spacings of the nearest planes 
are less than X/2. In this case (38.9) has no solution for n. Thus if X is bigger 
than twice the distance between adjacent planes then there is no side diffraction 
pattern, and the light— or whatever it is — will go right through the material with- 
out bouncing off or getting lost. So in the case of light, where X is much bigger 
than the spacing, of course it does go through and there is no pattern of reflection 
from the planes of the crystal. 

This fact also has an interesting consequence in the case of piles which make 
neutrons (these are obviously particles, for anybody’s money!). If we take these 
neutrons and let them into a long block of graphite, the neutrons diffuse and 
work their way along (Fig. 38-7). They diffuse because they are bounced by the 
atoms, but strictly, in the wave theory, they are bounced by the atoms because 
of diffraction from the crystal planes. It turns out that if we take a very long piece 
of graphite, the neutrons that come out the far end are all of long wavelength! 
In fact, if one plots the intensity as a function of wavelength, we get nothing except 
for wavelengths longer than a certain minimum (Fig. 38-8). In other words, we 
can get very slow neutrons that way. Only the slowest neutrons come through; 
they are not diffracted or scattered by the crystal planes of the graphite, but keep 
going right through like light through glass, and are not scattered out the sides. 
There are many other demonstrations of the reality of neutron waves and waves 
of other particles. 


38-4 The size of an atom 

We now consider another application of the uncertainty relation, Eq. (38.3). 
It must not be taken too seriously; the idea is right but the analysis is not very 
accurate. The idea has to do with the determination of the size of atoms, and the 
fact that, classically, the electrons would radiate light and spiral in until they settle 
down right on top of the nucleus. But that cannot be right quantum-mechanically 
because then we would know where each electron was and how fast it was moving. 
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Figure 38-5 



Figure 38-6 
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Fig. 38-7. Diffusion of pile neutrons 
through graphite block. 



Fig. 38-8. Intensity of neutrons out of 
graphite rod as function of wavelength. 


Suppose we have a hydrogen atom, and measure the position of the electron; 
we must not be able to predict exactly where the electron will be, or the momentum 
spread will then turn out to be infinite. Every time we look at the electron, it is 
somewhere, but it has an amplitude to be in different places so there is a prob- 
ability of it being found in different places. These places cannot all be at the 
nucleus; we shall suppose there is a spread in position of order a. That is, the dis- 
tance of the electron from the nucleus is usually about a. We shall determine a by 
minimizing the total energy of the atom. 

The spread in momentum is roughly h/a because of the uncertainty relation, so 
that if we try to measure the momentum of the electron in some manner, such 
as by scattering x-rays off it and looking for the Doppler effect from a moving 
scatterer, we would expect not to get zero every time — the electron is not standing 
still — but the momenta must be of the order p « h/a. Then the kinetic energy 
is roughly \mv 2 = p 2 /2m — h 2 /lma 2 . (In a sense, this is a kind of dimensional 
analysis to find out in what way the kinetic energy depends upon Planck’s constant, 
upon m , and upon the size of the atom. We need not trust our answer to within 
factors like 2, 7r, etc. We have not even defined a very precisely.) Now the potential 
energy is minus e 2 over the distance from the center, say —e 2 /a, where, we remem- 
ber, e 2 is the charge of an electron squared, divided by 47re 0 . Now the point is 
that the potential energy is reduced if a gets smaller, but the smaller a is, the higher 
the momentum required, because of the uncertainty principle, and therefore the 
higher the kinetic energy. The total energy is 

E - h 2 /lma 2 - e 2 /a. (38.10) 

We do not know what a is, but we know that the atom is going to arrange itself 
to make some kind of compromise so that the energy is as little as possible. In 
order to minimize E , we differentiate with respect to a, set the derivative equal to 
zero, and solve for a . The derivative of E is 

dE/da = ~h 2 /ma z + e 2 /a 2 , (38.11) 

and setting dE/da = 0 gives for a the value 

a 0 = h 2 /me 2 — 0.528 angstrom 

- 0.528 X l(r 10 meter. (38.12) 

This particular distance is called the Bohr radius , and we have thus learned that 
atomic dimensions are of the order of angstroms, which is right: This is pretty 
good — in fact, it is amazing, since until now we have had no basis for understanding 
the size of atoms! Atoms are completely impossible from the classical point of 
view, since the electrons would spiral into the nucleus. 

Now if we put the value (38.12) for a 0 into (38.10) to find the energy, it comes 
out 

E 0 = -e 2 /2a 0 = - me*/2h 2 = -13.6ev. (38.13) 

What does a negative energy mean? It means that the electron has less energy 
when it is in the atom than when it is free. It means it is bound. It means it takes 
energy to kick the electron out; it takes energy of the order of 13.6 ev to ionize a 
hydrogen atom. We have no reason to think that it is not two or three times this — 
or half of this — or (1/w) times this, because we have used such a sloppy argument. 
However, we have cheated, we have used all the constants in such a way that it 
happens to come out the right number! This number, 13.6 electron volts, is called 
a Rydberg of energy; it is the ionization energy of hydrogen. 

So we now understand why we do not fall through the floor. As we walk, our 
shoes with their masses of atoms push against the floor with its mass of atoms. 
In order to squash the atoms closer together, the electrons would be confined to a 
smaller space and, by the uncertainty principle, their momenta would have to be 
higher on the average, and that means high energy; the resistance to atomic com- 
pression is a quantum-mechanical effect and not a classical effect. Classically, 
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we would expect that if we were to draw all the electrons and protons closer together, 
the energy would be reduced still further, and the best arrangement of positive and 
negative charges in classical physics is all on top of each other. This was well 
known in classical physics and was a puzzle because of the existence of the atom. 
Of course, the early scientists invented some ways out of the trouble — but never 
mind, we have the right way out, now! (Maybe.) 

Incidentally, although we have no reason to understand it at the moment, in a 
situation where there are many electrons it turns out that they try to keep away 
from each other. If one electron is occupying a certain space, then another does 
not occupy the same space. More precisely, there are two spin cases, so that two 
can sit on top of each other, one spinning one way and one the other way. But after 
that we cannot put any more there. We have to put others in another place, and 
that is the real reason that matter has strength. If we could put all the electrons 
in the same place it would condense even more than it does. It is the fact that the 
electrons cannot all get on top of each other that makes tables and everything 
else solid. 

Obviously, in order to understand the properties of matter, we will have to use 
quantum mechanics and not be satisfied with classical mechanics. 


38-5 Energy levels 


We have talked about the atom in its lowest possible energy condition, but 
it turns out that the electron can do other things. It can jiggle and wiggle in a 
more energetic manner, and so there are many different possible motions for the 
atom. According to quantum mechanics, in a stationary condition there can only 
be definite energies for an atom. We make a diagram (Fig. 38-9) in which we plot 
the energy vertically, and we make a horizontal line for each allowed value of the 
energy. When the electron is free, i.e., when its energy is positive, it can have any 
energy; it can be moving at any speed. But bound energies are not arbitrary. The 
atom must have one or another out of a set of allowed values, such as those in 
Fig. 38-9. 

Now let us call the allowed values of the energy E 0 , E u E 2 , E 3 . If an atom is 
initially in one of these “excited states,” E u E 2 , etc., it does not remain in that 
state forever. Sooner or later it drops to a lower state and radiates energy in the 
form of light. The frequency of the light that is emitted is determined by conserva- 
tion of energy plus the quantum-mechanical understanding that the frequency of 
the light is related to the energy of the light by (38.1). Therefore the frequency of 
the light which is liberated in a transition from energy E% to energy E x (for ex- 
ample) is 

0)31 = (Es - E 1 )/h . (38.14) 


.//////// 


-E. 


Fig. 38-9. Energy diagram for an 
atom, showing several possible transitions. 


This, then, is a characteristic frequency of the atom and defines a spectral emission 
line. Another possible transition would be from E 3 to E 0 . That would have a 
different frequency 

<030 = (£3 - E 0 )/h. (38.15) 


Another possibility is that if the atom were excited to the state E 1 it could drop to 
the ground state E 0i emitting a photon of frequency 


co 10 = (£1 - E 0 )/h. (38.16) 

The reason we bring up three transitions is to point out an interesting relationship. 
It is easy to see from (38.14), (38.15), and (38.16) that 


<*>3o = W 31 + wio. (38.17) 

In general, if we find two spectral lines, we shall expect to find another line at the 
sum of the frequencies (or the difference in the frequencies), and that all the lines 
can be understood by finding a series of levels such that every line corresponds to 
the difference in energy of some pair of levels. This remarkable coincidence in 
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spectral frequencies was noted before quantum mechanics was discovered, and it is 
called the Ritz combination principle. This is again a mystery from the point of 
view of classical mechanics. Let us not belabor the point that classical mechanics 
is a failure in the atomic domain; we seem to have demonstrated that pretty well. 

We have already talked about quantum mechanics as being represented by 
amplitudes which behave like waves, with certain frequencies and wave numbers. 
Let us observe how it comes about from the point of view of amplitudes that the 
atom has definite energy states. This is something we cannot understand from what 
has been said so far, but we are all familiar with the fact that confined waves have 
definite frequencies. For instance, if sound is confined to an organ pipe, or any- 
thing like that, then there is more than one way that the sound can vibrate, but 
for each such way there is a definite frequency. Thus an object in which the waves 
are confined has certain resonance frequencies. It is therefore a property of waves 
in a confined space — a subject which we will discuss in detail with formulas later 
on — that they exist only at definite frequencies. And since the general relation 
exists between frequencies of the amplitude and energy, we are not surprised to 
find definite energies associated with electrons bound in atoms. 

38-6 Philosophical implications 

Let us consider briefly some philosophical implications of quantum mechanics. 
As always, there are two aspects of the problem: one is the philosophical implica- 
tion for physics, and the other is the extrapolation of philosophical matters to 
other fields. When philosophical ideas associated with science are dragged into 
another field, they are usually completely distorted. Therefore we shall confine 
our remarks as much as possible to physics itself. 

First of all, the most interesting aspect is the idea of the uncertainty principle; 
making an observation affects the phenomenon. It has always been known that 
making observations affects a phenomenon, but the point is that the effect cannot 
be disregarded or minimized or decreased arbitrarily by rearranging the apparatus. 
When we look for a certain phenomenon we cannot help but disturb it in a certain 
minimum way, and the disturbance is necessary for the consistency of the viewpoint. 
The observer was sometimes important in prequantum physics, but only in a 
rather trivial sense. The problem has been raised: if a tree falls in a forest and there 
is nobody there to hear it, does it make a noise? A real tree falling in a real forest 
makes a sound, of course, even if nobody is there. Even if no one is present to hear 
it, there are other traces left. The sound will shake some leaves, and if we were 
careful enough we might find somewhere that some thorn had rubbed against a 
leaf and made a tiny scratch that could not be explained unless we assumed the 
leaf were vibrating. So in a certain sense we would have to admit that there is a 
sound made. We might ask: was there a sensation of sound? No, sensations have 
to do, presumably, with consciousness. And whether ants are conscious and 
whether there were ants in the forest, or whether the tree was conscious, we do not 
know. Let us leave the problem in that forrri. 

Another thing that people have emphasized since quantum mechanics w^s 
developed is the idea that we should not speak about those things which we cannot 
measure. (Actually relativity theory also said this.) Unless a thing can be defined 
by measurement, it has no place in a theory. And since an accurate value of the 
momentum of a localized particle cannot be defined by measurement it therefore 
has no place in the theory. The idea that this is what was the matter with classical 
theory is a false position. It is a careless analysis of the situation. Just because we 
cannot measure position and momentum precisely does not a priori mean that we 
cannot talk about them. It only means that we need not talk about them. The 
situation in the sciences is this: A concept or an idea which cannot be measured 
or cannot be referred directly to experiment may or may not be useful. It need 
not exist in a theory. In other words, suppose we compare the classical theory of 
the world with the quantum theory of the world, and suppose that it is true ex- 
perimentally that we can measure position and momentum only imprecisely. The 
question is whether the ideas of the exact position of a particle and the exact 
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momentum of a particle are valid or not. The classical theory admits the ideas; 
the quantum theory does not. This does not in itself mean that classical physics 
is wrong. When the new quantum mechanics was discovered, the classical people — 
which included everybody except Heisenberg, Schrodinger, and Born — said: 
“Look, your theory is not any good because you cannot answer certain questions 
like: what is the exact position of a particle?, which hole does it go through?, 
and some others.” Heisenberg’s answer was: “I do not need to answer such ques- 
tions because you cannot ask such a question experimentally.” It is that we do 
not have to. Consider two theories (a) and (b); (a) contains an idea that cannot be 
checked directly but which is used in the analysis, and the other, (b), does not 
contain the idea. If they disagree in their predictions, one could not claim that 
(b) is false because it cannot explain this idea that is in (a), because that idea is 
one of the things that cannot be checked directly. It is always good to know which 
ideas cannot be checked directly, but it is not necessary to remove them all. It is 
not true that we can pursue science completely by using only those concepts which 
are directly subject to experiment. 

In quantum mechanics itself there is a wave function amplitude, there is a 
potential, and there are many constructs that we cannot measure directly. The basis 
of a science is its ability to predict. To predict means to tell what will happen in an 
experiment that has never been done. How can we do that? By assuming that we 
know what is there, independent of the experiment. We must extrapolate the 
experiments to a region where they have not been done. We must take our con- 
cepts and extend them to places where they have not yet been checked, if we do 
not do that, we have no prediction. So it was perfectly sensible for the classical 
physicists to go happily along and suppose that the position — which obviously 
means something for a baseball — meant something also for an electron. It was 
not stupidity. It was a sensible procedure. Today we say that the law of relativity 
is supposed to be true at all energies, but someday somebody may come along and 
say how stupid we were. We do not know where we are “stupid” until we “stick 
our neck out,” and so the whole idea is to put our neck out. And the only way to 
find out that we are wrong is to find out what oiir predictions are. It is absolutely 
necessary to make constructs. 

We have already made a few remarks about the indeterminacy of quantum 
mechanics. That is, that we are unable now to predict what will happen in physics 
in a given physical circumstance which is arranged as carefully as possible. If 
we have an atom that is in an excited state and so is going to emit a photon, we 
cannot say when it will emit the photon. It has a certain amplitude to emit the 
photon at any time, and we can predict only a probability for emission; we cannot 
predict the future exactly. This has given rise to all kinds of nonsense and questions 
on the meaning of freedom of will, and of the idea that the world is uncertain. 

Of course we must emphasize that classical physics is also indeterminate, in a 
sense. It is usually thought that this indeterminacy, that we cannot predict the 
future, is an important quantum-mechanical thing, and this is said to explain the 
behavior of the mind, feelings of free will, etc. But if the world were classical — if 
the laws of mechanics were classical — it is not quite obvious that the mind would 
not feel more or less the same. It is true classically that if we knew the position and 
the velocity of every particle in the world, or in a box of gas, we could predict ex- 
actly what would happen. And therefore the classical world is deterministic. 
Suppose, however, that we have a finite accuracy and do not know exactly where 
just one atom is, say to one part in a billion. Then as it goes along it hits another 
atom, and because we did not know the position better than to one part in a billion, 
we find an even larger error in the position after the collision. And that is amplified, 
of course, in the next collision, so that if we start with only a tiny error it rapidly 
magnifies to a very great uncertainty. To give an example: if water falls over a dam, 
it splashes. If we stand nearby, every now and then a drop will land on our nose. 
This appears to be completely random, yet such a behavior would be predicted 
by purely classical laws. The exact position of all the drops depends upon the 
precise wigglings of the water before it goes over the dam. How? The tiniest 
irregularities are magnified in falling, so that we get complete randomness. Ob- 
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viously, we cannot really predict the position of the drops unless we know the 
motion of the water absolutely exactly . 

Speaking more precisely, given an arbitrary accuracy, no matter how precise, 
one can find a time long enough that we cannot make predictions valid for that 
long a time. Now the point is that this length of time is not very large. It is not 
that the time is millions of years if the accuracy is one part in a billion. The time 
goes, in fact, only logarithmically with the error, and it turns out that in only a 
very, very tiny time we lose all our information. If the accuracy is taken to be one 
part in billions and billions and billions — no matter how many billions we wish, 
provided we do stop somewhere — then we can find a time less than the time it 
took to state the accuracy — after which we can no longer predict what is going 
to happen! It is therefore not fair to say that from the apparent freedom and 
indeterminacy of the human mind, we should have realized that classical “deter- 
ministic” physics could not ever hope to understand it, and to welcome quantum 
mechanics as a release from a “completely mechanistic” universe. For already in 
classical mechanics there was indeterminability from a practical point of view. 
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39-1 Properties of matter 


Properties of matter 
Tfie pressure of a gas 
Compressibility of radiation 
Temperature and kinetic energy 
The ideal gas law 

differently than we have dealt with the other subjects so far. In the case of me- 
chanics and in the case of light, we were able to begin with a precise statement of 
some laws, like Newton’s laws, or the formula for the field produced by an ac- 
celerating charge, from which a whole host of phenomena could be essentially 
understood, and which would produce a basis for our understanding of mechanics 
and of light from that time on. That is, we may learn more later, but we do not 
learn different physics, we only learn better methods of mathematical analysis 
to deal with the situation. 

We cannot use this approach effectively in studying the properties of matter. 

We can discuss matter only in a most elementary way ; it is much too complicated 
a subject to analyze directly from its specific basic laws, which are none other than 
the laws of mechanics and electricity. But these are a bit too far away from the 
properties we wish to study; it takes too many steps to get from Newton’s laws to 
the properties of matter, and these steps are, in themselves, fairly complicated. 

We will now start to take some of these steps, but while many of our analyses 
will be quite accurate, they will eventually get less and less accurate. We will 
have only a rough understanding of the properties of matter. 

One of the reasons that we have to perform the analysis so imperfectly is that 
the mathematics of it requires a deep understanding of the theory of probability; 
we are not going to want to know where every atom is actually moving, but rather, 
how many move here and there on the average, and what the odds are for different 
effects. So this subject involves a knowledge of the theory of probability, and our 
mathematics is not yet quite ready and we do not want to strain it too hard. 

Secondly, and more important from a physical standpoint, the actual behavior 
of the atoms is not according to classical mechanics, but according to quantum 
mechanics, and a correct understanding of the subject cannot be attained until we 
understand quantum mechanics. Here, unlike the case of billiard balls and auto- 
mobiles, the difference between the classical mechanical laws and the quantum- 
mechanical laws is very important and very significant, so that many things that 
we will deduce by classical physics will be fundamentally incorrect. Therefore 
there will be certain things to be partially unlearned; however, we shall indicate 
in every case when a result is incorrect, so that we will know just where the “edges” 
are. One of the reasons for discussing quantum mechanics in the preceding 
chapters was to give an idea as to why, more or less, classical mechanics is incorrect 
in the various directions. 

Why do we deal with the subject now at all? Why not wait a half a year, or 
a year, until we know the mathematics of probability better, and we learn a little 
quantum mechanics, and then we can do it in a more fundamental way? The 
answer is that it is a difficult subject, and the best way to learn is to do it slowly! 

The first thing to do is to get some idea, more or less, of what ought to happen in 
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With this chapter we begin a new subject which will occupy us for some time. 39-1 
It is the first part of the analysis of the properties of matter from the physical point 
of view, in which, recognizing that matter is made out of a great many atoms, or 
elementary parts, which interact electrically and obey the laws of mechanics, we 39-3 

try to understand why various aggregates of atoms behave the way they do. 39-4 

It is obvious that this is a difficult subject, and we emphasize at the beginning 
that it is in fact an extremely difficult subject, and that we have to deal with it 39-5 



different circumstances, and then, later, when we know the laws better, we will 
formulate them better. 

Anyone who wants to analyze the properties of matter in a real problem might 
want to start by writing down the fundamental equations and then try to solve 
them mathematically. Although there are people who try to use such an approach, 
these people are the failures in this field ; the real successes come to those who start 
from a physical point of view, people who have a rough idea where they are going 
and then begin by making the right kind of approximations, knowing what is big 
and what is small in a given complicated situation. These problems are so compli- 
cated that even an elementary understanding, although inaccurate and incomplete, 
is worth while having, and so the subject will be one that we shall go over again 
and again, each time with more and more accuracy, as we go through our course 
in physics. 

Another reason for beginning the subject right now is that we have already 
used many of these ideas in, for example, chemistry, and we have even heard of 
some of them in high school. It is interesting to know the physical basis for 
these things. 

As an interesting example, we all know that equal volumes of gases, at the 
same pressure and temperature, contain the same number of molecules. The law 
of multiple proportions, that when two gases combine in a chemical reaction the 
volumes needed always stand in simple integral proportions, was understood ulti- 
mately by Avogadro to mean that equal volumes have equal numbers of atoms. 
Now why do they have equal numbers of atoms? Can we deduce from Newton’s 
laws that the number of atoms should be equal? We shall address ourselves to 
that specific matter in this chapter. In succeeding chapters, we shall discuss various 
other phenomena involving pressures, volumes, temperature, and heat. 

We shall also find that the subject can be attacked from a nonatomic point of 
view, and that there are many interrelationships of the properties of substances. 
For instance, when we compress something, it heats; if we heat it, it expands. 
There is a relationship between these two facts which can be deduced independently 
of the machinery underneath. This subject is called thermodynamics . The deepest 
understanding of thermodynamics comes, of course, from understanding the actual 
machinery underneath, and that is what we shall do: we shall take the atomic 
viewpoint from the beginning and use it to understand the various properties of 
matter and the laws of thermodynamics. 

Let us, then, discuss the properties of gases from the standpoint of Newton’s 
laws of mechanics. 


39-2 The pressure of a gas 

First, we know that a gas exerts a pressure, and we must clearly understand 
what this is due to. If our ears were a few times more sensitive, we would hear a 
perpetual rushing noise. Evolution has not developed the ear to that point, 
because it would be useless if it were so much more sensitive — we would hear a 
perpetual racket. The reason is that the eardrum is in contact with the air, and air 
is a lot of molecules in perpetual motion and these bang against the eardrums. 
In banging against the eardrums they make an irregular tattoo — boom, boom, 
boom — which we do not hear because the atoms are so small, and the sensitivity 
of the ear is not quite enough to notice it. The result of this perpetual bombardment 
is to push the drum away, but of course there is an equal perpetual bombardment 
of atoms on the other side of the eardrum, so the net force on it is zero. If we were 
to take the air away from one side, or change the relative amounts of air on the 
two sides, the eardrum would then be pushed one way or the other, because the 
amount of bombardment on one side would be greater than on the other. We 
sometimes feel this uncomfortable effect when we go up too fast in an elevator or 
an airplane, especially if we also have a bad cold (when we have a cold, inflammation 
closes the tube which connects the air on the inside of the eardrum with the outside 
air through the throat, so that the two pressures cannot readily equalize). 
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In considering how to analyze the situation quantitatively, we imagine that 
we have a volume of gas in a box, at one end of which is a piston which can be 
moved (Fig. 39-1). We would like to find out what force on the piston results from 
the fact that there are atoms in this box. The volume of the box is V , and as the 
atoms move around inside the box with various velocities they bang against the 
piston. Suppose there is nothing, a vacuum, on the outside of the piston. What of 
it? If the piston were left alone, and nobody held onto it, each time it got banged 
it would pick up a little momentum and it would gradually get pushed out of the 
box. So in order to keep it from being pushed out of the box, we have to hold it 
with a force F. The problem is, how much force? One way of expressing the force 
is to talk about the force per unit area: if A is the area of the piston, then the force 
on the piston will be written as a number times the area. We define the pressure, 
then, as equal to the force that we have to apply on a piston, divided by the area 
of the piston: 

P = F/A . (39.1) 

To make sure we understand the idea (we have to derive it for another purpose 
anyway), the differential work dW done on the gas in compressing it by moving 
the piston in a differential amount — dx would be the force times the distance 
that we compress it, which, according to (39.1), would be the pressure times the 
area, times the distance, which is equal to minus the pressure times the change 
in the volume: 

dW - F(-dx) = -PAdx = - P dV . (39.2) 

(The area A times the distance dx is the volume change.) The minus sign is there 
because, as we compress it, we decrease the volume; if we think about it we can 
see that if a gas is compressed, work is done on it. 

How much force do we have to apply to balance the banging of the molecules? 
The piston receives from each collision a certain amount of momentum. A certain 
amount of momentum per second will pour into the piston, and it will start to move. 
To keep it from moving, we must pour back into it the same amount of momentum 
per second from our force. Of course, the force is the amount of momentum per 
second that we must pour in. There is another way to put it: if we let go of the 
piston it will pick up speed because of the bombardments ; with each collision we 
get a little more speed, and the speed thus accelerates. The rate at which the piston 
picks up speed, or accelerates, is proportional to the force on it. So we see that 
the force, which we already have said is the pressure times the area, is equal to the 
momentum per second delivered to the piston by the colliding molecules. 

To calculate the momentum per second is easy — we can do it in two parts: 
first, we find the momentum delivered to the piston by one particular atom in a 
collision with the piston, then we have to multiply by the number of collisions per 
second that the atoms have with the wall. The force will be the product of these 
two factors. Now let us see what the two factors are: In the first place, we shall 
suppose that the piston is a perfect “reflector” for the atoms. If it is not, the whole 
theory is wrong, and the piston will start to heat up and things will change, but 
eventually, when equilibrium has set in, the net result is that the collisions are 
effectively perfectly elastic. On the average, every particle that comes in leaves 
with the same energy. So we shall imagine that the gas is in a steady condition, and 
we lose no energy to the piston because the piston is standing still. In those cir- 
cumstances, if a particle comes in with a certain speed, it comes out with the same 
speed and, we will say, with the same mass. 

If v is the velocity of an atom, and v x is the x-component of v, then mv x is the 
jc-component of momentum “in”; but we also have an equal component of mo- 
mentum “out,” and so the total momentum delivered to the piston by the particle, 
in one collision, is 2 mv x , because it is “reflected.” 

Now, we need the number of collisions made by the atoms in a second, or in 
a certain amount of time dt; then we divide by dt. How many atoms are hitting? 
Let us suppose that there are N atoms in the volume F, or n = N/V in each unit 
volume. To find how many atoms hit the piston, we note that, given a certain 
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Fig. 39-1. Atoms of a gas in a box 
with a frictionless piston. 



amount of time t , if a particle has a certain velocity toward the piston it will hit 
during the time t, provided it is close enough. If it is too far away, it goes only 
part way toward the piston in the time t , but does not reach the piston. Therefore 
it is clear that only those molecules which are within a distance v x t from the piston 
are going to hit the piston in the time t. Thus the number of collisions in a time t is 
equal to the number of atoms which are in the region within a distance v x t, and 
since the area of the piston is A , the volume occupied by the atoms which are going 
to hit the piston is v x tA. But the number of atoms that are going to hit the piston 
is that volume times the number of atoms per unit volume, nv x tA. Of course we 
do not want the number that hit in a time t , we want the number that hit per second, 
so we divide by the time /, to get nv x A. (This time t could be made very short; if 
we feel we want to be more elegant, we call it dt , then differentiate, but it is the same 
thing.) 

So we find that the force is 


F = nv x A • 2 mv x , (39.3) 

See, the force is proportional to the area, if we keep the particle density fixed as 
we change the area! The pressure is then 

P = 2 nmv% (39.4) 

Now we notice a little trouble with this analysis: First, all the molecules do 
not have the same velocity, and they do not move in the same direction. So, all 
the v x ’s are different! So what we must do, of course, is to take an average of the 
vfs, since each one makes its own contribution. What we want is the square of 
v Xi averaged over all the molecules: 

P = nm{v 2 x ). (39.5) 

Did we forget to include the factor 2? No; of all the atoms, only half are 
headed toward the piston. The other half are headed the other way, and if we take 
we are averaging the negative v x s squared, as well as the positive v x s. So 
when we just take (v%), without looking, we are getting twice as much as we want. 
The average of v 2 x , for positive v X9 is equal to the average of v 2 x for all v Xy times one 
half. 

Now as the atoms bounce around, it is clear that there is nothing special about 
the “^-direction”; the atoms may also be moving up and down, back and forth, 
in and out. Therefore it is going to be true that (vl), the average motion of the 
atoms in one direction, and the average in the other two directions, are all going 
to be equal: 

<«*> = <«£> = <«£>• (39.6) 

It is only a matter of rather tricky mathematics to notice, therefore, that they are 
each equal to one-third of their sum, which is of course the square of the magnitude 
of the velocity: 

( v l) = + v 2 y + v 2 z ) — (y 2 )/3. (39.7) 

This has the advantage that we do not have to worry about any particular direction, 
and so we write our pressure formula again in this form: 

P = Q)n(mv 2 /2). (39.8) 

The reason we wrote the last factor as (mv 2 /l) is that this is the kinetic energy of 
the center-of-mass motion of the molecule. We find, therefore, that 

PV = N($(mv 2 / 2). (39.9) 

With this equation we can calculate how much the pressure is, if we know the speeds. 

As a very simple example let us take helium gas, or any other gas, like mercury 
vapor, or potassium vapor of high enough temperature, or argon, in which all the 
molecules are single atoms, for which we may suppose that there is no internal 
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motion in the atom. If we had a complex molecule, there might be some internal 
motion, mutual vibrations, or something. We suppose that we may disregard that; 
this is actually a serious matter that we will have to come back to, but it turns out 
to be all right. We suppose that the internal motion of the atoms can be disre- 
garded, and therefore, for this purpose, that the kinetic energy of the center-of-mass 
motion is all the energy there is. So for a monatomic gas, the kinetic energy is the 
total energy. In general, we are going to call U the total energy (it is sometimes 
called the total internal energy — we may wonder why, since there is no external 
energy to a gas), i.e., all the energy of all the molecules in the gas, or the object, 
whatever it is. 

For a monatomic gas we will suppose that the total energy U is equal to a num- 
ber of atoms times the average kinetic energy of each, because we are disregarding 
any possibility of excitation or motion inside the atoms themselves. Then, in these 
circumstances, we would have 

PV = f U. (39.10) 

Incidentally, we can stop here and find the answer to the following question: 
Suppose that we take a can of gas and compress the gas slowly, how much pressure 
do we need to squeeze the volume down? It is easy to find out, since the pressure 
is f the energy divided by K. As we squeeze it down, we do work on the gas and we 
thereby increase the energy U. So we are going to have some kind of a differential 
equation: If we start out in a given circumstance with a certain energy and a certain 
volume, we then know the pressure. Now we start to squeeze, but the moment we 
do, the energy U increases and the volume V decreases, so the pressure goes up. 

So, we have to solve a differential equation, and we will solve it in a moment. 
We must first emphasize, however, that as we are compressing this gas, we are 
supposing that all the work goes into increasing the energy of the atoms inside. 
We may ask, “Isn’t that necessary? Where else could it go?” It turns out that it 
can go another place. There are what we call “heat leaks” through the walls: 
the hot (i.e., fast-moving) atoms that bombard the walls, heat the walls, and energy 
goes away. We shall suppose for the present that this is not the case. 

For somewhat wider generality, although we are still making some very special 
assumptions about our gas, we shall write, not PV = § £/, but 

PV - (7 - 1 )U. (39.11) 

It is written (7—1) times U for conventional reasons, because we will deal with 
a few other cases later where the number in front of U will not be §, but will be a 
different number. So, in order to do the thing in general, we call it 7 — 1, because 
people have been calling it that for almost one hundred years. This 7, then, is § , 
because § — 1 is § for a monatomic gas like helium. 

We have already noticed that when we compress a gas the work done is 
— P dV. A compression in which there is no heat energy added or removed is 
called an adiabatic compression, from the Greek a (not) + dia (through) + bainein 
(to go). (The word adiabatic is used in physics in several ways, and it is sometimes 
hard to see what is common about them.) That is, for an adiabatic compression 
all the work done goes into changing the internal energy. That is the key — that 
there are no other losses of energy — for then we have P dV = — dU . But since 
U = PVf(y — 1), we may write 

dU = (PdV + V dP)/(y - 1). (39.12) 

So we have PdV ~ — ( PdV + V dP)/(y — 1), or, rearranging the terms, 
7 PdV = - VdP , or 

(7 dV/V) + (dP/P) = 0. (39.13) 

Fortunately, assuming that 7 is constant, as it is for a monatomic gas, we can inte- 
grate this : it gives 7 In V + In P = In C, where In C is the constant of integration. 
If we take the exponential of both sides, we get the law 


(39.14) 
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PV y = C (a constant). 



In other words, under adiabatic conditions, where the temperature rises as we 
compress because no heat is being lost, the pressure times the volume to the f 
power is a constant for a monatomic gas! Although we derived it theoretically, 
this is, in fact, the way monatomic gases behave experimentally. 


39-3 Compressibility of radiation 

We may give one other example of the kinetic theory of a gas, one which is 
not used in chemistry so much, but is used in astronomy. We have a large number 
of photons in a box in which the temperature is very high. (The box is, of course, 
the gas in a very hot star. The sun is not hot enough; there are still too many 
atoms, but at still higher temperatures in certain very hot stars, we may neglect the 
atoms and suppose that the only objects that we have in the box are photons.) 
Now then, a photon has a certain momentum p. (We always find that we are in 
terrible trouble when we do kinetic theory: p is the pressure, but p is the momentum; 
v is the volume, but v is the velocity; T is the temperature, but T is the kinetic 
energy or the time or the torque; one must keep one’s wits about one!) This 
p is momentum, it is a vector. Going through the same analysis as before, it is 
the x-component of the vector p which generates the “kick,” and twice the x-com- 
ponent of the vector p is the momentum which is given in the kick. Thus 2 p x re- 
places 2mv x , and in evaluating the number of collisions, v x is still v x , so when we 
get all the way through, we find that the pressure in Eq. (39.4) is, instead, 

P = 2np x v x . (39.15) 

Then, in the averaging, it becomes n times the average of p x v x (the same factor of 
2) and, finally, putting in the other two directions, we find 

PV=N( p-v)/3. (39.16) 

This checks with the formula (39.9), because the momentum is mv; it is a little 
more general, that is all. The pressure times the volume is the total number of 
atoms times £(p • v), averaged. 

Now, for photons, what is p ■ v? The momentum and the velocity are in the 
same direction, and the velocity is the speed of light, so this is the momentum of 
each of the objects, times the speed of light. The momentum times the speed of 
light of every photon is its energy: E = pc, so these terms are the energies of 
each of the photons, and we should, of course, take an average energy, times the 
number of photons. So we have ^ of the energy inside the gas: 

PV = U / 3 (photon gas). (39.17) 

For photons, then, since we have £ in front, (7 - 1) in (39.1 1) is or 7 = and 
we have discovered that radiation in a box obeys the law 

PV 413 = C. (39.18) 

So we know the compressibility of radiation! That is what is used in an analysis 
of the contribution of radiation pressure in a star, that is how we calculate it, and 
how it changes when we compress it. What wonderful things are already within 
our power! 


39-4 Temperature and kinetic energy 

So far we have not dealt with temperature', we have purposely been avoiding 
the temperature. As we compress a gas, we know that the energy of the molecules 
increases, and we are used to saying that the gas gets hotter; we would like to 
understand what this has to do with the temperature. If we try to do the experi- 
ment, not adiabatically but at what we call constant temperature, what are we doing? 
We know that if we take two boxes of gas and let them sit next to each other long 
enough, even if at the start they were at what we call dilferent temperatures, they 
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will in the end come to the same temperature. Now what does that mean? That 
means that they get to a condition that they would get to if we left them alone 
long enough! What we mean by equal temperature is just that— the final condition 
when things have been sitting around interacting with each other long enough. 

Let us consider, now, what happens if we have two gases in containers sepa- 
rated by a movable piston as in Fig. 39-2 (just for simplicity we shall take two 
monatomic gases, say helium and neon). In container (1) the atoms have mass 
m u velocity vi, and there are «i per unit volume, and in the other container the 
atoms have mass m 2 , velocity v 2 , there are n 2 atoms per unit volume. What are 
the conditions for equilibrium? 

Obviously, the bombardment from the left side must be such that it moves 
the piston to the right and compresses the other gas until its pressure builds up, 
and the thing will thus slosh back and forth, and will gradually come to rest at 
a place where the pressures are equal on both sides. So we can arrange that the 
pressures are equal ; that just means that the internal energies per unit volume are 
equal, or that the numbers n times the average kinetic energies on each side are 
equal. What we have to try to prove, eventually, is that the numbers themselves 
are equal. So far, all we know is that the numbers times the kinetic energies 
are equal, 

ni{miv\/2) = n 2 {m 2 v\/2), 

from (39.8), because the pressures are equal. We must realize that this is not the 
only condition over the long run, but something else must happen more slowly 
as the true complete equilibrium corresponding to equal temperatures sets in. 

To see the idea, suppose that the pressure on the left side were developed by 
having a very high density but a low velocity. By having a large n and a small v, 
we can get the same pressure as by having a small n and a large v. The atoms may 
be moving slowly but be packed nearly solidly, or there may be fewer but they are 
hitting harder. Will it stay like that forever? At first we might think so, but then 
we think again and find we have forgotten one important point. That is, that the 
intermediate piston does not receive a steady pressure; it wiggles, just like the 
eardrum that we were first talking about, because the bangings are not absolutely 
uniform. There is not a perpetual, steady pressure, but a tattoo— the pressure 
varies, and so the thing jiggles. Suppose that the atoms on the right side are not 
jiggling much, but those on the left are few and far between and very energetic. 
The piston will, now and then, get a big impulse from the left, and will be driven 
against the slow atoms on the right, giving them more speed. (As each atom 
collides with the piston, it either gains or loses energy, depending upon whether 
the piston is moving one way or the other when the atom strikes it.) So, as a result 
of the collisions, the piston finds itself jiggling, jiggling, jiggling, and this shakes 
the other gas — it gives energy to the other atoms, and they build up faster motions, 
until they balance the jiggling that the piston is giving to them. The system comes 
to some equilibrium where the piston is moving at such a mean square speed that 
it picks up energy from the atoms at about the same rate as it puts energy back 
into them. So the piston picks up a certain mean irregularity in speed, and it is 
our problem to find it. When we do find it, we can solve our problem better, be- 
cause the gases will adjust their velocities until the rate at which they are trying 
to pour energy into each other through the piston will become equal. 

It is quite difficult to figure out the details of the piston in this particular cir- 
cumstance; although it is ideally simple to understand, it turns out to be a little 
harder to analyze. Before we analyze that, let us analyze another problem in 
which we have a box of gas but now we have two different kinds of molecules in it, 
having masses mi and m 2 , velocities r i and v 2 , and so forth; there is now a much 
more intimate relationship. If all of the No. 2 molecules are standing still, that 
condition is not going to last, because they get kicked by the No. 1 molecules and 
so pick up speed. If they are all going much faster than the No. 1 molecules, then 
maybe that will not last either— they will pass the energy back to the No. 1 mole- 
cules. So when both gases are in the same box, the problem is to find the rule that 
determines the relative speeds of the two. 
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Fig. 39-2. Atoms of two different 
monatomic gases are separated by a 
movable piston. 



Fig. 39-3. A collision between un- 
equal molecules, viewed in the CM system. 


This is still a very difficult problem, but we will solve it as follows. First we 
consider the following sub-problem (again this is one of those cases where — never 
mind the derivation — in the end the result is very simple to remember, but the 
derivation is just ingenious). Let us suppose that we have two molecules, of differ- 
ent mass, colliding, and that the collision is viewed on the center-of-mass (CM) 
system. In order to remove a complication, we look at the collision in the CM. 
As we know from the laws of collision, by the conservation of momentum and en- 
ergy, after the molecules collide the only way they can move is such that each 
maintains its own original speed — and they just change their direction. So we have 
an average collision that looks like that in Fig. 39-3. Suppose, for a moment, that 
we watch all the collisions with the CM at rest. Suppose we imagine that they are 
all initially moving horizontally. Of course, after the first collision some of them 
are moving at an angle. In other words, if they were all going horizontally, then 
at least some would later be moving vertically. Now in some other collision, they 
would be coming in from another direction, and then they would be twisted at 
still another angle. So even if they were completely organized in the beginning, 
they would get sprayed around at all angles, and then the sprayed ones would get 
sprayed some more, and sprayed some more, and sprayed some more. Ultimately, 
what will be the distribution? Answer : It will be equally likely to find any pair 
moving in any direction in space . After that further collisions could not change the 
distribution. 

They are equally likely to go in all directions, but how do we say that? There 
is of course no likelihood that they will go in any specific direction, because a specific 
direction is too exact, so we have to talk about per unit “something.” The idea is 
that any area on a sphere centered at a collision point will have just as many 
molecules going through it as go through any other equal area on the sphere. 
So the result of the collisions will be to distribute the directions so that equal areas 
on a sphere will have equal probabilities. 

Incidentally, if we just want to discuss the original direction and some other 
direction an angle 6 from it, it is an interesting property that the differential area 
of a sphere of unit radius is sin 6 dd times 2ir, and that is the same as the differ- 
ential of cos 6. So what it means is that the cosine of the angle 6 between any two 
directions is equally likely to be anything from —1 to +1. 

Next, we have to worry about the actual case, where we do not have the 
collision in the CM system, but we have two atoms which are coming together with 
vector velocities Vi and V 2 . What happens now? We can analyze this collision 
with the vector velocities Vj and v 2 in the following way: We first say that there 
is a certain CM; the velocity of the CM is given by the “average” velocity, with 
weights proportional to the masses, so the velocity of the CM is v CM — (m^x + 
n ^ 2 y 2 )/(jni + m 2 ). If we watch this collision in the CM system, then we see a 
collision just like that in Fig. 39-3, with a certain relative velocity w coming in. 
The relative velocity is just Vi - v 2 . Now the idea is that, first, the whole CM is 
moving, and in the CM there is a relative velocity w, and the molecules collide 
and come off in some new direction. All this happens while the CM keeps right on 
moving, without any change. 

Now then, what is the distribution resulting from this? From our previous 
argument we conclude this: that at equilibrium, all directions for w are equally 
likely , relative to the direction of the motion of the CM* There will be no particular 
correlation, in the end, between the direction of the motion of the relative velocity 
and that of the motion of the CM. Of course, if there were, the collisions would 
spray it about, so it is all sprayed around. So the cosine of the angle between w 
and v C m is zero on the average. That is. 


(w’V CM > = 0. (39.19) 


* This argument, which was the one used by Maxwell, involves some subtleties. Al- 
though the conclusion is correct, the result does not follow purely from the considerations 
of symmetry that we used before, since, by going to a reference frame moving through 
the gas, we may find a distorted velocity distribution. We have not found a simple proof 
of this result. 
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But w • Vcm can be expressed in terms of Vi and v 2 as well: 

Oi - v 2 ) • (^lYx + rn 2 v 2 ) 

w VcM mi + m 2 

_ (miV i — m 2 v%) + (^2 ~ Wi)(yi • v 2 ) 
— m, 4 - mo 


(39.20) 


First, let us look at the Vi • v 2 ; what is the average of v x • v 2 ? That is, what is 
the average of the component of velocity of one molecule in the direction of an- 
other? Surely there is just as much likelihood of finding any given molecule moving 
one way as another. The a vereig e of the velocity v 2 in any direction is zero. Certainly , 
then, in the direction of v l5 v 2 has zero average. So, the average of Vj • v 2 is 
zero! Therefore, we conclude that the average of m i v\ must be equal to the average 
of m 2 v\. That is, the average kinetic energy of the two must be equal: 

%miv\ = \m 2 v%. (39.21) 

If we have two kinds of atoms in a gas, it can be shown, and we presume to have 
shown it, that the average of the kinetic energy of one is the same as the average of 
the kinetic energy of the other, when they are both in the same gas in the same box 
in equilibrium. That means that the heavy ones will move slower than the light 
ones; this is easily shown by experimentation with “atoms” of different masses 
in an air trough. 

Now we would like to go one step further, and say that if we have two different 
gases separated in a box, they will also have equal average kinetic energy when 
they have finally come to equilibrium, even though they are not in the same box. 
We can make the argument in a number of ways. One way is to argue that if we 
have a fixed partition with a tiny hole in it (Fig. 39-4) so that one gas could leak 
out through the holes while the other could not, because the molecules are too 
big, and these had attained equilibrium, then we know that in one part, where they 
are mixed, they have the same average kinetic energy, but some come through the 
hole without loss of kinetic energy, so the average kinetic energy in the pure gas 
and in the mixture must be the same. That is not too satisfactory, because maybe 
there are no holes, for this kind of molecule, that separate one kind from the other. 

Let us now go back to the piston problem. We can give an argument 
which shows that the kinetic energy of this piston must also be %m 2 v\. Actually, 
that would be the kinetic energy due to the purely horizontal motion of the piston, 
so, forgetting its up and down motion, it will have to be the same as %m 2 v 2x . 
Likewise, from the equilibrium on the other side, we can prove that the kinetic 
energy of the piston is \m \ if . Although this is not in the middle of the gas, but 
is on one side of the gas, we can still make the argument, although it is a little 
more difficult, that the average kinetic energy of the piston and of the gas molecules 
are equal as a result of all the collisions. 

If this still does not satisfy us, we may make an artificial example by which 
the equilibrium is generated by an object which can be hit on all sides. Suppose 
that we have a short rod with a ball on each end sticking through the piston, on a 
frictionless sliding universal joint. Each ball is round, like one of the molecules, 
and can be hit on all sides. This whole object has a certain total mass, m. 
Now, we have the gas molecules with mass yyi\ and mass m2 ns before. The 
result of the collisions, by the analysis that was made before, is that the kinetic 
energy of m because of collisions with the molecules on one side must be on 

the average. Likewise, because of the collisions with molecules on the other side, 
it has to be %m 2 v l on the average. So, therefore, both sides have to have the 
same kinetic energy when they are in thermal equilibrium. So, although we only 
proved it for a mixture of gases, it is easily extended to the case where there are 
two different, separate gases at the same temperature. 

Thus when we have two gases at the same temperature , the mean kinetic energy 
of the CM motions are equal . 

The mean molecular kinetic energy is a property only of the “temperature.” 
Being a property of the “temperature,” and not of the gas , we can use it as a defini- 
tion of the temperature. The mean kinetic energy of a molecule is thus some 

39-9 



Fig. 39-4. Two gases in a box with a 
semipermeable membrane. 



function of the temperature. But who is to tell us what scale to use for the tempera- 
ture? We may arbitrarily define the scale of temperature so that the mean energy 
is linearly proportional to the temperature. The best way to do it would be to 
call the mean energy itself “the temperature.” That would be the simplest possible 
function. Unfortunately, the scale of temperature has been chosen differently, 
so instead of calling it temperature directly we use a constant conversion factor 
between the energy of a molecule and a degree of absolute temperature called a 
degree Kelvin. The constant of proportionality is k = 1.38 X 10“ 23 joule for 
every degree Kelvin.* So if T is absolute temperature, our definition says that the 
mean molecular kinetic energy is f kT. (The f is put in as a matter of convenience, 
so as to get rid of it somewhere else.) 

We point out that the kinetic energy associated with the component of motion 
in any particular direction is only £AT. The three independent directions that are 
involved make it §kT. 

39-5 The ideal gas law 

Now, of course, we can put our definition of temperature into Eq. (39.9) 
and so find the law for the pressure of gases as a function of the temperature: it is 
that the pressure times the volume is equal to the total number of atoms times the 
universal constant k, times the temperature: 

PV = NkT. (39.22) 

Furthermore, at the same temperature and pressure and volume, the number of 
atoms is determined; it too is a universal constant! So equal volumes of different 
gases, at the same pressure and temperature, have the same number of molecules, 
because of Newton’s laws. That is an amazing conclusion! 

In practice, when dealing with molecules, because the numbers are so large, 
the chemists have artificially chosen a specific number, a very large number, and 
called it something else. They have a number which they call a mole. A mole is 
merely a handy number. Why they did not choose 10 24 objects, so it would come 
out even, is a historical question. They happened to choose, for the convenient 
number of objects on which they standardize, N 0 = 6.02 X 10 23 objects, and 
this is called a mole of objects. So instead of measuring the number of molecules 
in units, they measure in terms of numbers of moles, f In terms of N 0 we can write 
the number of moles, times the number of atoms in a mole, times kT, and if we 
want to, we can take the number of atoms in a mole times k, which is a mole’s 
worth of k, and call it something else, and we do— we call it R. A mole’s worth of 
k is 8.3 17 joules :R = N 0 k = 8.317 j * mole -1 • °K -1 . Thus we also find the gas 
law written as the number of moles (also called N) times RT, or the number of 
atoms, times kT: 

PV = NRT. (39.23) 

It is the same thing, just a different scale for measuring numbers. We use 1 as a 
unit, and chemists use 6 X 10 2 3 as a unit! 

We now make one more remark about our gas law, and that has to do with the 
law for objects other than monatomic molecules. We have dealt only with the 
CM motion of the atoms of a monatomic gas. What happens if there are forces 
present? First, consider the case that the piston is held by a horizontal spring, 
and there are forces on it. The exchange of jiggling motion between atoms and 
piston at any moment does not depend on where the piston is at that moment, of 
course. The equilibrium conditions are the same. No matter where the piston is, 
its speed of motion must be such that it passes energy to the molecules in just 


* The centigrade scale is just this Kelvin scale with a zero chosen at 273.16 °K, so 
T = 273.16 + centigrade temperature. 

t What the chemists call molecular weights are the masses in grams of a mole of a 
molecule. The mole is defined so that the mass of a mole of carbon atoms of isotope 12 
(i.e., having 6 protons and 6 neutrons in the nucleus) is exactly 12 grams. 
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the right way. So it makes no difference about the spring. The speed at which 
the piston has to move, on the average, is the same. So our theorem, that the mean 
value of the kinetic energy in one direction is \kT, is true whether there are forces 
present or not. 

Consider, for example, a diatomic molecule composed of atoms m A and m B . 
What we have proved is that the motion of the CM of part A and that of part B 
are such that \\m A v A ) — m B v% } = ik.T. How can this be, if they are held 
together? Although they are held together, when they are spinning and turning 
in there, when something hits them, exchanging energy with them, the only thing 
that counts is how fast they are moving. That alone determines how fast they 
exchange energy in collisions. At the particular instant, the force is not an es- 
sential point. Therefore the same principle is right, even when there are forces. 

Let us prove, finally, that the gas law is consistent also with a disregard of the 
internal motion. We did not really include the internal motions before; we just 
treated a monatomic gas. But we shall now show that an entire object, considered 
as a single body of total mass M, has a velocity of the CM such that 


|M>cm = ffcr. (39.24) 

In other words, we can consider either the separate pieces or the whole thing! 
Let us see the reason for that: The mass of the diatomic molecule is M = m A + 
m B , and the velocity of the center of mass is equal to v C m = (ma^a + m B v B )/ M. 
Now we need (wcm)- If we square Vcmj we g et 

2 m\v 2 A + 2m A m B y A • v B + m B v B 

y CM — 

Now we multiply \M and take the average, and thus we get 

, 2 m A %kT + 2 m A m B (\ A • v B ) + m B %kT 

W V C M = Jj 

_ , 2 m A m B (y A ■ v B ) 

“ 2 + M 

(We have used the fact that (m A + m B )/M =1.) Now what is (v .4 • y B ) ? (It 
had better be zero!) To find out, let us use our assumption that the relative velocity, 
w = V/1 _ is not any more likely to point in one direction than in another — 
that is, that its average component in any direction is zero. Thus we assume that 


(w • v C m) = 0. 


But what is w • v C m? It is 

(v .4 — vs) ’ (rn A y A + m B y B ) 

W 1 VCM - Jf 

_ nt A v\ + (/«b — wu)(v.a • Vb) — m B v% 

~ M 

Therefore, since ( m A v 2 A ) = (/mb?/b)> the first and last terms cancel out on the 
average, and we are left with 

(/mb — m A ){ y A - y B ) = 0. 

Thus if m A 5 ^ m B , we find that (y A ■ y B ) = 0 , and therefore that the bodily motion 
of the entire molecule, regarded as a single particle of mass M, has a kinetic energy, 
on the average, equal to § kT. 

Incidentally, we have also proved at the same time that the average kinetic 
energy of the internal motions of the diatomic molecule, disregarding the bodily 
motion of the CM, is f kT\ For, the total kinetic energy of the parts of the molecule 
is \m A v\ + \m B v 2 B , whose average is \kT + fkT, or 3 kT. The kinetic energy 
of the center-of-mass motion is f kT, so the average kinetic energy of the rotational 
and vibratory motions of the two atoms inside the molecule is the difference, f kT. 
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The theorem concerning the average energy of the CM motion is general: 
for any object considered as a whole, with forces present or no, for every inde- 
pendent direction of motion that there is, the average kinetic energy in that motion 
is \kT. These “independent directions of motion” are sometimes called the 
degrees of freedom of the system. The number of degrees of freedom of a molecule 
composed of r atoms is 3 r, since each atom needs three coordinates to define its 
position. The entire kinetic energy of the molecule can be expressed either as the 
sum of the kinetic energies of the separate atoms, or as the sum of the kinetic 
energy of the CM motion plus the kinetic energy of the internal motions. The 
latter can sometimes be expressed as a sum of rotational kinetic energy of the 
molecule and vibrational energy, but this is an approximation. Our theorem, 
applied to the /--atom molecule, says that the molecule will have, on the average, 
3rkT/2 joules of kinetic energy, of which %kT is kinetic energy of the center-of-mass 
motion of the entire molecule, and the rest, $ (r — l)kT, is internal vibrational and 
rotational kinetic energy. 


39-12 



40 


The Principles of Statistical Mechanics 


40-1 The exponential atmosphere 

We have discussed some of the properties of large numbers of intercolliding 
atoms. The subject is called kinetic theory, a description of matter from the point 
of view of collisions between the atoms. Fundamentally, we assert that the gross 
properties of matter should be explainable in terms of the motion of its parts. 

We limit ourselves for the present to conditions of thermal equilibrium, that 
is, to a subclass of all the phenomena of nature. The laws of mechanics which 
apply just to thermal equilibrium are called statistical mechanics , and in this section 
we want to become acquainted with some of the central theorems of this subject. 

We already have one of the theorems of statistical mechanics, namely, the 
mean value of the kinetic energy for any motion at the absolute temperature T 
is \kT for each independent motion, i.e., for each degree of freedom. That tells us 
something about the mean square velocities of the atoms. Our objective now is 
to learn more about the positions of the atoms, to discover how many of them 
are going to be in different places at thermal equilibrium, and also to go into a 
little more detail on the distribution of the velocities. Although we have the mean 
square velocity, we do not know how to answer a question such as how many of 
them are going three times faster than the root mean square, or how many of them 
are going one-quarter of the root mean square speed. Or have they all the same 
speed exactly? 

So, these are the two questions that we shall try to answer: How are the mole- 
cules distributed in space when there are forces acting on them, and how are they 
distributed in velocity? 

It turns out that the two questions are completely independent, and that the 
distribution of velocities is always the same. We already received a hint of the latter 
fact when we found that the average kinetic energy is the same, \kT per degree of 
freedom, no matter what forces are acting on the molecules. The distribution of 
the velocities of the molecules is independent of the forces, because the collision 
rates do not depend upon the forces. 

Let us begin with an example: the distribution of the molecules in an atmosphere 
like our own, but without the winds and other kinds of disturbance. Suppose that 
we have a column of gas extending to a great height, and at thermal equilibrium — 
unlike our atmosphere, which as we know gets colder as we go up. We could 
remark that if the temperature differed at different heights, we could demonstrate 
lack of equilibrium by connecting a rod to some balls at the bottom (Fig. 40-1), 
where they would pick up \kT from the molecules there and would shake, via the 
rod, the balls at the top and those would shake the molecules at the top. So, 
ultimately, of course, the temperature becomes the same at all heights in a gravita- 
tional field. 

If the temperature is the same at all heights, the problem is to discover by 
what law the atmosphere becomes tenuous as we go up. If N is the total number 
of molecules in a volume V of gas at pressure P, then we know PV = NkT , or 
p = nkT , where n = N/V is the number of molecules per unit volume. In other 
words, if we know the number of molecules per unit volume, we know the pressure, 
and vice versa: they are proportional to each other, since the temperature is con- 
stant in this problem. But the pressure is not constant, it must increase as the 
altitude is reduced, because it has to hold, so to speak, the weight of all the gas 
above it. That is the clue by which we may determine how the pressure changes 
with height. If we take a unit area at height h , then the vertical force from below, 
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Fig. 40-1. The pressure at height h 
must exceed that at h + dh by the 
weight of the intervening gas. 



on this unit area, is the pressure P. The vertical force per unit area pushing down 
at a height h + dh would be the same, in the absence of gravity, but here it is not, 
because the force from below must exceed the force from above by the weight of 
gas in the section between h and h + dh. Now mg is the force of gravity on each 
molecule, where g is the acceleration due to gravity, and n dh is the total number of 
molecules in the unit section. So this gives us the differential equation Ph+dh — 
p h = dP = —mgndh. Since P = nkT, and T is constant, we can eliminate either 
P or n, say P, and get 

dn _ _ mg 

dh ~ kT n 



Fig. 40-2. The normalized density 
as a function of height in the earth's 
gravitational field for oxygen and for 
hydrogen, at constant temperature. 


for the differential equation, which tells us how the density goes down as we go up 
in energy. 

We thus have an equation for the particle density n, which varies with height, 
but which has a derivative which is proportional to itself. Now a function which 
has a derivative proportional to itself is an exponential, and the solution of this 
differential equation is 

n = no e- mghlkT . (40.1) 

Here the constant of integration, n 0 , is obviously the density at h = 0 (which can 
be chosen anywhere), and the density goes down exponentially with height. 

Note that if we have different kinds of molecules with different masses, they 
go down with different exponentials. The ones which were heavier would decrease 
with altitude faster than the light ones. Therefore we would expect that because 
oxygen is heavier than nitrogen, as we go higher and higher in an atmosphere with 
nitrogen and oxygen the proportion of nitrogen would increase. This does not 
really happen in our own atmosphere, at least at reasonable heights, because there 
is so much agitation which mixes the gases back together again. It is not an 
isothermal atmosphere. Nevertheless, there is a tendency for lighter materials, 
like hydrogen, to dominate at very great heights in the atmosphere, because the 
lowest masses continue to exist, while the other exponentials have all died out 
(Fig. 40-2). 


40-2 The Boltzmann law 

Here we note the interesting fact that the numerator in the exponent of Eq. 
(40.1) is the potential energy of an atom. Therefore we can also state this particular 
law as : the density at any point is proportional to 

—(the potential energy of each atom/ kT) 

That may be an accident, i.e., may be true only for this particular case of a 
uniform gravitational field. However, we can show that it is a more general prop- 
osition. Suppose that there were some kind of force other than gravity acting 
on the molecules in a gas. For example, the molecules may be charged electrically, 
and may be acted on by an electric field or another charge that attracts them. Or, 
because of the mutual attractions of the atoms for each other, or for the wall, or 
for a solid, or something, there is some force of attraction which varies with 
position and which acts on all the molecules. Now suppose, for simplicity, that 
the molecules are all the same, and that the force acts on each individual one, so 
that the total force on a piece of gas would be simply the number of molecules 
times the force on each one. To avoid unnecessary complication, let us choose a 
coordinate system with the x-axis in the direction of the force, F. 

In the same manner as before, if we take two parallel planes in the gas, sepa- 
rated by a distance dx , then the force on each atom, times the n atoms per cm 3 
(the generalization of the previous nmg), times dx, must be balanced by the pressure 
change: Fn dx — dP = kT dn. Or, to put this law in a form which will be useful 
to us later, 
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For the present, observe that —Fdx is the work we would do in taking a molecule 
from xto x + dx, and if F comes from a potential, i.e., if the work done can be 
represented by a potential energy at all, then this would also be the difference in 
the potential energy (P.E.). The negative differential of potential energy is the 
work done, F dx, and we find that c/(ln n ) = — d(P.TL.)fkT, or, after integrating, 

n = (constant)e“ p,E/fcr . (40.3) 

Therefore what we noticed in a special case turns out to be true in general. (What 
if F does not come from a potential? Then (40.2) has no solution at all. Energy 
can be generated, or lost by the atoms running around in cyclic paths for which 
the work done is not zero, and no equilbrium can be maintained at all. Thermal 
equilibrium cannot exist if the external forces on the atoms are not conservative.) 
Equation (40.3), known as Boltzmann's law , is another of the principles of statistical 
mechanics: that the probability of finding molecules in a given spatial arrangement 
varies exponentially with the negative of the potential energy of that arrangement, 
divided by kT. 

This, then, could tell us the distribution of molecules: Suppose that we had a 
positive ion in a liquid, attracting negative ions around it, how many of them would 
be at different distances? If the potential energy is known as a function of distance, 
then the proportion of them at different distances is given by this law, and so on, 
through many applications. 

40-3 Evaporation of a liquid 

In more advanced statistical mechanics one tries to solve the following im- 
portant problem. Consider an assembly of molecules which attract each other, 
and suppose that the force between any two, say i and y, depends only on their 
separation /vy, and can be represented as the derivative of a potential function 
V(rij). Figure 40-3 shows a form such a function might have. For r > r 0 , the 
energy decreases as the molecules come together, because they attract, and then 
the energy increases very sharply as they come still closer together, because they 
repel strongly, which is characteristic of the way molecules behave, roughly 
speaking. 

Now suppose we have a whole box full of such molecules, and we would like 
to know how they arrange themselves on the average. The answer is e P E 
The total potential energy in this case would be the sum over all the pairs, supposing 
that the forces are all in pairs (there may be three-body forces in more complicated 
things, but in electricity, for example, the potential energy is all in pairs). Then the 
probability for finding molecules in any particular combination of /\/s will be 
proportional to 

ex p[ - E V ( r ii)/ kT \ 

i,3 

Now, if the temperature is very high, so that kT » |F(r 0 )|, the exponent is 
relatively small almost everywhere, and the probability of finding a molecule is 
almost independent of position. Let us take the case of just two molecules: 
the e ~ F E -l kT would be the probability of finding them at various mutual distances 
r. Clearly, where the potential goes most negative, the probability is largest, and 
where the potential goes toward infinity, the probability is almost zero, which 
occurs for very small distances. That means that for such atoms in a gas, there is 
no chance that they are on top of each other, since they repel so strongly. But there 
is a greater chance of finding them per unit volume at the point r 0 than at any other 
point. How much greater, depends on the temperature. If the temperature is 
very large compared with the difference in energy between r = r 0 and r = oo, 
the exponential is always nearly unity. In this case, where the mean kinetic energy 
(about kT) greatly exceeds the potential energy, the forces do not make much 
difference. But as the temperature falls, the probability of finding the molecules 
at the preferred distance r 0 gradually increases relative to the probability of finding 
them elsewhere and, in fact, if kT is much less than |F(r 0 )|, we have a relatively 
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Fig. 40-3. A potential-energy func- 
tion for two molecules, which depends 
only on their separation. 



large positive exponent in that neighborhood. In other words, in a given volume they 
are much more likely to be at the distance of minimum energy than far apart. As 
the temperature falls, the atoms fall together, clump in lumps, and reduce to 
liquids, and solids, and molecules, and as you heat them up they evaporate. 

The requirements for the determination of exactly how things evaporate, 
exactly how things should happen in a given circumstance, involve the following. 
First, to discover the correct molecular-force law V(r), which must come from 
something else, quantum mechanics, say, or experiment. But, given the law of 
force between the molecules, to discover what a billion molecules are going to do 
merely consists of studying the function e~ zv ^' /kT . Surprisingly enough, since it is 
such a simple function and such an easy idea, given the potential, the labor is 
enormously complicated ; the difficulty is the tremendous number of variables. 

In spite of such difficulties, the subject is quite exciting and interesting. It is 
often called an example of a “many-body problem,” and it really has been a very 
interesting thing. In that single formula must be contained all the details, for 
example, about the solidification of gas, or the forms of the crystals that the solid 
can take, and people have been trying to squeeze it out, but the mathematical 
difficulties are very great, not in writing the law, but in dealing with so enormous 
a number of variables. 

That then, is the distribution of particles in space. That is the end of classical 
statistical mechanics, practically speaking, because if we know the forces, we can, 
in principle, find the distribution in space, and the distribution of velocities is 
something that we can work out once and for all, and is not something that is 
different for the different cases. The great problems are in getting particular 
information out of our formal solution, and that is the main subject of classical 
statistical mechanics. 



Fig. 40-4. Only those molecules 
moving up at h = 0 with sufficient 
velocity can arrive at height fi. 


40-4 The distribution of molecular speeds 

Now we go on to discuss the distribution of velocities, because sometimes it is 
interesting or useful to know how many of them are moving at different speeds. 
In order to do that, we may make use of the facts which we discovered with regard 
to the gas in the atmosphere. We take it to be a perfect gas, as we have already 
assumed in writing the potential energy, disregarding the energy of mutual attrac- 
tion of the atoms. The only potential energy that we included in our first example 
was gravity. We would, of course, have something more complicated if there 
were forces between the atoms. Thus we assume that there are no forces between 
the atoms and, for a moment, disregard collisions also, returning later to the 
justification of this. Now we saw that there are fewer molecules at the height h 
than there are at the height 0; according to formula (40.1), they decrease expo- 
nentially with height. How can there be fewer at greater heights? After all, do 
not all the molecules which are moving up at height 0 arrive at hi No!, because 
some of those which are moving up at 0 are going too slowly, and cannot climb 
the potential hill to h. With that clue, we can calculate how many must be moving 
at various speeds, because from (40.1) we know how many are moving with less 
than enough speed to climb a given distance h. Those are just the ones that account 
for the fact that the density at h is lower than at 0. 

Now let us put that idea a little more precisely: let us count how many mole- 
cules are passing from below to above the plane h = 0 (by calling it height = 0, 
we do not mean that there is a floor there; it is just a convenient label, and there is 
gas at negative h). These gas molecules are moving around in every direction, but 
some of them are moving through the plane, and at any moment a certain number 
per second of them are passing through the plane from below to above with different 
velocities. Now we note the following: if we call u the velocity which is just needed 
to get up to the height h (kinetic energy mu 2 / 2 = mgh\ then the number of 
molecules per second which are passing upward through the lower plane in a 
vertical direction with velocity component greater than u is exactly the same as 
the number which pass through the upper plane with any upward velocity. Those 
molecules whose vertical velocity does not exceed u cannot get through the upper 
40-4 



plane. So therefore we see that 

Number passing h = 0 with v z > u = number passing h = h with v z > 0. 

But the number which pass through h with any velocity greater than 0 is less than 
the number which pass through the lower height with any velocity greater than 0, 
because the number of atoms is greater; that is all we need. We know already that 
the distribution of velocities is the same, after the argument we made earlier about 
the temperature being constant all the way through the atmosphere. So, since the 
velocity distributions are the same, and it is just that there are more atoms lower 
down, clearly the number n> 0 (/i), passing with positive velocity at height h, 
and the number n >0 ( 0), passing with positive velocity at height 0, are in the same 
ratio as the densities at the two heights, which is e ~ m9hlkT . But n>o(h) — n> u (0 ), 
and therefore we find that 

n> M (0) — mghjkT _ — mu 2 }2kT 

n> o(0) 

since \mu 2 — mgh. Thus, in words, the number of molecules per unit area per 
second passing the height 0 with a z-component of velocity greater than u is 
e -mu 2 i 2 kT times the total number that are passing through the plane with velocity 
greater than zero. 

Now this is not only true at the arbitrarily chosen height 0, but of course it is 
true at any other height, and thus the distributions of velocities are all the same! 
(The final statement does not involve the height h, which appeared only in the 
intermediate argument.) The result is a general proposition that gives us the distri- 
bution of velocities. It tells us that if we drill a little hole in the side of a gas pipe, 
a very tiny hole, so that the collisions are few and far between, i.e., are farther 
apart than the diameter of the hole, then the particles which are coming out will 
have different velocities, but the fraction of particles which come out at a velocity 
greater than u is e ~ mu2 i 2kT t 

Now we return to the question about the neglect of collisions: Why does it 
not make any difference? We could have pursued the same argument, not with a 
finite height h, but with an infinitesimal height h , which is so small that there would 
be no room for collisions between 0 and h. But that was not necessary : the argu- 
ment is evidently based on an analysis of the energies involved, the conservation 
of energy, and in the collisions that occur there is an exchange of energies among 
the molecules. However, we do not really care whether we follow the same mole- 
cule if energy is merely exchanged with another molecule. So it turns out that even 
if the problem is analyzed more carefully (and it is more difficult, naturally, to do a 
rigorous job), it still makes no difference in the result. 

It is interesting that the velocity distribution we have found is just 

^ a ^—kinetic energ yfkT (40.4) 

This way of describing the distribution of velocities, by giving the number of 
molecules that pass a given area with a certain minimum z-component, is not the 
most convenient way of giving the velocity distribution. For instance, inside the 
gas, one more often wants to know how many molecules are moving with a 
z-component of velocity between two given values, and that, of course, is not di- 
rectly given by Eq. (40.4). We would like to state our result in the more con- 
ventional form, even though what we already have written is quite general. Note 
that it is not possible to say that any molecule has exactly some stated velocity; 
none of them has a velocity exactly equal to 1.7962899173 meters per second. 
So in order to make a meaningful statement, we have to ask how many are to be 
found in some range of velocities. We have to say how many have velocities between 
1.796 and 1.797, and so on. On mathematical terms, let f{u) du be the fraction of 
all the molecules which have velocities between u and u + du or, what is the same 
thing (if du is infinitesimal), all that have a velocity u with a range du. Figure 40-5 
shows a possible form for the function /(w), and the shaded part, of width du and 
mean height /(«), represents this fraction f(u) du. That is, the ratio of the shaded 
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f(u) 



Fig. 40-5. A velocity distribution 
function. The shaded area is f(u) du, the 
fraction of particles having velocities 
within a range du about u. 



area to the total area of the curve is the relative proportion of molecules with 
velocity u within du. If we define /(u) so that the fraction having a velocity in this 
range is given directly by the shaded area, then the total area must be 100 percent 
of them, that is, 

J_^ f(u)du = 1. (40.5) 

Now we have only to get this distribution by comparing it with the theorem 
we derived before. First we ask, what is the number of molecules passing through 
an area per second with a velocity greater than u, expressed in terms of /(«)? 
At first we might think it is merely the integral of /“/(«) du, but it is not, because 
we want the number that are passing the area per second. The faster ones pass 
more often, so to speak, than the slower ones, and in order to express how many 
pass, you have to multiply by the velocity. (We discussed that in the previous 
chapter when we talked about the number of collisions.) In a given time t the 
total number which pass through the surface is all of those which have been able 
to arrive at the surface, and the number which arrive come from a distance ut. 
So the number of molecules which arrive is not simply the number which are there, 
but the number that are there per unit volume, multiplied by the distance that 
they sweep through in racing for the area through which they are supposed to go, 
and that distance is proportional to u. Thus we need the integral of u times /(«) du, 
an infinite integral with a lower limit u, and this must be the same as we found 
before, namely e mu l2kT , with a proportionality constant which we will get later: 

J uf(u) du = const • e - m “ 2/2fcr (40.6) 

Now if we differentiate the integral with respect to u, we get the thing that is 
inside the integral, i.e., the integrand (with a minus sign, since u is the lower limit), 
and if we differentiate the other side, we get u times the same exponential (and 
some constants). The w’s cancel and we find 

f(u) du = Ce~ mu2/2kT du. (40.7) 

We retain the du on both sides as a reminder that it is a distribution, and it tells 
what the proportion is for velocity between u and u + du. 

The constant C must be so determined that the integral is unity, according to 
Eq. (40.5). Now we can prove* that 

/ « 2 

e dx — V 7r. 

-00 

Using this fact, it is easy to find that C = V m/lirkT . 

Since velocity and momentum are proportional, we may say that the distribu- 
tion of momenta is also proportional to e~ K,E, /* r per unit momentum range. 

It turns out that this theorem is true in relativity too, if it is in terms of momentum, 
while if it is in velocity it is not, so it is best to learn it in momentum instead of in 
velocity: 

f(p)dp = Ce~ KE/kT dp. (40.8) 

So we find that the probabilities of different conditions of energy, kinetic and 
potential, are both given by e -«»ergy/*r a very easy thing tQ remem b er anc j a rat h er 
beautiful proposition. 


* To get the value of the integral, let 

I = }V dx. 

Then 

which is a double integral over the whole xy-plane. But this can also be written in polar 
coordinates as 

/2 = Jo e ~ r 2 ’ = i r So e-*dt = tt. 
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So far we have, of course, only the distribution of the velocities “vertically.” 
We might want to ask, what is the probability that a molecule is moving in another 
direction? Of course these distributions are connected, and one can obtain the 
complete distribution from the one we have, because the complete distribution 
depends only on the square of the magnitude of the velocity, not upon the z-com- 
ponent. It must be something that is independent of direction, and there is only 
one function involved, the probability of different magnitudes. We have the 
distribution of the z-component, and therefore we can get the distribution of the 
other components from it. The result is that the probability is still proportional to 
e -K.E.jkT> k ut nQW ^ kj ne ti c energy involves three parts, mvf/2, mv 2 y /2 , and 
mv 2 z j 2, summed in the exponent. Or we can write it as a product: 

f(v x , Oy, ^ 2 ) d\)x dv<y do 2 

oc • e ~ mv l l2kT • dVx dVy dVz (40.9) 

You can see that this formula must be right because, first, it is a function only of 
v 2 , as required, and second, the probabilities of various values of v z obtained by 
integrating over all v x and v y is just (40.7). But this one function (40.9) can do 
both those things! 

40-5 The specific heats of gases 

Now we shall look at some ways to test the theory, and to see how 
successful is the classical theory of gases. We saw earlier that if U is the internal 
energy of N molecules, then PV = NkT = (7 - 1 )U holds, sometimes, for 
some gases, maybe. If it is a monatomic gas, we know this is also equal to § 
of the kinetic energy of the center-of-mass motion of the atoms. If it is a monatomic 
gas, then the kinetic energy is equal to the internal energy, and therefore 7 — 

1 = ^. But suppose it is, say, a more complicated molecule, that can spin and 
vibrate, and let us suppose (it turns out to be true according to classical mechanics) 
that the energies of the internal motions are also proportional to kT. Then at a 
given temperature, in addition to kinetic energy kT, it has internal vibrational or 
rotational energy. So the total U includes not just the internal kinetic energy, but 
also the rotational energy, and we get a different value of 7. Technically, the 
best way to measure 7 is by measuring the specific heat, which is the change in 
energy with temperature. We will return to that approach later. For our present 
purposes, we may suppose 7 is found experimentally from the PV y curve for 
adiabatic compression. 

Let us make a calculation of 7 for some cases. First, for a monatomic gas 
U is the total energy, the same as the kinetic energy, and we know already that 
7 should be § . For a diatomic gas, we may take, as an example, oxygen, hydrogen 
iodide, hydrogen, etc., and suppose that the diatomic gas can be represented as 
two atoms held together by some kind of force like the one of Fig. 40-3. We may 
also suppose, and it turns out to be quite true, that at the temperatures that are 
of interest for the diatomic gas, the pairs of atoms tend strongly to be separated 
by r 0 , the distance of potential minimum. If this were not true, if the probability 
were not strongly varying enough to make the great majority sit near the bottom, 
we would have to remember that oxygen gas is a mixture of 0 2 and single oxygen 
atoms in a nontrivial ratio. We know that there are, in fact, very few single oxygen 
atoms, which means that the potential energy minimum is very much greater in 
magnitude than kT , as we have seen. Since they are clustered strongly around 
r 0 , the only part of the curve that is needed is the part near the minimum, which 
may be approximated by a parabola. A parabolic potential implies a harmonic 
oscillatpr, and in fact, to an excellent approximation, the oxygen molecule can be 
represented as two atoms connected by a spring. 

Now what is the total energy of this molecule at temperature T1 We know 
that for each of the two atoms, each of the kinetic energies should be f kT, so the 
kinetic energy of both of them is %kT + f/cT. We can also put this in a different 
way : the same f plus f can also be looked at as kinetic energy of the center of mass 
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Table 40-1 

Values of the specific heat ratio, 7, 
for various gases 


Gas 

T (°C) 

7 

He 

—180 

1.660 

Kr 

19 

1.68 

Ar 

15 

1.668 

h 2 

100 

1.404 

o 2 

100 

1.399 

HI 

100 

1.40 

Br 2 

300 

1.32 

I 2 

185 

1.30 

nh 3 

15 

1.310 

c 2 h 6 

15 

1.22 



Fig. 40-6. Experimental values of 
7 as a function of temperature for hydro- 
gen and oxygen. Classical theory 
predicts 7 — 1.286, independent of 
temperature. 


(f), kinetic energy of rotation (f), and kinetic energy of vibration (£). We know 
that the kinetic energy of vibration is since there is just one dimension involved 
and each degree of freedom has %kT. Regarding the rotation, it can turn about 
either of two axes, so there are two independent motions. We assume that the 
atoms are some kind of points, and cannot spin about the line joining them; 
this is something to bear in mind, because if we get a disagreement, maybe that is 
where the trouble is. But we have one more thing, which is the potential energy of 
vibration; how much is that? In a harmonic oscillator the average kinetic energy 
and average potential energy are equal, and therefore the potential energy of 
vibration is \kT , also. The grand total of energy is U = %kT, or kT is f U per 
atom. That means, then, that 7 is f instead of f, i.e., 7 = 1.286. 

We may compare these numbers with the relevant measured values shown in 
Table 40-1. Looking first at helium, which is a monatomic gas, we find very 
nearly f , and the error is probably experimental, although at such a low temperature 
there may be some forces between the atoms. Krypton and argon, both monatomic, 
agree also within the accuracy of the experiment. 

We turn to the diatomic gases and find hydrogen with 1.404, which does not 
agree with the theory, 1.286. Oxygen, 1.399, is very similar, but again not in 
agreement. Hydrogen iodide again is similar at 1.40. It begins to look as though 
the right answer is 1.40, but it is not, because if we look further at bromine we see 
1.32, and at iodine we see 1.30. Since 1.30 is reasonably close to 1.286, iodine 
may be said to agree rather well, but oxygen is far off. So here we have a dilemma. 
We have it right for one molecule, we do not have it right for another molecule, 
and we may need to be pretty ingenious in order to explain both. 

Let us look further at a still more complicated molecule with large numbers 
of parts, for example, C 2 H 6 , which is ethane. It has eight different atoms, and they 
are all vibrating and rotating in various combinations, so the total amount of 
internal energy must be an enormous number of kTs, at least 12 kT for kinetic 
energy alone, and 7 — 1 must be very close to zero, or 7 almost exactly 1. In 
fact, it is lower, but 1.22 is not so much lower, and is higher than the 1^ calculated 
from the kinetic energy alone, and it is just not understandable! 

Furthermore, the whole mystery is deep, because the diatomic molecule cannot 
be made rigid by a limit. Even if we made the couplings stiff er indefinitely, although 
it might not vibrate much, it would nevertheless keep vibrating. The vibrational 
energy inside is still kT , since it does not depend on the strength of the coupling. 
But if we could imagine absolute rigidity, stopping all vibration to eliminate a 
variable, then we would get U = f kT and 7 = 1.40 for the diatomic case. This 
looks good for H 2 or 0 2 . On the other hand, we would still have problems, 
because 7 for either hydrogen or oxygen varies with temperature! From the 
measured values shown in Fig. 40-6, we see that for H 2 , 7 varies from about 1.6 
at — 185°C to 1.3 at 2000°C. The variation is more substantial in the case of 
hydrogen than for oxygen, but nevertheless, even in oxygen, 7 tends definitely to 
go up as we go down in temperature. 

40-6 The failure of classical physics 

So, all in all, we might say that we have some difficulty. We might try some 
force law other than a spring, but it turns out that anything else will only make 
7 higher. If we include more forms of energy, 7 approaches unity more closely, 
contradicting the facts. All the classical theoretical things that one can think of 
will only make it worse. The fact is that there are electrons in each atom, and we 
know from their spectra that there are internal motions ; each of the electrons should 
have at least \kT of kinetic energy, and something for the potential energy, so 
when these are added in, 7 gets still smaller. It is ridiculous. It is wrong. 

The first great paper on the dynamical theory of gases was by Maxwell in 
1859. On the basis of ideas we have been discussing, he was able accurately to 
explain a great many known relations, such as Boyle’s law, the diffusion theory, 
the viscosity of gases, and things we shall talk about in the next chapter. He listed 
all these great successes in a final summary, and at the end he said, “Finally, by 
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establishing a necessary relation between the motions of translation and rotation 
(he is talking about the \kT theorem) of all particles not spherical, we proved that 
a system of such particles could not possibly satisfy the known relation between 
the two specific heats.” He is referring to 7 (which we shall see later is related to 
two ways of measuring specific heat), and he says we know we cannot get the right 
answer. 

Ten years later, in a lecture, he said, “I have now put before you what I 
consider to be the greatest difficulty yet encountered by the molecular theory.” 
These words represent the first discovery that the laws of classical physics were 
wrong. This was the first indication that there was something fundamentally 
impossible, because a rigorously proved theorem did not agree with experiment. 
About 1890, Jeans was to talk about this puzzle again. One often hears it said that 
physicists at the latter part of the nineteenth century thought they knew all the 
significant physical laws and that all they had to do was to calculate more decimal 
places. Someone may have said that once, and others copied it. But a thorough 
reading of the literature of the time shows they were all worrying about something. 
Jeans said about this puzzle that it is a very mysterious phenomenon, and it seems 
as though as the temperature falls, certain kinds of motions “freeze out.” 

If we could assume that the vibrational motion, say, did not exist at low tem- 
perature and did exist at high temperature, then we could imagine that a gas might 
exist at a temperature sufficiently low that vibrational motion does not occur, 
so 7 = 1.40, or a higher temperature at which it begins to come in, so 7 falls. 
The same might be argued for the rotation. If we can eliminate the rotation, say 
it “freezes out” at sufficiently low temperature, then we can understand the fact 
that the 7 of hydrogen approaches 1.66 as we go down in temperature. How can 
we understand such a phenomenon? Of course that these motions “freeze out” 
cannot be understood by classical mechanics. It was only understood when quan- 
tum mechanics was discovered. 

Without proof, we may state the results for statistical mechanics of the 
quantum-mechanical theory. We recall that according to quantum mechanics, a 
system which is bound by a potential, for the vibrations, for example, will have a 
discrete set of energy levels, i.e., states of different energy. Now the question is: 
how is statistical mechanics to be modified according to quantum-mechanical 
theory? It turns out, interestingly enough, that although most problems are more 
difficult in quantum mechanics than in classical mechanics, problems in statistical 
mechanics are much easier in quantum theory! The simple result we have in classi- 
cal mechanics, that n = n 0 e ~ eneTgylkT , becomes the following very important 
theorem: If the energies of the set of molecular states are called, say, Ei, E 2 , 
. . . , E if . . . , then in thermal equilibrium the probability of finding a molecule 
in the particular state of having energy E { is proportional to e ~ E ^ !kT . That gives 
the probability of being in various states. In other words, the relative chance, the 
probability, of being in state E x relative to the chance of being in state E 0 , is 


Pi 

>o 


which, of course, is the same as 


e -Ei IkT 
“ e -E Q ikT ’ 

( 40 . 10 ) 

—(Ei—Eo)lkT 
riQC , 

( 40 . 11 ) 


since P 1 - n x /N and P 0 = nJN. So it is less likely to be in a higher energy state 
than in a lower one. The ratio of the number of atoms in the upper state to the 
number in the lower state is e raised to the power (minus the energy difference, 
over kT ) — a very simple proposition. 

Now it turns out that for a harmonic oscillator the energy levels are evenly 
spaced. Calling the lowest energy E$ — 0 (it actually is not zero, it is a little differ- 
ent, but it does not matter if we shift all energies by a constant), the first one is 
then Ei = ftco, and the second one is 2fto>, and the third one is 3fko, and so on. 

Now let us see what happens. We suppose we are studying the vibrations of a 
diatomic molecule, which we approximate as a harmonic oscillator. Let us ask 
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what is the relative chance of finding a molecule in state Ex instead of in state E 0 . 
The answer is that the chance of finding it in state Ex, relative to that of finding it in 
state E 0 , goes down as e ~ n “ lkT . Now suppose that kT is much less than fua, and 
we have a low-temperature circumstance. Then the probability of its being in state 
Ex is extremely small. Practically all the atoms are in state E 0 . If we change the 
temperature but still keep it very small, then the chance of its being in state Ex = 
hoi remains infinitesimal — the energy of the oscillator remains nearly zero; it does 
not change with temperature so long as the temperature is much less than hoi. 
All oscillators are in the bottom state, and their motion is effectively “frozen”; 
there is no contribution of it to the specific heat. We can judge, then, from Table 
40-1, that at 100°C, which is 373 degrees absolute, kT is much less than the vibra- 
tional energy in the oxygen or hydrogen molecules, but not so in the iodine mole- 
cule. The reason for the difference is that an iodine atom is very heavy, compared 
with hydrogen, and although the forces may be comparable in iodine and hydro- 
gen, the iodine molecule is so heavy that the natural frequency of vibration is 
very low compared with the natural frequency of hydrogen. With hoi higher than 
kT at room temperature for hydrogen, but lower for iodine, only the latter, iodine, 
exhibits the classical vibrational energy. As we increase the temperature of a gas, 
starting from a very low value of T, with the molecules almost all in their lowest 
state, they gradually begin to have an appreciable probability to be in the second 
state, and then in the next state, and so on. When the probability is appreciable 
for many states, the behavior of the gas approaches that given by classical physics, 
because the quantized states become nearly indistinguishable from a continuum 
of energies, and the system can have almost any energy. Thus, as the temperature 
rises, we should again get the results of classical physics, as indeed seems to be the 
case in Fig. 40-6. It is possible to show in the same way that the rotational states 
of atoms are also quantized, but the states are so much closer together that in 
ordinary circumstances kT is bigger than the spacing. Then many levels are excited, 
and the rotational kinetic energy in the system participates in the classical way. 
The one example where this is not quite true at room temperature is for hydrogen. 

This is the first time that we have really deduced, by comparison with experi- 
ment, that there was something wrong with classical physics, and we have looked 
for a resolution of the difficulty in quantum mechanics in much the same way as 
it was done originally. It took 30 or 40 years before the next difficulty was dis- 
covered, and that had to do again with statistical mechanics, but this time the 
mechanics of a photon gas. That problem was solved by Planck, in the early years 
of this century. 
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41 


The Brownian Movement 


41-1 Equipartition of energy 

The Brownian movement was discovered in 1827 by Robert Brown, a botanist. 
While he was studying microscopic life, he noticed little particles of plant pollens 
jiggling around in the liquid he was looking at in the microscope, and he was wise 
enough to realize that these were not living, but were just little pieces of dirt moving 
around in the water. In fact he helped to demonstrate that this had nothing to do 
with life by getting from the ground an old piece of quartz in which there was 
some water trapped. It must have been trapped for millions and millions of years, 
but inside he could see the same motion. What one sees is that very tiny particles 
are jiggling all the time. 

This was later proved to be one of the effects of molecular motion , and we can 
understand it qualitatively by thinking of a great push ball on a playing field, 
seen from a great distance, with a lot of people underneath, all pushing the ball 
in various directions. We cannot see the people because we imagine that we are 
too far away, but we can see the ball, and we notice that it moves around rather 
irregularly. We also know, from the theorems that we have discussed in previous 
chapters, that the mean kinetic energy of a small particle suspended in a liquid 
or a gas will be %kT even though it is very heavy compared with a molecule. If it 
is very heavy, that means that the speeds are relatively slow, but it turns out, 
actually, that the speed is not really so slow. In fact, we cannot see the speed of 
such a particle very easily because although the mean kinetic energy is f kT, which 
represents a speed of a millimeter or so per second for an object a micron or two 
in diameter, this is very hard to see even in a microscope, because the particle 
continuously reverses its direction and does not get anywhere. How far it does get 
we will discuss at the end of the present chapter. This problem was first solved by 
Einstein at the beginning of the present century. 

Incidentally, when we say that the mean kinetic energy of this particle is 
f&r, we claim to have derived this result from the kinetic theory, that is, from 
Newton’s laws. We shall find that we can derive all kinds of things — marvelous 
things — from the kinetic theory, and it is most interesting that we can apparently 
get so much from so little. Of course we do not mean that Newton’s laws are “little” 
— they are enough to do it, really — what we mean is that we did not do very much. 
How do we get so much out? The answer is that we have been perpetually making 
a certain important assumption, which is that if a given system is in thermal 
equilibrium at some temperature, it will also be in thermal equilibrium with 
anything else at the same temperature. For instance, if we wanted to see how a 
particle would move if it was really colliding with water, we could imagine that 
there was a gas present, composed of another kind of particle, little fine pellets 
that (we suppose) do not interact with water, but only hit the particle with “hard” 
collisions. Suppose the particle has a prong sticking out of it; all our pellets have 
to do is hit the prong. We know all about this imaginary gas of pellets at tempera- 
ture T — it is an ideal gas. Water is complicated, but an ideal gas is simple. Now, 
our particle has to be in equilibrium with the gas of pellets. Therefore, the mean 
motion of the particle must be what we get for gaseous collisions, because if it 
were not moving at the right speed relative to the water but, say, was moving 
faster, that would mean that the pellets would pick up energy from it and get 
hotter than the water. But we had started them at the same temperature, and we 
assume that if a thing is once in equilibrium, it stays in equilibrium-parts of it 
do not get hotter and other parts colder, spontaneously. 


41-1 Equipartition of energy 
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41-3 Equipartition and the quantum 
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<°) (b) 

Fig. 41-1. (a) A sensitive light-beam 

galvanometer. Light from a source L is 
reflected from a small mirror onto a 
scale, (b) A schematic record of the 
reading of the scale as a function of 
the time. 



(a) ( b) 


Fig. 41-2. A high-Q resonant circuit. 

(a) Actual circuit, at temperature T. 

(b) Artificial circuit, with an ideal (noise- 
less) resistance and a “noise generator*’ 
G. 


This proposition is true and can be proved from the laws of mechanics, but 
the proof is very complicated and can be established only by using advanced 
mechanics. It is much easier to prove in quantum mechanics than it is in classical 
mechanics. It was proved first by Boltzmann, but for now we simply take it to be 
true, and then we can argue that our particle has to have f A T of energy if it is hit 
with artificial pellets, so it also must have § kT when it is being hit with water at 
the same temperature and we take away the pellets; so it is fAT. It is a strange 
line of argument, but perfectly valid. 

In addition to the motion of colloidal particles for which the Brownian move- 
ment was first discovered, there are a number of other phenomena, both in the 
laboratory and in other situations, where one can see Brownian movement. If we 
are trying to build the most delicate possible equipment, say a very small mirror 
on a thin quartz fiber for a very sensitive ballistic galvanometer (Fig. 41-1), the 
mirror does not stay put, but jiggles all the time — all the time — so that when we 
shine a light on it and look at the position of the spot, we do not have a perfect 
instrument because the mirror is always jiggling. Why ? Because the average kinetic 
energy of rotation of this mirror has to be, on the average, %kT. 

What is the mean-square angle over which the mirror will wobble? Suppose 
we find the natural vibration period of the mirror by tapping on one side and 
seeing how long it takes to oscillate back and forth, and we also know the moment 
of inertia, I. We know the formula for the kinetic energy of rotation— it is given 
by Eq. (19.8): T = J/co 2 . That is the kinetic energy, and the potential energy that 
goes with it will be proportional to the square of the angle— it is V = ±a6 2 . 
But, if we know the period t 0 and calculate from that the natural frequency o> 0 = 
27 t// 0 , then the potential energy is V = ^Ico^d 2 . Now we know that the average 
kinetic energy is hut since it is a harmonic oscillator the average potential 
energy is also \kT. Thus 

Wo(e 2 ) = 

or 

(6 2 ) = kT/Ia>l (41.1) 

In this way we can calculate the oscillations of a galvanometer mirror, and thereby 
find what the limitations of our instrument will be. If we want to have smaller 
oscillations, we have to cool the mirror. An interesting question is, where to 
cool it. This depends upon where it is getting its “kicks” from. If it is through the 
fiber, we cool it at the top — if the mirror is surrounded by a gas and is getting hit 
mostly by collisions in the gas, it is better to cool the gas. As a matter of fact, if 
we know where the damping of the oscillations comes from, it turns out that that is 
always the source of the fluctuations also, a point which we will come back to. 

The same thing works, amazingly enough, in electrical circuits . Suppose that 
we are building a very sensitive, accurate amplifier for a definite frequency and have 
a resonant circuit (Fig. 41-2) in the input so as to make it very sensitive to this 
certain frequency, like a radio receiver, but a really good one. Suppose we wish 
to go down to the very lowest limit of things, so we take the voltage, say off the 
inductance, and send it into the rest of the amplifier. Of course, in any circuit 
like this, there is a certain amount of loss. It is not a perfect resonant circuit, but 
it is a very good one and there is a little resistance, say (we put the resistor in so we 
can see it, but it is supposed to be small). Now we would like to find out: How 
much does the voltage across the inductance fluctuate? Answer: We know that 
2 LI is the kinetic energy” — the energy associated with a coil in a resonant circuit 
(Chapter 25). Therefore the mean value of \LI 2 is equal to \kT — that tells us what 
the rms current is and we can find out what the rms voltage is from the rms current. 
For if we want the voltage across the inductance the formula is V L = /«£/, and 
the mean absolute square voltage on the inductance is (Vt) = L 2 o>l(I 2 ), and 
putting in iL(I 2 ) = \kT, we obtain 

(Vl) = lM%kT. (41.2) 

So now we can design circuits and tell when we are going to get what is called 
Johnson noise , the noise associated with thermal fluctuations! 
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Where do the fluctuations come from this time? They come again from the 
resistor — they come from the fact that the electrons in the resistor are jiggling 
around because they are in thermal equilibrium with the matter in the resistor, and 
they make fluctuations in the density of electrons. They thus make tiny electric 
fields which drive the resonant circuit. 

Electrical engineers represent the answer in another way. Physically, the 
resistor is effectively the source of noise. However, we may replace the real circuit 
having an honest, true physical resistor which is making noise, by an artificial 
circuit which contains a little generator that is going to represent the noise, and 
now the resistor is otherwise ideal — no noise comes from it. All the noise is in 
the artificial generator. And so if we knew the characteristics of the noise generated 
by a resistor, if we had the formula for that, then we could calculate what the circuit 
is going to do in response to that noise. So, we need a formula for the noise 
fluctuations. Now the noise that is generated by the resistor is at all frequencies, 
since the resistor by itself is not resonant. Of course the resonant circuit only 
“listens” to the part that is near the right frequency, but the resistor has many 
different frequencies in it. We may describe how strong the generator is, as follows: 
The mean power that the resistor would absorb if it were connected directly across 
the noise generator would be (. E 2 )/R , if E were the voltage from the generator. 
But we would like to know in more detail how much power there is at every fre- 
quency. There is very little power in any one frequency; it is a distribution. Let 
P( co) do) be the power that the generator would deliver in the frequency range do) 
into the very same resistor. Then we can prove (we shall prove it for another case, 
but the mathematics is exactly the same) that the power comes out 

P(o))dco = (: 2/ir)kTdu >, (41.3) 

and is independent of the resistance when put this way. 


41-2 Thermal equilibrium of radiation 

Now we go on to consider a still more advanced and interesting proposition 
that is as follows. Suppose we have a charged oscillator like those we were talking 
about when we were discussing light, let us say an electron oscillating up and down 
in an atom. If it oscillates up and down, it radiates light. Now suppose that this 
oscillator is in a very thin gas of other, atoms, and that from time to time the atoms 
collide with it. Then in equilibrium, after a long time, this oscillator will pick up 
energy such that its kinetic energy of oscillation is \kT , and since it is a harmonic 
oscillator, its entire energy of motion will become kT. That is, of course, a wrong 
description so far, because the oscillator carries electric charge , and if it has an 
energy kT it is shaking up and down and radiating light. Therefore it is impossible 
to have equilibrium of real matter alone without the charges in it emitting light, 
and as light is emitted, energy flows away, the oscillator loses its kT as time goes 
on, and thus the whole gas which is colliding with the oscillator gradually cools 
off. And that is, of course, the way a hot stove cools on a cold night by radiating 
the light into the sky, because the atoms are jiggling their charge and they contin- 
ually radiate, and slowly, because of this radiation, the jiggling motion slows down. 

On the other hand, if we enclose the whole thing in a box so that the light does 
not go away to infinity, then we can eventually get thermal equilibrium. We may 
either put the gas in a box where we can say that there are other radiators in the 
box walls sending light back or, to take a nicer example, we may suppose the box 
has mirror walls. It is easier to think about that case. Thus we assume that all the 
radiation that goes out from the oscillator keeps running around in the box. 
Then, of course, it is true that the oscillator starts to radiate, but pretty soon it 
can maintain its kT of kinetic energy in spite of the fact that it is radiating, because 
it is being illuminated, we may say, by its own light reflected from the walls of the 
box. That is, after a while there is a great deal of light rushing around in the box, 
and although the oscillator is radiating some, the light comes back and returns 
some of the energy that was radiated. 
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We shall now determine how much light there must be in such a box at tempera- 
ture T in order that the shining of the light on this oscillator will generate just 
enough energy to account for the light it radiated. 

Let the gas atoms be very few and far between, so that we have an ideal 
oscillator with no resistance except radiation resistance. Then we consider that at 
thermal equilibrium the oscillator is doing two things at the same time. First, it 
has a mean energy kT , and we calculate how much radiation it emits. Second, this 
radiation should be exactly the amount that would result because of the fact that 
the light shining on the oscillator is scattered. Since there is nowhere else the energy 
can go, this effective radiation is really just scattered light from the light that is 
in there. 

Thus we first calculate the energy that is radiated by the oscillator per second, 
if the oscillator has a certain energy. (We borrow from Chapter 32 on radiation 
resistance a number of equations without going back over their derivation.) The 
energy radiated per radian divided by the energy of the oscillator is called l/Q 
(Eq. 32.8): l/Q = (dW/dt)/a) 0 W. Using the quantity 7, the damping constant, 
this can also be written as l/Q = 7/co 0 , where co 0 is the natural frequency of the 
oscillator — if gamma is very small, Q is very large. The energy radiated per second 
is then 


dW = wo W 
dt Q 


o> 0 Wy 

too 


= 7 W. 


(41.4) 


The energy radiated per second is thus simply gamma times the energy of the 
oscillator. Now the oscillator should have an average energy kT , so we see that 
gamma kT is the average amount of energy radiated per second : 


(dW/dt) = 7 kT. 


(41.5) 


Now we only have to know what gamma is. Gamma is easily found from Eq. 
(32.12). It is 


_ co 0 __ 2 rotoo 

2 3c 


(41.6) 


where r 0 = e 2 /mc 2 is the classical electron radius, and we have set X = 2wc/o) 0 . 

Our final result for the average rate of radiation of light near the frequency 
to o is therefore 


dW _ 2 r 0 a>tkT 
dt 3 c 


(41.7) 


Next we ask how much light must be shining on the oscillator. It must be 
enough that the energy absorbed from the light (and thereupon scattered) is just 
exactly this much. In other words, the emitted light is accounted for as scattered 
light from the light that is shining on the oscillator in the cavity. So we must now 
calculate how much light is scattered from the oscillator if there is a certain amount 
— unknown — of radiation incident on it. Let /(co) c/co be the amount of light energy 
there is at the frequency co, within a certain range c/co (because there is no light at 
exactly a certain frequency ; it is spread all over the spectrum). So /(co) is a certain 
spectral distribution which we are now going to find — it is the color of a furnace at 
temperature T that we see when we open the door and look in the hole. Now 
how much light is absorbed? We worked out the amount of radiation absorbed 
from a given incident light beam, and we calculated it in terms of a cross section . 
It is just as though we said that all of the light that falls on a certain cross section 
is absorbed. So the total amount that is re-radiated (scattered) is the incident 
intensity /(co) c/co multiplied by the cross section cr. 

The formula for the cross section which we derived (Eq. 31.19) did not have 
the damping included. It is not hard to go through the derivation again and put in 
the resistance term which we neglected. If we do that, and calculate the cross 
section the same way, we get 


= 87TTo 

3 


co 


4 


(41.8) 
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(co 2 - CO 2 ) 2 + 7 2 co 2 



Now, as a function of frequency, <x s is of significant size only for co very near 
to the natural frequency co 0 . (Remember that the Q for a radiating oscillator is 
about 10 8 .) The oscillator scatters very strongly when co is equal to co 0 , and very 
weakly for other values of co. Therefore we can replace co by co 0 and co 2 — coq 
by 2co 0 (co — co 0 ), and we get 


<r s 


lirrWo 

3[(co - co 0 ) 2 + Y 2 /4] 


(41.9) 


Now the whole curve is localized near co = co 0 . (We do not really have to make 
any approximations, but it is much easier to do the integrals if we simplify the 
equation a bit.) Now we multiply the intensity in a given frequency range by the 
cross section of scattering, to get the amount of energy scattered in the range c/co. 
The total energy scattered is then the integral of this for all co. Thus 


dW 9 

dt 



/(co)cr s ( co) c/co 

lTrr%o)%I{(S) da) 
3[(co - co 0 ) 2 + 774] 


(41.10) 


Now we set dW s /dt = 37 kT. Why threel Because when we made our 
analysis of the cross section in Chapter 32, we assumed that the polarization was 
such that the light could drive the oscillator. If we had used an oscillator which 
could move only in one direction, and the light, say, was polarized in the wrong 
way, it would not give any scattering. So we must either average the cross section 
of an oscillator which can go only in one direction, over all directions of incidence 
and polarization of the light or, more easily, we can imagine an oscillator which 
will follow the field no matter which way the field is pointing. Such an oscillator, 
which can oscillate equally in three directions, would have 3 kT average energy 
because there are 3 degrees of freedom in that oscillator. So we should use 37 kT 
because of the 3 degrees of freedom. 

Now we have to do the integral. Let us suppose that the unknown spectral 
distribution /(co) of the light is a smooth curve and does not vary very much across 
the very narrow frequency region where <r s is peaked (Fig. 41-3). Then the only 
significant contribution comes when co is very close to co 0 , within an amount 
gamma, which is very small. So therefore, although /(co) may be an unknown and 
complicated function, the only place where it is important is near co = co 0 , and 
there we may replace the smooth curve by a flat one — a “constant” — at the same 
height. In other words, we simply take /(co) outside the integral sign and call it 
/(co 0 ). We may also take the rest of the constants out in front of the integral, and 
what we have left is 

£ (a _ J; + y2/4 - mt. (4U1) 


Now, the integral should go from 0 to 00 , but 0 is so far from co 0 that the curve is 
all finished by that time, so we go instead to minus 00 — it makes no difference and 
it is much easier to do the integral. The integral is an inverse tangent function of 
the form fdx/(x 2 + a 2 ). If we look it up in a book we see that it is equal to 
ir/a. So what it comes to for our case is 27r/7. Therefore we get, with some re- 
arranging, 


/(«o) 


97 2 kT 

47T 2 /*oCOo 


(41.12) 


Then we substitute the formula (41.6) for gamma (do not worry about writing 
co 0 ; since it is true of any co 0 , we may just call it co) and the formula for /(co) then 
comes out 


/(co) = 


c A t 

7T 2 C 2 


(41.13) 


And that gives us the distribution of light in a hot furnace. It is called the black - 
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Fig. 41-3. The factors in the integrand 
(41.1 0). The peak is the resonance curve 
l/(co — coo ) 2 + 7 2 /4. To a good ap- 
proximation the factor /(co) can be 
replaced by /(coo). 




Fig. 41—4. The blackbody intensity 
distribution at two temperatures, accord- 
ing to classical physics (solid curves). The 
dashed curves show the actual distribution. 


body radiation . Black, because the hole in the furnace that we look at is black when 
the temperature is zero. 

Inside a closed box at temperature T, (41.13) is the distribution of energy of 
the radiation, according to classical theory. First, let us notice a remarkable feature 
of that expression. The charge of the oscillator, the mass of the oscillator, all 
properties specific to the oscillator, cancel out , because once we have reached equilib- 
rium with one oscillator, we must be at equilibrium with any other oscillator 
of a different mass, or we will be in trouble. So this is an important kind of check 
on the proposition that equilibrium does not depend on what we are in equilibrium 
with, but only on the temperature . Now let us draw a picture of the /(w) curve 
(Fig. 41-4). It tells us how much light we have at different frequencies. 

The amount of intensity that there is in our box, per unit frequency range, 
goes, as we see, as the square of the frequency, which means that if we have a box 
at any temperature at all, and if we look at the x-rays that are coming out, there 
will be a lot of them! 

Of course we know this is false. When we open the furnace and take a look 
at it, we do not burn our eyes out from x-rays at all. It is completely false. Further- 
more, the total energy in the box, the total of all this intensity summed over all 
frequencies, would be the area under this infinite curve. Therefore, something is 
fundamentally, powerfully, and absolutely wrong. 

Thus was the classical theory absolutely incapable of correctly describing the 
distribution of light from a blackbody, just as it was incapable of correctly de- 
scribing the specific heats of gases. Physicists went back and forth over this deriva- 
tion from many different points of view, and there is no escape. This is the pre- 
diction of classical physics. Equation (41.13) is called Rayleigh's law , and it is the 
prediction of classical physics, and is obviously absurd. 

41-3 Equipartition and the quantum oscillator 

The difficulty above was another part of the continual problem of classical 
physics, which started with the difficulty of the specific heat of gases, and now has 
been focused on the distribution of light in a blackbody. Now, of course, at the 
time that theoreticians studied this thing, there were also many measurements 
of the actual curve. And it turned out that the correct curve looked like the dashed 
curves in Fig. 41-4. That is, the x-rays were not there. If we lower the temperature, 
the whole curve goes down in proportion to T, according to the classical theory, 
but the observed curve also cuts off sooner at a lower temperature. Thus the low- 
frequency end of the curve is right, but the high-frequency end is wrong. Why? 
When Sir James Jeans was worrying about the specific heats of gases, he noted 
that motions which have high frequency are “frozen out” as the temperature goes 
too low. That is, if the temperature is too low, if the frequency is too high, the 
oscillators do not have kT of energy on the average. Now recall how our derivation 
of (41.13) worked: It all depends on the energy of an oscillator at thermal equilib- 
rium. What the kT of (41.5) was, and what the same kT in (41.13) is, is the mean 
energy of a harmonic oscillator of frequency w at temperature T. Classically, this 
is kT , but experimentally, no! — not when the temperature is too low or the oscillator 
frequency is too high. And so the reason that the curve falls off is the same reason 
that the specific heats of gases fail. It is easier to study the blackbody curve than it 
is the specific heats of gases, which are so complicated, therefore our attention is 
focused on determining the true blackbody curve, because this curve is a curve 
which correctly tells us, at every frequency, what the average energy of harmonic 
oscillators actually is as a function of temperature. 

Planck studied this curve. He first determined the answer empirically, by 
fitting the observed curve with a nice function that fitted very well. Thus he had 
an empirical formula for the average energy of a harmonic oscillator as a function 
of frequency. In other words, he had the right formula instead of kT , and then by 
fiddling around he found a simple derivation for it which involved a very peculiar 
assumption. That assumption was that the harmonic oscillator can take up energies 
only ho) at a time . The idea that they can have any energy at all is false. Of course, 
that was the beginning of the end of classical mechanics. 
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The very first correctly determined quantum-mechanical formula will now be 
derived. Suppose that the permitted energy levels of a harmonic oscillator were 
equally spaced at ha> Q apart, so that the oscillator could take on only these different 
energies (Fig. 41-5). Planck made a somewhat more complicated argument than 
the one that is being given here, because that was the very beginning of quantum 
mechanics and he had to prove some things. But we are going to take it as a fact 
(which he demonstrated in this case) that the probability of occupying a level of 
energy E is P(E ) = ae~ ElkT . If we go along with that, we will obtain the right 
result. 

Suppose now that we have a lot of oscillators, and each is a vibrator of fre- 
quency co 0 . Some of these vibrators will be in the bottom quantum state, some will 
be in the next one, and so forth. What we would like to know is the average energy 
of all these oscillators. To find out, let us calculate the total energy of all the oscilla- 
tors and divide by the number of oscillators. That will be the average energy per 
oscillator in thermal equilibrium, and will also be the energy that is in equilibrium 
with the blackbody radiation and that should go in Eq. (41.13) in place of kT . 
Thus we let N 0 be the number of oscillators that are in the ground state (the 
lowest energy state); N x the number of oscillators in the state E x \ N 2 the number 
that are in state E 2 \ and so on. According to the hypothesis (which we have not 
proved) that in quantum mechanics the law that replaced the probability e - p - E - /fc7T 
or e~ K,E lkT in classical mechanics is that the probability goes down as e ~ AEIkT , 
where A E is the excess energy, we shall assume that the number N x that are in the 
first state will be the number N 0 that are in the ground state, times e ~ huifkT . Simi- 
larly, N 2 , the number of oscillators in the second state, is N 2 = N 0 e ~ 2fiulkT . To 
simplify the algebra, let us call e~ h( * }kT = x. Then we simply have N x = A 0 x, 
N 2 = A 0 x 2 ,...,A n = N 0 x n . 

The total energy of all the oscillators must first be worked out. If an oscillator 
is in the ground state, there is no energy. If it is in the first state, the energy is foo 0 , 
and there are N x of them. So N x hco , or hoN 0 x is how much energy we get from 
those. Those that are in the second state have 2ho ) 0 , and there are N 2 of them, 
so N 2 • 2ho) = 2hooN 0 x 2 is how much energy we get, and so on. Then we add 
it all together to get E tot = N 0 ho )( 0 + x + 2x 2 + 3x 3 +...). 

And now, how many oscillators are there? Of course, N 0 is the number that 
are in the ground state, N x in the first state, and so on, and we add them together: 
N to t = N 0 (l + x + x 2 + x 3 + . . .). Thus the average energy is 


/F\ — Et°t „ Noho)(0 + x + 2x 2 + 3x 3 + * ■ ■) 
K 7 N tot N 0 (\ + x + x2 + . • .) 


(41.14) 
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Fig. 41-5. The energy levels of a 
harmonic oscillator are equally spaced: 
E n = nho). 


Now the two sums which appear here we shall leave for the reader to play with and 
have some fun with. When we are all finished summing and substituting for x in 
the sum, we should get — if we make no mistakes in the sum — 


(E) 


ho) 

e *«lkT _ | 


(41.15) 


This, then, was the first quantum-mechanical formula ever known, or ever discussed, 
and it was the beautiful culmination of decades of puzzlement. Maxwell knew 
that there was something wrong, and the problem was, what was right! Here 
is the quantitative answer of what is right instead of kT. This expression should, 
of course, approach /cT as co — > 0 or as T — > 00 . See if you can prove that it does — 
learn how to do the mathematics. 

This is the famous cutoff factor that Jeans was looking for, and if we use it 
instead of kT in (41.13), we obtain for the distribution of light in a black box 


7(co) do) = 


ho) 3 do) 
*. 2 ^ 2 / JiufkT 

7 r c (e 


1) 


(41.16) 


We see that for a large to, even though we have to 3 in the numerator, there is an e 
raised to a tremendous power in the denominator, so the curve comes down again 
and does not “blow up” — we do not get ultraviolet light and x-rays where we do 
not expect them! 
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One might complain that in our derivation of (41.16) we used the quantum 
theory for the energy levels of the harmonic oscillator, but the classical theory in 
determining the cross section a s . But the quantum theory of light interacting with a 
harmonic oscillator gives exactly the same result as that given by the classical theory. 
That, in fact, is why we were justified in spending so much time on our analysis of 
the index of refraction and the scattering of light, using a model of atoms like 
little oscillators— the quantum formulas are substantially the same. 

Now let us return to the Johnson noise in a resistor. We have already re- 
marked that the theory of this noise power is really the same theory as that of the 
classical blackbody distribution. In fact, rather amusingly, we have already 
said that if the resistance in a circuit were not a real resistance, but were an antenna 
(an antenna acts like a resistance because it radiates energy), a radiation resistance, 
it would be easy for us to calculate what the power would be. It would be just the 
power that runs into the antenna from the light that is all around, and we would 
get the same distribution, changed by only one or two factors. We can suppose 
that the resistor is a generator with an unknown power spectrum P(a>). The spec- 
trum is determined by the fact that this same generator, connected to a resonant 
circuit of any frequency , as in Fig. 41 -2(b), generates in the inductance a voltage 
of the magnitude given in Eq. (41.2). One is thus led to the same integral as in 
(41.10), and the same method works to give Eq. (41.3). For low temperatures the 
kT in (41.3) must of course be replaced by (41.15). The two theories (blackbody 
radiation and Johnson noise) are also closely related physically, for we may of 
course connect a resonant circuit to an antenna , so the resistance R is a pure 
radiation resistance. Since (41.2) does not depend on the physical origin of the 
resistance, we know the generator G for a real resistance and for radiation resist- 
ance is the same. What is the origin of the generated power P(co) if the resistance R 
is only an ideal antenna in equilibrium with its environment at temperature T1 
It is the radiation /(o>) in the space at temperature T which impinges on the antenna 
and, as “received signals,” makes an effective generator. Therefore one can deduce 
a direct relation of P( co) and /(to), leading then from (41.13) to (41.3). 

All the things we have been talking about — the so-called Johnson noise and 
Planck’s distribution, and the correct theory of the Brownian movement which we 
are about to describe — are developments of the first decade or so of this century. 
Now with those points and that history in mind, we return to the Brownian move- 
ment. 


41-4 The random walk 



Fig. 41-6. A random walk of 36 
steps of length /. How far is S 3& from B? 
Ans: about 61 on the average. 


Let us consider how the position of a jiggling particle should change with time, 
for very long times compared with the time between “kicks.” Consider a little 
Brownian movement particle which is jiggling about because it is bombarded on 
all sides by irregularly jiggling water molecules. Query: After a given length of 
time, how far away is it likely to be from where it began? This problem was solved 
by Einstein and Smoluchowski. If we imagine that we divide the time into little 
intervals, let us say a hundredth of a second or so, then after the first hundredth of 
a second it moves here, and in the next hundredth it moves some more, in the 
next hundredth of a second it moves somewhere else, and so on. In terms of the 
rate of bombardment, a hundredth of a second is a very long time. The reader 
may easily verify that the number of collisions a single molecule of water receives 
in a second is about 10 14 , so in a hundredth of a second it has 10 1 2 collisions, which 
is a lot! Therefore, after a hundredth of a second it is not going to remember what 
happened before. In other words, the collisions are all random , so that one “step” 
is not related to the previous “step.” It is like the famous drunken sailor problem: 
the sailor comes out of the bar and takes a sequence of steps, but each step is 
chosen at an arbitrary angle, at random (Fig. 41-6). The question is: After a long 
time, where is the sailor? Of course we do not know! It is impossible to say. 
What do we mean — he is just somewhere more or less random. Well then, on the 
average, where is he? On the average , how far away from the bar has he gone ? 
We have already answered this question, because once we were discussing the 
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superposition of light from a whole lot of different sources at different phases, 
and that meant adding a lot of arrows at different angles (Chapter 32). There we 
discovered that the mean square of the distance from one end to the other of the 
chain of random steps, which was the intensity of the light, is the sum of the 
intensities of the separate pieces. And so, by the same kind of mathematics, we 
can prove immediately that if R^ is the vector distance from the origin after A steps, 
the mean square of the distance from the origin is proportional to the number N 
of steps. That is, (R%) = NL 2 , where L is the length of each step. Since the num- 
ber of steps is proportional to the time in our present problem, the mean square 
distance is proportional to the time: 


(R 2 ) = oct. (41.17) 

This does not mean that the mean distance is proportional to the time. If the mean 
distance were proportional to the time it would mean that the drifting is at a nice 
uniform velocity. The sailor is making some relatively sensible headway, but only 
such that his mean square distance is proportional to time. That is the character- 
istic of a random walk. 

We may show very easily that in each successive step the square of the distance 
increases, on the average, by L 2 . For if we write Rjv = Rat_i + L, we find that 
R|r is 

Ra' ' Rtv = Rn = Rn—i + 2R;v_ 1 • L + L 2 , 

and averaging over many trials, we have (R%) = (R %_ + L 2 , since (R Ar _ 1 • L> 
= 0. Thus, by induction, 

R% = NL 2 . (41.18) 

Now we would like to calculate the coefficient a in Eq. (41.17), and to do so 
we must add a feature. We are going to suppose that if we were to put a force 
on this particle (having nothing to do with the Brownian movement — we are 
taking a side issue for the moment), then it would react in the following way against 
the force. First, there would be inertia. Let m be the coefficient of inertia, the 
effective mass of the object (not necessarily the same as the real mass of the real 
particle, because the water has to move around the particle if we pull on it). Thus 
if we talk about motion in one direction, there is a term like m (d 2 x/dt 2 ) on one side. 
And next, we want also to assume that if we kept a steady pull on the object* there 
would be a drag on it from the fluid, proportional to its velocity. Besides the inertia 
of the fluid, there is a resistance to flow due to the viscosity and the complexity of 
the fluid. It is absolutely essential that there be some irreversible losses, something 
like resistance, in order that there be fluctuations. There is no way to produce the 
kT unless there are also losses. The source of the fluctuations is very closely related 
to these losses. What the mechanism of this drag is, we will discuss soon — we shall 
talk about forces that are proportional to the velocity and where they come from. 
But let us suppose for now that there is such a resistance. Then the formula for 
the motion under an external force, when we are pulling on it in a normal manner, is 

d 2 x dx _ 

m dfi + ^di = Fext - (4U9) 

The quantity p can be determined directly from experiment. For example, we can 
watch the drop fall under gravity. Then we know that the force is mg * and jjl is mg 
divided by the speed of fall the drop ultimately acquires. Or we could put the 
drop in a centrifuge and see how fast it sediments. Or if it is charged, we can put 
an electric field on it. So ju is a measurable thing, not an artificial thing, and it is 
known for many types of colloidal particles, etc. 

Now let us use the same formula in the case where the force is not external, 
but is equal to the irregular forces of the Brownian movement. We shall then try 
to determine the mean square distance that the object goes. Instead of taking the 
distances in three dimensions, let us take just one dimension, and find the mean 
of x 2 , just to prepare ourselves. (Obviously the mean of x 2 is the same as the mean 
of y 2 is the same as the mean of z 2 , and therefore the mean square of the distance 
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is just 3 times what we are going to calculate.) The ^-component of the irregular 
forces is, of course, just as irregular as any other component. What is the rate of 
change of x 2 ? It is d{x 2 )/dt = 2 x(dx/dt), so what we have to find is the average 
of the position times the velocity. We shall show that this is a constant, and that 
therefore the mean square radius will increase proportionally to the time, and at 
what rate. Now if we multiply Eq. (41.19) by x, mx(d 2 x/dt 2 ) + px(dx/dt) = xF x . 
We want the time average of x(dx/dt), so let us take the average of the whole 
equation, and study the three terms. Now what about x times the force? If the 
particle happens to have gone a certain distance x, then, since the irregular force is 
completely irregular and does not know where the particle started from, the next 
impulse can be in any direction relative to x. If x is positive, there is no reason why 
the average force should also be in that direction. It is just as likely to be one 
way as the other. The bombardment forces are not driving it in a definite direction. 
So the average value of x times F is zero. On the other hand, for the term 
mx(d 2 x/dt 2 ) we will have to be a little fancy, and write this as 


d 2 x 


mx 


dt 2 


m d[x(dx/dt)\ 
dt 



Thus we put in these two terms and take the average of both. So let us see how 
much x times the velocity should be. Now x times the velocity has a mean that 
does not change with time, because when it gets to some position it has no re- 
membrance of where it was before, so things are no longer changing with time. 
So this quantity, on the average, is zero. We have left the quantity mv 2 , and that 
is the only thing we know: mv 2 /l has a mean value %kT. Therefore we find that 


implies 


or 


i!£) + "(*£) = ^ 

-<»» 2 ) + S| <*’>-.<>, 

d(x 2 ) _ 0 kT 
dt n 


(41.20) 


Therefore the object has a mean square distance (R 2 ), at the end of a certain amount 
of t, equal to 

(R 2 ) = 6kT~. (41.21) 


And so we can actually determine how far the particles go! We first must determine 
how they react to a steady force, how fast they drift under a known force (to 
find ix), and then we can determine how far they go in their random motions. 
This equation was of considerable importance historically, because it was one of 
the first ways by which the constant k was determined. After all, we can measure 
p, the time, how far the particles go, and we can take an average. The reason that 
the determination of k was important is that in the law PV = RT for a mole, 
we know that R, which can also be measured, is equal to the number of atoms in a 
mole times k. A mole was originally defined as so and so many grams of oxygen — 
16 (now carbon is used), so the number of atoms in a mole was not known, orig- 
inally. It is, of course, a very interesting and important problem. How big are 
atoms? How many are there? So one of the earliest determinations of the number 
of atoms was by the the determination of how far a dirty little particle would move 
if we watched it patiently under a microscope for a certain length of time. And 
thus Boltzmann’s constant k and the Avogadro number N 0 were determined be- 
cause R had already been measured. 
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42 

Applications of Kinetic Theory 


42-1 Evaporation 

In this chapter we shall discuss some further applications of kinetic theory. 
In the previous chapter we emphasized one particular aspect of kinetic theory, 
namely, that the average kinetic energy in any degree of freedom of a molecule or 
other object is \kT. The central feature of what we shall now discuss, on the other 
hand, is the fact that the probability of finding a particle in different places, per 
unit volume, varies as e ” potential « lier Ky/* r ; we shall make a number of applications 
of this. 

The phenomena which we want to study are relatively complicated: a liquid 
evaporating, or electrons in a metal coming out of the surface, or a chemical re- 
action in which there are a large number of atoms involved. In such cases it is no 
longer possible to make from the kinetic theory any simple and correct statements, 
because the situation is too complicated. Therefore, this chapter, except where 
otherwise emphasized, is quite inexact. The idea to be emphasized is only that 
we can understand, from the kinetic theory, more or less how things ought to be- 
have. By using thermodynamic arguments, or some empirical measurements of 
certain critical quantities, we can get a more accurate representation of the phe- 
nomena. 

However, it is very useful to know even only more or less why something 
behaves as it does, so that when the situation is a new one, or one that we have not 
yet started to analyze, we can say, more or less, what ought to happen. So this 
discussion is highly inaccurate but essentially right — right in idea, but a little bit 
simplified, let us say, in the specific details. 

The first example that we shall consider is the evaporation of a liquid. Suppose 
we have a box with a large volume, partially filled with liquid in equilibrium and 
with the vapor at a certain temperature. We shall suppose that the molecules of 
the vapor are relatively far apart, and that inside the liquid, the molecules are 
packed close together. The problem is to find out how many molecules there are 
in the vapor phase, compared with the number there are in the liquid. How dense 
is the vapor at a given temperature, and how does it depend on the temperature? 

Let us say that n equals the number of molecules per unit volume in the vapor. 
That number, of course, varies with the temperature. If we add heat, we get more 
evaporation. Now let another quantity, 1/K a , equal the number of atoms per 
unit volume in the liquid: We suppose that each molecule in the liquid occupies a 
certain volume, so that if there are more molecules of liquid, then all together they 
occupy a bigger volume. Thus if V a is the volume occupied by one molecule, the 
number of molecules in a unit volume is a unit volume divided by the volume of 
each molecule. Furthermore, we suppose that there is a force of attraction between 
the molecules to hold them together in the liquid. Otherwise we cannot understand 
why it condenses. Thus suppose that there is such a force and that there is an energy 
of binding of the molecules in the liquid which is lost when they go into the vapor. 
That is, we are going to suppose that, in order to take a single molecule out of the 
liquid into the vapor, a certain amount of work W has to be done. There is a 
certain difference, W, in the energy of a molecule in the liquid from what it would 
have if it were in the vapor, because we have to pull it away from the other molecules 
which attract it. 

Now we use the general principle that the number of atoms per unit volume 
in two different regions is n 2 /ni = e ~ iE 2 ~ E i ) i kT m So the number n per unit volume 
in the vapor, divided by the number \/V (l per unit volume in the liquid, is equal to 

nV a = e- 1 ™ 7 , (42.1) 
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because that is the general rule. It is like the atmosphere in equilibrium under 
gravity, where the gas at the bottom is denser than that at the top because of the work 
mgh needed to lift the gas molecules to the height h. In the liquid, the molecules 
are denser than in the vapor because we have to pull them out through the energy 
“hill” W , and the ratio of the densities is e~ wlkT . 

This is what we wanted to deduce— that the vapor density varies as e to the 
minus some energy or other over kT, The factors in front are not really interesting 
to us, because in most cases the vapor density is very much lower than the liquid 
density. In those circumstances, where we are not near the critical point where 
they are almost the same, but where the vapor density is much lower than the 
liquid density, then the fact that n is very much less than \/V a is occasioned by the 
fact that W is very much greater than kT . So formulas such as (42. 1) are interesting 
only when W is very much bigger than kT , because in those circumstances, since 
we are raising e to minus a tremendous amount, if we change T a little bit, that 
tremendous power changes a bit, and the change produced in the exponential factor 
is very much more important than any change that might occur in the factors out 
in front. Why should there be any changes in such factors as K a ? Because ours 
was an approximate analysis. After all, there is not really a definite volume for 
each molecule; as we change the temperature, the volume V a does not stay constant 
— the liquid expands. There are other little features like that, and so the actual 
situation is more complicated. There are slowly varying temperature-dependent 
factors all over the place. In fact, we might say that W itself varies slightly with 
temperature, because at a higher temperature, at a different molecular volume, 
there would be different average attractions, and so on. So, while we might think 
that if we have a formula in which everything varies in an unknown way with 
temperature then we have no formula at all, if we realize that the exponent W/kT 
is, in general, very large, we see that in the curve of the vapor density as a function 
of temperature most of the variation is occasioned by the exponential factor, and 
if we take W as a constant and the coefficient \/V a as nearly constant, it is a good 
approximation for short intervals along the curve. Most of the variation, in other 
words, is of the general nature e ~ W!kT . 

It turns out that there are many, many phenomena in nature which are 
characterized by having to borrow an energy from somewhere, and in which the 
central feature of the temperature variation is e to the minus the energy over kT, 
This is a useful fact only when the energy is large compared with kT , so that most 
of the variation is contained in the variation of the kT and not in the constant and 
in other factors. 

Now let us consider another way of obtaining a somewhat similar result for 
the evaporation, but looking at it in more detail. To arrive at (42.1), we simply 
applied a rule which is valid at equilibrium, but in order to understand things better, 
there is no harm in trying to look at the details of what is going on. We may also 
describe what is going on in the following way : the molecules that are in the vapor 
continually bombard the surface of the liquid; when they hit it, they may bounce 
off or they may get stuck. There is an unknown factor for that — maybe 50-50, 
maybe 10 to 90— we do not know. Let us say they always get stuck— we can ana- 
lyze it over again later on the assumption that they do not always get stuck. Then 
at a given moment there will be a certain number of atoms which are condensing 
onto the surface of the liquid. The number of condensing molecules, the number 
that arrive on a unit area, is the number n per unit volume times the velocity v . 
This velocity of the molecules is related to the temperature, because we know that 
\mv 2 is equal to \kT on the average. So v is some kind of a mean velocity. Of 
course we should integrate over the angles and get some kind of an average, but it 
is roughly proportional to the root-mean-square velocity, within some factor. Thus 

N c = no (42.2) 

is the number which arrive per unit area and are condensing. 

At the same time, however, the atoms in the liquid are jiggling about, and 
from time to time one of them gets kicked out. Now we have to estimate how fast 
they get kicked out. The idea will be that at equilibrium the number that are 
kicked out per second and the number that arrive per second are equal. 
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How many get kicked out? In order to get kicked out, a particular molecule 
has to have acquired by accident an excess energy over its neighbors — a consider- 
able excess energy, because it is attracted very strongly by the other molecules in the 
liquid. Ordinarily it does not leave because it is so strongly attracted, but in the 
collisions sometimes one of them gets an extra energy by accident. And the chance 
that it gets the extra energy W which it needs in our case is very small if W » kT. 
In fact, e~~ WikT is the chance that an atom has picked up more than this much en- 
ergy. That is the general principle in kinetic theory: in order to borrow an excess 
energy W over the average, the odds are e to the minus the energy that we have to 
borrow, over kT. Now suppose that some molecules have borrowed this energy. 
We now have to estimate how many leave the surface per second. Of course, just 
because a molecule has the necessary energy does not mean that it will actually 
evaporate, since it may be buried too deeply inside the liquid or, even if it is near 
the surface, it may be travelling in the wrong direction. The number that are going 
to leave a unit area per second is going to be something like this: the number of 
atoms there are near the surface, per unit area, divided by the time it takes one to 
escape, multiplied by the probability e~ WjkT that they are ready to escape in the 
sense that they have enough energy. 

We shall suppose that each molecule at the surface of the liquid occupies a 
certain cross-sectional area A. Then the number of molecules per unit area of 
liquid surface will be 1 /A. And now, how long does it take a molecule to escape? 
If the molecules have a certain average speed v, and have to move, say, one molec- 
ular diameter D, the thickness of the first layer, then the time it takes to get across 
that thickness is the time needed to escape, if the molecule has enough energy. 
The time will be D/v. Thus the number evaporating should be approximately 

N e = (1 /A){v/D)e~ WikT . (42.3) 

Now the area of each atom times the thickness of the layer is approximately the 
same as the volume V a occupied by a single atom. And so, in order to get equilib- 
rium, we must have N c — N e , or 

= (v/V a )e- WikT . (42.4) 

We may cancel the v\ since they are equal; even though one is the velocity of a 
molecule in the vapor and the other is the velocity of an evaporating molecule, 
these are the same, because we know their mean kinetic energy (in one direction) 
is \kT. But one may object, “No! No! These are the especially fast-moving ones; 
these are the ones that have picked up excess energy.” Not really, because the 
moment they start to pull away from the liquid, they have to lose that excess 
energy against the potential energy. So, as they come to the surface they are slowed 
down to the velocity y! It is the same as it was in our discussion of the distribution 
of molecular velocities in the atmosphere — at the bottom, the molecules had a 
certain distribution of energy. The ones that arrive at the top have the same distri- 
bution of energy, because the slow ones did not arrive at all, and the fast ones were 
slowed down. The molecules that are evaporating have the same distribution of 
energy as the ones inside — a rather remarkable fact. Anyway, it is useless to try to 
argue so closely about our formula because of other inaccuracies, such as the prob- 
ability of bouncing back rather than entering the liquid, and so on. Thus we have 
a rough idea of the rate of evaporation and condensation, and we see, of course, 
that the vapor density n varies in the same way as before, but now we have under- 
stood it in some detail rather than just as an arbitrary formula. 

This deeper understanding permits us to analyze some things. For example, 
suppose that we were to pump away the vapor at such a great rate that we removed 
the vapor as fast as it formed (if we had very good pumps and the liquid was 
evaporating very slowly), how fast would evaporation occur if we maintained a 
liquid temperature T? Suppose that we have already experimentally measured 
the equilibrium vapor density, so that we know, at the given temperature, how 
many molecules per unit volume are in equilibrium with the liquid. Now we would 
like to know how fast it will evaporate. Even though we have used only a rough 
analysis so far as the evaporation part of it is concerned, the number of vapor 
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molecules arriving was not done so badly, aside from the unknown factor of re- 
flection coefficient. So therefore we may use the fact that the number that are 
leaving, at equilibrium, is the same as the number that arrive. True, the vapor is 
being swept away and so the molecules are only coming out, but if the vapor were 
left alone, it would attain the equilibrium density at which the number that come 
back would equal the number that are evaporating. Therefore, we can easily see 
that the number that are coming off* the surface per second is equal to the unknown 
reflection coefficient R times the number that would come down to the surface 
per second were the vapor still there, because that is how many would balance the 
evaporation at equilibrium: 

N e = nvR = ( vR/V a )e- wlkT . (42.5) 

Of course, the number of molecules that hit the liquid from the vapor is easy to 
calculate, since we do not need to know as much about the forces as we do when 
we are worrying about how they get to escape through the liquid surface; it is 
much easier to make the argument the other way. 

42-2 Thermionic emission 

We may give another example of a very practical situation that is similar to 
the evaporation of a liquid — so similar that it is not worth making a separate 
analysis. It is essentially the same problem. In a radio tube there is a source of 
electrons, namely a heated tungsten filament, and a positively charged plate to 
attract the electrons. Any electron that escapes from the surface of the tungsten is 
immediately swept away to the plate. That is our ideal “pump,” which is “pump- 
ing” the electrons away all the time. Now the question is: How many electrons 
per second can we get out of a piece of tungsten, and how does that number vary 
with temperature? The answer to that problem is the same as (42.5), because it 
turns out that in a piece of metal, electrons are attracted to the ions, or to atoms, 
of the metal. They are attracted, if we may say it crudely, to the metal. In order 
to get an electron out of a piece of metal, it takes a certain amount of energy or 
work to pull it out. This work varies with the different kinds of metal. In fact, it 
varies even with the character of the surface of a given kind of metal, but the total 
work may be a few electron volts, which, incidentally, is typical of the energy 
involved in chemical reactions. We can remember the latter fact by remembering 
that the voltage in a chemical cell like a flashlight battery, which is produced by 
chemical reactions, is about one volt. 

How can we find out how many electrons come out per second? It would be 
quite difficult to analyze the effects on the electrons going out; it is easier to 
analyze the situation the other way. So, we could start out by imagining that we did 
not draw the electrons away, and that the electrons were like a gas, and could 
come back to the metal. Then there would be a certain density of electrons at 
equilibrium which would, of course, be given by exactly the same formula as (42.1), 
where V a is the volume per electron in the metal, roughly, and W is equal to 
where <t> is the so-called work function , or the voltage needed to pull an electron off 
the surface. This would tell us how many electrons would have to be in the 
surrounding space and striking the metal in order to balance the ones that are 
coming out. And thus it is easy to calculate how many are coming out if we sweep 
away all of them, because the number that are coming out is exactly equal to the 
number that would be going in with the above density of electron “vapor.” In 
other words, the answer is that the current of electricity that comes in per unit 
area is equal to the charge on each times the number that arrive per second per 
unit area, which is the number per unit volume times the velocity, as we have 
seen many times: 

I — q e nv = (qev/V a )e~ Qe * lkT . (42.6) 

Now one electron volt corresponds to kT at a temperature of 1 1,600 degrees. The 
filament of the tube may be operating at a temperature of, say, 1 100 degrees, so 
the exponential factor is something like e~ 10 ; when we change the temperature a 
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little bit, the exponential factor changes a lot. Thus, again, the central feature of 
the formula is the e~ q ^ jkT . As a matter of fact, the factor in front is quite wrong — 
it turns out that the behavior of electrons in a metal is not correctly described by 
the classical theory, but by quantum mechanics, but this only changes the factor in 
front a little. Actually, no one has ever been able to get the thing straightened out 
very well, even though many people have used the high-class quantum-mechanical 
theory for their calculations. The big problem is, does W change slightly with 
temperature? If it does, one cannot distinguish a W changing slowly with tempera- 
ture from a different coefficient in front. That is, if W changed linearly, say, with 
temperature, so that W — W 0 + akT , then we would have 

e -WlkT = e ( W q + «kT)lkT = e -a e -W 0 ikT 

Thus a linearly temperature-dependent W is equivalent to a shifted “constant.” 
It is really quite difficult and usually fruitless to try to obtain the coefficient in 
the front accurately. 


42-3 Thermal ionization 


Now we go on to another example of the same idea; always the same idea. 
This has to do with ionization. Suppose that in a gas we have a whole lot of atoms 
which are in the neutral state, say, but the gas is hot and the atoms can become 
ionized. We would like to know how many ions there are in a given circumstance 
if we have a certain density of atoms per unit volume at a certain temperature. 
Again we consider a box in which there are N atoms which can hold electrons. 
(If an electron has come off an atom, it is called an ion, and if the atom is neutral, 
we simply call it an atom.) Then suppose that, at any given moment, the number 
of neutral atoms is n a , the number of ions is nu and the number of electrons is n e , 
all per unit volume. The problem is: What is the relationship of these three 
numbers? 

In the first place, we have two conditions or constraints on the numbers. For 
instance, as we vary different conditions, like the temperature and so on, n a + n* 
would remain constant, because this would be simply the number N of atomic 
nuclei that are in the box. If we keep the number of nuclei per unit volume fixed, 
and change, say, the temperature, then as the ionization proceeded some atoms 
would turn to ions, but the total number of atoms plus ions would be unchanged. 
That is, n a + = N. Another condition is that if the entire gas is to be electri- 

cally neutral (and if we neglect double or triple ionization), that means that the 
number of ions is equal to the number of electrons at all times, or n* = n e . These 
are subsidiary equations that simply express the conservation of charge and the 
conservation of atoms. 

These equations are true, and we ultimately will use them when we consider 
a real problem. But we want to obtain another relationship between the quantities. 
We can do this as follows. We again use the idea that it takes a certain amount 
of energy to lift the electron out of the atom, which we call the ionization energy , 
and we would write it as W, in order to make all of the formulas look the same. So 
we let W equal the energy needed to pull an electron out of an atom and make an 
ion. Now we again say that the number of free electrons per unit volume in the 
“vapor” is equal to the number of bound electrons per unit volume in the atoms, 
times e to the minus the energy difference between being bound and being free, 
over kT, That is the basic equation again. How can we write it? The number of 
free electrons per unit volume would, of course, be n e , because that is the definition 
of n e . Now what about the number of electrons per unit volume that are bound 
to atoms? The total number of places that we could put the electrons is apparently 
n a + ni, and we will suppose that when they are bound each one is bound within a 
certain volume V a . So the total amount of volume which is available to electrons 
which would be bound is (n a + « z )K a , so we might want to write our formula as 


n e = 


flq —WlkT 

(jta “b ni)V a 
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The formula is wrong, however, in one essential feature, which is the following: 
when an electron is already on an atom, another electron cannot come to that 
volume anymore! In other words, all the volumes of all the possible sites are not 
really available for the one electron which is trying to make up its mind whether 
or not to be in the vapor or in the condensed position, because in this problem there 
is an extra feature that when one electron is where another electron is, it is not 
allowed to go — it is repelled. For that reason, it comes out that we should count 
only that part of the volume which is available for an electron to sit on or not. That 
is, those which are already occupied do not count in the total available volume, 
but the only volume which is allowed is that of the ions , where there are vacant 
places for the electron to go. Then, in those circumstances, we find that a nicer 
way to write our formula is 

Mi = * (42.7) 

tta * a 


This formula is called the Saha ionization equation . Now let us see if we can under- 
stand qualitatively why a formula like this is right, by arguing about the kinetic 
things that are happening. 

First, every once in a while an electron comes to an ion and they combine to 
make an atom. And also, every once in a while, an atom gets into a collision and 
breaks up into an ion and an electron. Now those two rates must be equal. How 
fast do electrons and ions find each other? The rate is certainly increased if the 
number of electrons per unit volume is increased. It is also increased if the number 
of ions per unit volume is increased. That is, the total rate at which recombination 
is occurring is certainly proportional to the number of electrons times the number 
of ions. Now the total rate at which ionization is occurring due to collisions must 
be dependent linearly on how many atoms there are to ionize. And so the rates 
will balance when there is some relationship between the product n e ni and the 
number of atoms, n a . The fact that this relationship happens to be given by this 
particular formula, where W is the ionization energy, is of course a little bit more 
information, but we can easily understand that the formula would necessarily 
involve the concentrations of the electrons, ions, and atoms in the combination 
n e ni/n a to produce a constant independent of the n 9 s, and dependent only on 
temperature, the atomic cross sections, and other constant factors. 

We may also note that, since the equation involves the numbers per unit 
volume , if we were to do two experiments with a given total number N of atoms 
plus ions, that is, a certain fixed number of nuclei, but using boxes with different 
volumes, the n 9 s would all be smaller in the larger box. But since the ratio n e ni/n a 
stays the same, the total number of electrons and ions must be greater in the larger 
box. To see this, suppose that there are N nuclei inside a box of volume V, and 
that a fraction/of them are ionized. Then n e = fN/V = n^, and« a = (1— f)N/V. 
Then our equation becomes 


/2 N^e- w/kr 
1 - f V V a 


(42.8) 


In other words, if we take a smaller and smaller density of atoms, or make the vol- 
ume of the container bigger and bigger, the fraction / of electrons and ions must 
increase. That ionization, just from “expansion” as the density goes down, is the 
reason why we believe that at very low densities, such as in the cold space between 
the stars, there may be ions present, even though we might not understand it from 
the point of view of the available energy. Although it takes many, many kT of 
energy to make them, there are ions present. 

Why can there be ions present when there is so much space around, while if 
we increase the density, the ions tend to disappear? Answer : Consider an atom. 
Every once in a while, light, or another atom, or an ion, or whatever it is that 
maintains thermal equilibrium, strikes it. Very rarely, because it takes such a 
terrific amount of excess energy, an electron comes off and an ion is left. Now 
that electron, if the space is enormous, wanders and wanders and does not come 
near anything for years, perhaps. But once in a very great while, it does come back 
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to an ion and they combine to make an atom. So the rate at which electrons are 
coming out from the atoms is very slow. But if the volume is enormous, an electron 
which has escaped takes so long to find another ion to recombine with that its 
probability of recombination is very, very small; thus, in spite of the large excess 
energy needed, there may be a reasonable number of electrons. 


42-4 Chemical kinetics 

The same situation that we have just called “ionization” is also found in a 
chemical reaction. For instance, if two objects A and B combine into a compound 
AB , then if we think about it for a while we see that AB is what we have called an 
atom, B is what we call an electron, and A is what we call an ion. With these 
substitutions the equations of equilibrium are exactly the same in form: 


nAftB „ —WjkT 

— ce , 

riAB 


(42.9) 


This formula, of course, is not exact, since the “constant” c depends on how much 
volume is allowed for the A and B to combine, and so on, but by thermodynamic 
arguments one can identify what the meaning of the W in the exponential 
factor is, and it turns out that it is very close to the energy needed in the reaction. 

Suppose that we tried to understand this formula as a result of collisions, 
much in the way that we understood the evaporation formula, by arguing about 
how many electrons came off and how many of them came back per unit time. 
Suppose that A and B combine in a collision every once in a while to form a 
compound AB. And suppose that the compound AB is a complicated molecule 
which jiggles around and is hit by other molecules, and from time to time it gets 
enough energy to explode and break up again into A and B. 

Now it actually turns out, in chemical reactions, that if the atoms come to- 
gether with too small an energy, even though energy may be released in the reaction 
A + B — > AB , the fact that A and B may touch each other does not necessarily 
make the reaction start. It usually is required that the collision be rather hard, in 
fact, to get the reaction to go at all — a “soft” collision between A and B may not 
do it, even though energy may be released in the process. So let us suppose that 
it is very common in chemical reactions that, in order for A and B to form AB , 
they cannot just hit each other, but they have to hit each other with sufficient energy. 
This energy is called the activation energy — the energy needed to “activate” the 
reaction. Call A* the activation energy, the excess energy needed in a collision in 
order that the reaction may really occur. Then the rate R/ at which A and B 
produce AB would involve the number of atoms of A times the number of atoms 
of B, times the rate at which a single atom would strike a certain cross section 
<tab, times a factor e ~ A * lkT f which is the probability that they have enough energy: 

R f = n A n B v<7 AB e~ A * lkT . (42.10) 


Now we have to find the opposite rate, R r . There is a certain chance that AB will 
fly apart. In order to fly apart, it not only must have the energy W which it needs 
in order to get apart at all but, just as it was hard for A and B to come together, so 
there is a kind of hill that A and B have to climb over to get apart again ; they must 
have not only enough energy just to get ready to pull apart, but a certain excess. 
It is like climbing a hill to get into a deep valley; they have to climb the hill coming 
in and they have to climb out of the valley and then over the hill coming back 
(Fig. 42-1). Thus the rate at which AB goes to A and B will be proportional to the 
number hab that are present, times 

Rr = c'n AB e-' w+A * )liT . (42.11) 



Fig. 42-1. The energy relationship 
for the reaction A + 6 — * AB. 


The c' will involve the volume of atoms and the rate of collisions, which we can 
work out, as we did the case of evaporation, with areas and times and thicknesses; 
but we shall not do this. The main feature of interest to us is that when these two 
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rates are equal, the ratio of them is equal to unity. This tells us that n A n B /n AB = 
Ce ~ w ' kT , as before, where c involves the cross sections, velocities, and other factors 
independent of the n' s. 

The interesting thing is that the rate of the reaction also varies as e ~ conBtlkT , 
although the constant is not the same as that which governs the concentrations; 
the activation energy A* is quite different from the energy W. W governs the pro- 
portions of A, B, and AB that we have in equilibrium, but if we want to know how 
fast A + B goes to AB, that is not a question of equilibrium, and here a different 
energy, the activation energy, governs the rate of reaction through an exponential 
factor. 

Furthermore, A* is not a fundamental constant like W. Suppose that at the 
surface of the wall— or at some other place— A and B could temporarily stick there 
in such a way that they could combine more easily. In other words, we might 
find a “tunnel” through the hill, or perhaps a lower hill. By the conservation of 
energy, when we are all finished we have still made AB out of A and B, so the 
energy difference W will be quite independent of the way the reaction occurred, 
but the activation energy A* will depend very much on the way the reaction occurs. 
This is why the rates of chemical reactions are very sensitive to outside conditions. 
We can change the rate by putting in a surface of a different kind, we can put it in 
a “different barrel” and it will go at a different rate, if it depends on the nature of 
the surface. Or if we put in a third kind of object it may change the rate very much; 
some things produce enormous changes in rate simply by changing the A* a little 
bit— they are called catalysts. A reaction might practically not occur at all because 
A * is too big at the given temperature, but when we put in this special stuff, the 
catalyst, then the reaction goes very fast indeed, because A* is reduced. 

Incidentally, there is some trouble with such a reaction, A plus B, making AB, 
because we cannot conserve both energy and momentum when we try to put two 
objects together to make one that is more stable. Therefore, we need at least a 
third object C, so the actual reaction is much more complicated. The forward 
rate would involve the product n A n B n c , and it might seem that our formula is 
going wrong, but no! When we look at the rate at which AB goes the other way, 
we find that it also needs to collide with C, so there is an n AB nc in the reverse rate; 
the n c ' s cancel out in the formula for the equilibrium concentrations. The law of 
equilibrium, (42.9), which we first wrote down is absolutely guaranteed to be true, 
no matter what the mechanism of the reaction may be! 


42-5 Einstein’s laws of radiation 


We now turn to an interesting analogous situation having to do with the black- 
body radiation law. In the last chapter we worked out the distribution law for the 
radiation in a cavity the way Planck did, considering the radiation from an oscilla- 
tor. The oscillator had to have a certain mean energy, and since it was oscillating, 
it would radiate and would keep pumping radiation into the cavity until it piled 
up enough radiation to balance the absorption and emission. In that way we 
found that the intensity of radiation at frequency w was given by the formula 


/( co) do) = 


hw 3 da) 

7r 2 c 2 (e*“ /kT 


1 ) 


(42.12) 


This result involved the assumption that the oscillator which was generating the 
radiation had definite, equally spaced energy levels. We did not say that light had 
to be a photon or anything like that. There was no discussion about how, when an 
atom goes from one level to another, the energy must come out in one unit of en- 
ergy, ho), in the form of light. Planck’s original idea was that the matter was 
quantized but not the light: material oscillators cannot take up just any energy, 
but have to take it in lumps. Furthermore, the trouble with the derivation 
is that it was partially classical. We calculated the rate of radiation from an 
oscillator according to classical physics; then we turned around and said, “No, 
this oscillator has a lot of energy levels.” So gradually, in order to find the 
right result, the completely quantum-mechanical result, there was a slow develop- 
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merit which culminated in the quantum mechanics of 1927. But in the meantime, 
there was an attempt by Einstein to convert Planck’s viewpoint that only oscillators 
of matter were quantized, to the idea that light was really photons and could be 
considered in a certain way as particles with energy hco. Furthermore, Bohr had 
pointed out that any system of atoms has energy levels, but they are not necessarily 
equally spaced like Planck’s oscillator. And so it became necessary to rederive or 
at least rediscuss the radiation law from a more completely quantum-mechanical 
viewpoint. 

Einstein assumed that Planck’s final formula was right, and he used that 
formula to obtain some new information, previously unknown, about the inter- 
action of radiation with matter. His discussion went as follows: Consider any two 
of the many energy levels of an atom, say the mth level and the nth level (Fig. 42-2). 
Now Einstein proposed that when such an atom has light of the right frequency 
shining on it, it can absorb that photon of light and make a transition from state 
n to state m, and that the probability that this occurs per second depends upon the 
two levels, of course, but is proportional to how intense the light is that is shining 
on it. Let us call the proportionality constant B nm , merely to remind us that this 
is not a universal constant of nature, but depends on the particular pair of levels: 
some levels are easy to excite; some levels are hard to excite. Now what is the 
formula going to be for the rate of emission from m to nl Einstein proposed that 
this must have two parts to it. First, even if there were no light present, there 
would be some chance that an atom in an excited state would fall to a lower state, 
emitting a photon; this we call spontaneous emission. It is analogous to the idea 
that an oscillator with a certain amount of energy, even in classical physics, does 
not keep that energy, but loses it by radiation. Thus the analog of spontaneous 
radiation of a classical system is that if the atom is in an excited state there is a 
certain probability A mn , which depends on the levels again, for it to go down from m 
to n , and this probability is independent of whether light is shining on the atom or 
not. But then Einstein went further, and by comparison with the classical theory 
and by other arguments, concluded that emission was also influenced by the pres- 
ence of light — that when light of the right frequency is shining on an atom, it has 
an increased rate of emitting a photon that is proportional to the intensity of the 
light, with a proportionality constant B mn . Later, if we deduce that this coefficient 
is zero, then we will have found that Einstein was wrong. Of course we will find 
he was right. 

Thus Einstein assumed that there are three kinds of processes: an absorption 
proportional to the intensity of light, an emission proportional to the intensity of 
light, called induced emission or sometimes stimulated emission , and a spontaneous 
emission independent of light. 

Now suppose that we have, in equilibrium at temperature T , a certain number 
of atoms N n in the state n and another number N m in the state m. Then the total 
number of atoms that are going from n to m is the number that are in the state n 
times the rate per second that, if one is in n, it goes up to m. So we have a formula 
for the number that are going from n to m per second: 


Absorption 



Spontaneous emission 


induced emission 


Fig. 42-2. Transitions between two 
energy levels of an atom. 


Rn—>m ~~ Af n Z? nm /(co). 


(42.13) 


The number that will go from m to n is expressed in the same manner, as the num- 
ber N m that are in m y times the chance per second that each one goes down to n . 
This time our expression is 

Rm^n = N m [A mn + B mn I ( «)]. (42.14) 


Now we shall suppose that in thermal equilibrium the number of atoms going up 
must equal the number coming down. That is one way, at least, in which the 
number will be sure to stay constant in each level.* So we take these two rates 


* This is not the only way one can arrange to keep the numbers of atoms in the various 
levels constant, but it is the way it actually works. That every process must, in thermal 
equilibrium, be balanced by its exact opposite is called the principle of detailed balancing. 
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Fig. 42-3. By exciting, say by blue 
light, a higher state h, which may emit a 
photon leaving atoms in state m, the 
number in this state m becomes sufficiently 
large to start laser action. 


to be equal at equilibrium. But we have one other piece of information: we know 
how large N m is compared with N n — the ratio of those two is e~ iE m~ E n^ kT . 
Now Einstein assumed that the only light which is effective in making the transition 
from n to m is the light which has the frequency corresponding to the energy 
difference, so E m — E n — ha) in all our formulas. Thus 


N m = N n e- n “ lkT . (42.15) 

Thus if we set the two rates equal: N n B nm I(w) = N m [A mn + B mn I(u)], and 
divide by N m9 we get 

BnJip,^ 7 = A mn + B mn I( a). (42.16) 

From this equation, we can calculate /(co). It is simply 

~ B » ' (42 - 17 > 

G nmP &mn 


But Planck has already told us that the formula must be (42.12). Therefore we 
can deduce something: First, that B nm must equal B mn , since otherwise we cannot 
get the (e no>lkT — 1). So Einstein discovered some things that he did not know how 
to calculate, namely that the induced emission probability and the absorption prob- 
ability must be equal . This is interesting. And furthermore, in order for (42.17) 
and (42.12) to agree, 

■A mnf B mn must be hu> z /ir 2 c 2 . (42.18) 

So if we know, for instance, the absorption rate for a given level, we can deduce 
the spontaneous emission rate and the induced emission rate, or any combination. 

This is as far as Einstein or anyone else could go using such arguments. To 
actually compute the absolute spontaneous emission rate or the other rates for 
any specific atomic transition, of course, requires a knowledge of the machinery 
of the atom, called quantum electrodynamics, which was not discovered until 
eleven years later. This work of Einstein was done in 1916. 

The possibility of induced emission has, today, found interesting applications. 
If there is light present, it will tend to induce the downward transition. The transi- 
tion then adds its ho) to the available light energy, if there were some atoms sitting 
in the upper state. Now we can arrange, by some non thermal method, to have a 
gas in which the number in the state m is very much greater than the number in the 
state n. This is far out of equilibrium, and so is not given by the formula 
which is for equilibrium. We can even arrange it so that the number in the upper 
state is very large, while the number in the lower state is practically zero. Then 
light which has the frequency corresponding to the energy difference E m — E n 
will not be strongly absorbed, because there are not many atoms in state n to 
absorb it. On the other hand, when that light is present, it will induce the emission 
from this upper state! So, if we had a lot of atoms in the upper state, there would 
be a sort of chain reaction, in which, the moment the atoms began to emit, more 
would be caused to emit, and the whole lot of them would dump down together. 
This is what is called a laser , or, in the case of the far infrared, a maser . 

Various tricks can be used to obtain the atoms in state m. There may be higher 
levels to which the atoms can get if we shine in a strong beam of light of high 
frequency. From these high levels, they may trickle down, emitting various pho- 
tons, until they all get stuck in the state m. If they tend to stay in the state m 
without emitting, the state is called metastable. And then they are all dumped 
down together by induced emissions. One more technical point — if we put this 
system in an ordinary box, it would radiate in so many different directions spon- 
taneously, compared with the induced effect, that we would still be in trouble. 
But we can enhance the induced effect, increase its efficiency, by putting nearly 
perfect mirrors on each side of the box, so that the light which is emitted gets 
another chance, and another chance, and another chance, to induce more emission. 
Although the mirrors are almost one hundred percent reflecting, there is a slight 
amount of transmission of the mirror, and a little light gets out. In the end, of 
course, from the conservation of energy, all the light goes out in a nice uniform 
straight direction which makes the strong light beams that are possible today with 
lasers. 
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43 


Diffusion 


43-1 Collisions between molecules 

We have considered so far only the molecular motions in a gas which is in 
thermal equilibrium. We want now to discuss what happens when things are near, 
but not exactly in, equilibrium. In a situation far from equilibrium, things are 
extremely complicated, but in a situation very close to equilibrium we can easily 
work out what happens. To see what happens, we must, however, return to the 
kinetic theory. Statistical mechanics and thermodynamics deal with the equilibrium 
situation, but away from equilibrium we can only analyze what occurs atom by 
atom, so to speak. 

As a simple example of a nonequilibrium circumstance, we shall consider 
the diffusion of ions in a gas. Suppose that in a gas there is a relatively small 
concentration of ions — electrically charged molecules. If we put an electric field 
on the gas, then each ion will have a force on it which is different from the forces 
on the neutral molecules of the gas. If there were no other molecules present, an 
ion would have a constant acceleration until it reached the wall of the container. 
But because of the presence of the other molecules, it cannot do that; its velocity 
increases only until it collides with a molecule and loses its momentum. It starts 
again to pick up more speed, but then it loses its momentum again. The net effect 
is that an ion works its way along an erratic path, but with a net motion in the di- 
rection of the electric force. We shall see that the ion has an average “drift” with 
a mean speed which is proportional to the electric field — the stronger the field, 
the faster it goes. While the field is on, and while the, ion is moving along, it is, 
of course, not in thermal equilibrium, it is trying to get to equilibrium, which is to 
be sitting at the end of the container. By means of the kinetic theory we can com- 
pute the drift velocity. 

It turns out that with our present mathematical abilities we cannot really 
compute precisely what will happen, but we can obtain approximate results which 
exhibit all the essential features. We can find out how things will vary with pressure, 
with temperature, and so on, but it will not be possible to get precisely the correct 
numerical factors in front of all the terms. We shall, therefore, in our derivations, 
not worry about the precise value of numerical factors. They can be obtained only 
by a very much more sophisticated mathematical treatment. 

Before we consider what happens in nonequilibrium situations, we shall need 
to look a little closer at what goes on in a gas in thermal equilibrium. We shall 
need to know, for example, what the average time between successive collisions 
of a molecule is. 

Any molecule experiences a sequence of collisions with other molecules — in a 
random way, of course. A particular molecule will, in a long period of time T, 
have a certain number, N , of hits. If we double the length of time, there will be 
twice as many hits. So the number of collisions is proportional to the time T. We 
would like to write it this way: 

N = T/t (43.1) 

We have written the constant of proportionality as 1/r, where r will have the di- 
mensions of a time. The constant r is the average time between collisions. Suppose, 
for example, that in an hour there are 60 collisions; then r is one minute. We 
would say that r (one minute) is the average time between the collisions. 

We may often wish to ask the following question: “What is the chance that a 
molecule will experience a collision during the next small interval of time dtV ’ 
The answer, we may intuitively understand, is dt/r. But let us try to make a more 
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convincing argument. Suppose that there were a very large number N of molecules. 
How many will have collisions in the next interval of time dt? If there is equilibrium, 
nothing is changing on the average with time. So N molecules waiting the time dt 
will have the same number of collisions as one molecule waiting for the time N dt. 
That number we know is Ndt/r . So the number of hits of N molecules is N dt/r 
in a time dt , and the chance, or probability, of a hit for any one molecule is just 
1 /N as large, or (1 fN){N dt/r) = dt/r , as we guessed above. That is to say, the 
fraction of the molecules which will suffer a collision in the time dt is dt/r. To take 
an example, if r is one minute, then in one second the fraction of particles which 
will suffer collisions is 1/60. What this means, of course, is that 1/60 of the 
molecules happen to be close enough to what they are going to hit next that 
their collisions will occur in the next minute. 

When we say that r, the mean time between collisions, is one minute, we do 
not mean that all the collisions will occur at times separated by exactly one minute. 
A particular particle does not have a collision, wait one minute, and then have 
another collision. The times between successive collisions are quite variable. 
We will not need it for our later work here, but we may make a small diversion to 
answer the question: “What are the times between collisions ?” We know that for 
the case above, the average time is one minute, but we might like to know, for 
example, what is the chance that we get no collision for two minutes? 

We shall find the answer to the general question: “What is the probability 
that a molecule will go for a time t without having a collision?” At some arbitrary 
instant — that we call t = 0 — we begin to watch a particular molecule. What is 
the chance that it gets by until t without colliding with another molecule? To 
compute the probability, we observe what is happening to all N 0 molecules in a 
container. After we have waited a time t , some of them will have had collisions. 
We let N(t) be the number that have not had collisions up to the time t . N{t) is, of 
course, less than N 0 . We can find N(t) because we know how it changes with time. 
If we know that N(t) molecules have got by until t , then N(t + dt), the number 
which get by until t + dt, is less than N(t) by the number that have collisions in 
dt. The number that collide in dt we have written above in terms of the mean time 
r as dN = N(t) dt/r. We have the equation 

N{t + dt) = N(t ) - N(t)- • (43.2) 

T 

The quantity on the left-hand side, N(t + dt), can be written, according to the 
definitions of calculus, as N(t) + (dN/dt)dt. Making this substitution, Eq. 
(43.2) yields 

dm = _ m . (43.3) 

dt r 

The number that are being lost in the interval dt is proportional to the number that 
are present, and inversely proportional to the mean life r. Equation (43.3) is easily 
integrated if we rewrite it as 

dN(t) _ _dt 
N(t) r * 

Each side is a perfect differential, so the integral is 

In N(t) — —t/r + (a constant), 
which says the same thing as 

N(f) = (constant)e _(/T . 

We know that the constant must be just N 0 , the total number of molecules present, 
since all of them start at t = 0 to wait for their “next” collision. We can write 
our result as 

N(t) - Noe~ tlT . (43.7) 


(43.4) 

(43.5) 

(43.6) 
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If we wish the probability of no collision, P(t), we can get it by dividing N{i) by 
No, so 

P(t) = e ~ tlr . (43.8) 

Our result is: the probability that a particular molecule survives a time t without a 
collision is e~ tlT 9 where r is the mean time between collisions. The probability 
starts out at 1 (or certainty) for t = 0, and gets less as t gets bigger and bigger. 
The probability that the molecule avoids a collision for a time equal to r is e~ l = 
0.37 . . . The chance is less than one-half that it will have a greater than average time 
between collisions. That is all right, because there are enough molecules which go 
collision-free for times much longer than the mean time before colliding, so that the 
average time can still be r. 

We originally defined r as the average time between collisions. The result we 
have obtained in Eq. (43.7) also says that the mean time from an arbitrary starting 
instant to the next collision is also r. We can demonstrate this somewhat surprising 
fact in the following way. The number of molecules which experience their next 
collision in the interval dt at the time t after an arbitrarily chosen starting time is 
N(t ) dt/r . Their “time until the next collision” is, of course, just t. The “average 
time until the next collision” is obtained in the usual way: 

/•X 

Average time until the next collision = I t — * 

No Jo T 

Using N(t) obtained in (43.7) and evaluating the integral, we find indeed that r 
is the average time from any instant until the next collision. 


43-2 The mean free path 

Another way of describing the molecular collisions is to talk not about the 
time between collisions, but about how far the particle moves between collisions. 
If we say that the average time between collisions is r, and that the molecules have 
a mean velocity v, we can expect that the average distance between collisions, 
which we shall call /, is just the product of r and v. This distance between collisions 
is usually called the mean free path : 

Mean free path / = rv. (43.9) 


In this chapter we shall be a little careless about what kind of average we mean 
in any particular case. The various possible averages — the mean, the root-mean- 
square, etc. — are all nearly equal and differ by factors which are near to one. Since 
a detailed analysis is required to obtain the correct numerical factors anyway, we 
need not worry about which average is required at any particular point. We may 
also warn the reader that the algebraic symbols we are using for some of the 
physical quantities (e.g., / for the mean free path) do not follow a generally accepted 
convention, mainly because there is no general agreement. 

Just as the chance that a molecule will have a collision in a short time dt is 
equal to rfr/r, the chance that it will have a collision in going a distance dx is dx/l. 
Following the same line of argument used above, the reader can show that the 
probability that a molecule will go at least the distance x before having its next 
collision is e~ xl1 . 

The average distance a molecule goes before colliding with another molecule — 
the mean free path /—will depend on how many molecules there are around and 
on the “size” of the molecules, i.e., how big a target they represent. The effective 
“size” of a target in a collision we usually describe by a “collision cross section,” 
the same idea that is used in nuclear physics, or in light-scattering problems. 

Consider a moving particle which travels a distance dx through a gas which 
has «o scatterers (molecules) per unit volume (Fig. 43-1). If we look at each 
unit of area perpendicular to the direction of motion of our selected particle, we 
will find there n 0 dx molecules. If each one presents an effective collision area or, 
as it is usually called, “collision cross section,” cr c , then the total area covered by 
the scatterers is <x c n Q dx . 



Fig. 43-1. Collision cross section. 
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By “collision cross section” we mean the area within which the center of our 
particle must be located if it is to collide with a particular molecule. If molecules 
were little spheres (a classical picture) we would expect that a c — ir(ri + r 2 ) 2 , 
where r\ and r 2 are the radii of the two colliding objects. The chance that our 
particle will have a collision is the ratio of the area covered by scattering molecules 
to the total area, which we have taken to be one. So the probability of a collision in 
going a distance dx is just <r c n 0 dx: 

Chance of a collision in dx — <x c n 0 dx. (43.10) 

We have seen above that the chance of a collision in dx can also be written in 
terms of the mean free path / as dx/L Comparing this with (43.10), we can relate 
the mean free path to the collision cross section: 

j = CcMo, (43.11) 

which is easier to remember if we write it as 

<W = 1. (43.12) 

This formula can be thought of as saying that there should be one collision, 

on the average, when the particle goes through a distance / in which the scattering 

molecules could just cover the total area. In a cylindrical volume of length / and a 
base of unit area, there are n 0 l scatterers; if each one has an area <r c the total area 
covered is n 0 la c , which is just one unit of area. The whole area is not covered, of 
course, because some molecules are partly hidden behind others. That is why 
some molecules go farther than / before having a collision. It is only on the average 
that the molecules have a collision by the time they go the distance /. From measure- 
ments of the mean free path / we can determine the scattering cross section <r c , and 
compare the result with calculations based on a detailed theory of atomic structure. 
But that is a different subject! So we return to the problem of nonequilibrium 
states. 

43-3 The drift speed 

We want to describe what happens to a molecule, or several molecules, which 
are different in some way from the large majority of the molecules in a gas. We 
shall refer to the “majority” molecules as the “background” molecules, and we 
shall call the molecules which are different from the background molecules “special” 
molecules or, for short, the 5-molecules. A molecule could be special for any 
number of reasons: It might be heavier than the background molecules. It might 
be a different chemical. It might have an electric charge — i.e., be an ion in a back- 
ground of uncharged molecules. Because of their different masses or charges the 
5-molecules may have forces on them which are different from the forces on the 
background molecules. By considering what happens to these 5-molecules we can 
understand the basic effects which come into play in a similar way in many different 
phenomena. To list a few: the diffusion of gases, electric currents in batteries, 
sedimentation, centrifugal separation, etc. 

We begin by concentrating on the basic process: an 5-molecule in a back- 
ground gas is acted on by some specific force F (which might be, e.g., gravitational 
or electrical) and in addition by the not-so-specific forces due to collisions with the 
background molecules. We would like to describe the general behavior of the 
5-molecule. What happens to it, in detail , is that it darts around hither and yon as 
it collides over and over again with other molecules. But if we \yatch it carefully 
we see that it does make some net progress in the direction of the force F. We say 
that there is a drift y superposed on its random motion. We would like to know what 
the speed of its drift is — its drift velocity — due to the force F. 

If we start to observe an 5-molecule at some instant we may expect that it is 
somewhere between two collisions. In addition to the velocity it was left with after 
its last collision it is picking up some velocity component due to the force F. In a 
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short time (on the average, in a time r) it will experience a collision and start out on 
a new piece of its trajectory. It will have a new starting velocity, but the same ac- 
celeration from F. 

To keep things simple for the moment, we shall suppose that after each 
collision our 5-molecule gets a completely “fresh” start. That is, that it keeps no 
remembrance of its past acceleration by F. This might be a reasonable assumption 
if our 5-molecule were much lighter than the background molecules, but it is 
certainly not valid in general. We shall discuss later an improved assumption. 

For the moment, then, our assumption is that the 5-molecule leaves each 
collision with a velocity which may be in any direction with equal likelihood. 
The starting velocity will take it equally in all directions and will not contribute 
to any net motion, so we shall not worry further about its initial velocity after a 
collision. In addition, to its random motion, each 5-molecule will have, at any 
moment, an additional velocity in the direction of the force F, which it has picked 
up since its last collision. What is the average value of this part of the velocity? 
It is just the acceleration F/m (where m is the mass of the 5-molecule) times the 
average time since the last collision. Now the average time since the last collision 
must be the same as the average time until the next collision, which we have called 
r, above. The average velocity from F, of course, is just what is called the drift 
velocity, so we have the relation 

»drift = ~ (43.13) 

This basic relation is the heart of our subject. There may be some complication 
in determining what r is, but the basic process is defined by Eq. (43.13). 

You will notice that the drift velocity is proportional to the force. There is, 
unfortunately, no generally used name for the constant of proportionality. Differ- 
ent names have been used for each different kind of force. If in an electrical prob- 
lem the force is written as the charge times the electric field, F = qE , then the con- 
stant of proportionality between the velocity and the electric field E is usually 
called the “mobility.” In spite of the possibility of some confusion, we shall use 
the term mobility for the ratio of the drift velocity to the force for any force. We 
write 

^ihft = P-F (43.14) 

in general, and we shall call /z the mobility. We have from Eq. (43.13) that 

p = r/m. (43.15) 

The mobility is proportional to the mean time between collisions (there are fewer 
collisions to slow it down) and inversely proportional to the mass (more inertia 
means less speed picked up between collisions). 

To get the correct numerical coefficient in Eq. (43.13), which is correct as 
given, takes some care. Without intending to confuse, we should still point out 
that the arguments have a subtlety which can be appreciated only by a careful and 
detailed study. To illustrate that there are difficulties, in spite of appearances, we 
shall make over again the argument which led to Eq. (43.13) in a reasonable but 
erroneous way (and the way one will find in many textbooks!). 

We might have said: The mean time between collisions is r. After a collision 
the particle starts out with a random velocity, but it picks up an additional velocity 
between collisions, which is equal to the acceleration times the time. Since it takes 
the time r to arrive at the next collision it gets there with the velocity {F/m)r. At the 
beginning of the collision it had zero velocity. So between the two collisions it has, 
on the average, a velocity one-half of the final velocity, so the mean drift velocity 
is \Fr/m. (Wrong!) This result is wrong and the result in Eq. (43.13) is right, 
although the arguments may sound equally satisfactory. The reason the second 
result is wrong is somewhat subtle, and has to do with the following: The argument 
is made as though all collisions were separated by the mean time r. The fact is 
that some times are shorter and others are longer than the mean. Short times occur 
more often but make less contribution to the drift velocity because they have less 
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chance “to really get going.” If one takes proper account of the distribution of 
free times between collisions, one can show that there should not be the factor | 
that was obtained from the second argument. The error was made in trying to 
relate by a simple argument the average final velocity to the average velocity itself. 
This relationship is not simple, so it is best to concentrate on what is wanted: 
the average velocity itself. The first argument we gave determines the average 
velocity directly and correctly! But we can perhaps see now why we shall not in 
general try to get all of the correct numerical coefficients in our elementary deriva- 
tions! 

We return now to our simplifying assumption that each collision knocks out 
all memory of the past motion — that a fresh start is made after each collision. 
Suppose our 5-molecule is a heavy object in a background of lighter molecules. 
Then our 5-molecule will not lose its “forward” momentum in each collision. It 
would take several collisions before its motion was “randomized” again. We 
should assume, instead, that at each collision — in each time t on the average 
it loses a certain fraction of its momentum. We shall not work out the details, 
but just state that the result is equivalent to replacing t, the average collision tin™, 
by a new— and longer — t which corresponds to the average “forgetting time,” 
i.e., the average time to forget its forward momentum. With such an interpretation 
of t we can use our formula (43.15) for situations which are not quite as simple 
as we first assumed. 


43-4 Ionic conductivity 



Fig. 43-2. Electric current from an 
ionized gas. 


We now apply our results to a special case. Suppose we have a gas in a vessel 
in which there are also some ions— atoms or molecules with a net electric charge. 
We show the situation schematically in Fig. 43-2. If two opposite walls of the 
container are metallic plates, we can connect them to the terminals of a battery 
and thereby produce an electric field in the gas. The electric field will result in a 
force on the ions, so they will begin to drift toward one or the other of the plates. 
An electric current will be induced, and the gas with its ions will behave like a 
resistor. By computing the ion flow from the drift velocity we can compute the 
resistance. We ask, specifically: How does the flow of electric current depend 
on the voltage difference V that we apply across the two plates? 

We consider the case that our container is a rectangular box of length b and 
cross-sectional area A (Fig. 43-2). If the potential difference, or voltage, from one 
plate to the other is V, the electric field E between the plates is V/b. (The electric 
potential is the work done in carrying a unit charge from one plate to the other. 
The force on a unit charge is E. If E is the same everywhere between the plates, 
which is a good enough approximation for now, the work done on a unit charge 
is just Eb, so V = Eb.) The special force on an ion of the gas is <?E, where q is 
the charge on the ion. The drift velocity of the ion is then fi times this force, or 


Adrift = fxF = nqE — nq 


(43.16) 


An electric current / is the flow of charge in a unit time. The electric current to 
one of the plates is given by the total charge of the ions which arrive at the plate in 
a unit of time. If the ions drift toward the plate with the velocity » dPift , then those 
which are within a distance (v drift - T) will arrive at the plate in the time T. If 
there are ni ions per unit volume, the number which reach the plate in the time 
T is («i • A • thrift ' T). Each ion carries the charge q, so we have that 

Charge collected in T = qniAv drUt T. (43.17) 

The current I is the charge collected in T divided by T, so 
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I qn{Av dT 


(43.18) 



Substituting thrift from (43.16), we have 


/ = mV ^ V. (43.19) 

We find that the current is proportional to the voltage, which is just the form of 
Ohm’s law, and the resistance R is the inverse of the proportionality constant: 

^ = mV j • (43.20) 

We have a relation between the resistance and the molecular properties m, q, and 
ju, which depends in turn on m and r. If we know ni and q from atomic measure- 
ments, a measurement of R could be used to determine p, and from fi also r. 


43-5 Molecular diffusion 

We turn now to a different kind of problem, and a different kind of analysis: 
the theory of diffusion. Suppose that we have a container of gas in thermal 
equilibrium, and that we introduce a small amount of a different kind of gas at 
some place in the container. We shall call the original gas the “background” gas 
and the new one the “special” gas. The special gas will start to spread out through 
the whole container, but it will spread slowly because of the presence of the back- 
ground gas. This slow spreading-out process is called diffusion. The diffusion is 
controlled mainly by the molecules of the special gas getting knocked about by the 
molecules of the background gas. After a large number of collisions, the special 
molecules end up spread out more or less evenly throughout the whole volume. We 
must be careful not to confuse diffusion of a gas with the gross transport that may 
occur due to convection currents. Most commonly, the mixing of two gases occurs 
by a combination of convection and diffusion. We are interested now only in the 
case that there are no “wind” currents. The gas is spreading only by molecular 
motions, by diffusion. We wish to compute how fast diffusion takes place. 

We now compute the net flow of molecules of the “special” gas due to the 
molecular motions. There will be a net flow only when there is some nonuniform 
distribution of the molecules, otherwise all of the molecular motions would average 
to give no net flow. Let us consider first the flow in the x-direction. To find the 
flow, we consider an imaginary plane surface perpendicular to the x-axis and count 
the number of special molecules that cross this plane. To obtain the net flow, we 
must count as positive those molecules which cross in the direction of positive x 
and subtract from this number the number which cross in the negative x-direction. 
As we have seen many times, the number which cross a surface area in a time AT 
is given by the number which start the interval AT in a volume which extends the 
distance v AT from the plane. (Note that v, here, is the actual molecular velocity, 
not the drift velocity.) 

We shall simplify our algebra by giving our surface one unit of area. Then 
the number of special molecules which pass from left to right (taking the +x-direc- 
tion to the right) is n_ v AT, where n_ is the number of special molecules per unit 
volume to the left (within a factor of 2 or so, but we are ignoring such factors!). 
The number which cross from right to left is, similarly, n+v AT, where n + is the 
number density of special molecules on the right-hand side of the plane. If we 
call the molecular current J, by which we mean the net flow of molecules per unit 
area per unit time, we have 

J== n - vAT .Z f n ^, (43.21) 

or 

J = («_ — n+)v. (43.22) 

What shall we use for and « + ? When we say “the density on the left,” 
how far to the left do we mean? We should choose the density at the place from 
which the molecules started their “flight,” because the number which start such 
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trips is determined by the number present at that place. So by n_ we should mean 
the density a distance to the left equal to the mean free path /, and by n + , the density 
at the distance / to the right of our imaginary surface. 

It is convenient to consider that the distribution of our special molecules in 
space is described by a continuous function of x, y, and z which we shall call n a . 
By n a (x, y, z) we mean the number density of special molecules in a small volume 
element centered on (x, y, z). In terms of n a we can express the difference (n + — 
«_) as 

(*+-»-)-£ ( 43 - 23 ) 

Substituting this result in Eq. (43.22) and neglecting the factor of 2, we get 

J, - -k ^ ■ (43.24) 


We have found that the flow of special molecules is proportional to the derivative 
of the density, or to what is sometimes called the “gradient” of the density. 

It is clear that we have made several rough approximations. Besides various 
factors of two we have left out, we have used v where we should have used v x , and 
we have assumed that n + and n_ refer to places at the perpendicular distance / 
from our surface, whereas for those molecules which do not travel perpendicular 
to the surface element, / should correspond to the slant distance from the surface. 
All of these refinements can be made; the result of a more careful analysis shows 
that the right-hand side of Eq. (43.24) should be multiplied by 1/3. So a better 
answer is 


J x 


h dria 
3 dx 


(43.25) 


Similar equations can be written for the currents in the y- and z-directions. 

The current J x and the density gradient dnjdx can be measured by macroscopic 
observations. Their experimentally determined ratio is called the “diffusion co- 
efficient,” D . That is, 

= • (43 - 26 > 


We have been able to show that for a gas we expect 

D = £/y. (43.27) 

So far in this chapter we have considered two distinct processes: mobility , 
the drift of molecules due to “outside” forces; and diffusion , the spreading deter- 
mined only by the internal forces, the random collisions. There is, however, a 
relation between them, since they both depend basically on the thermal motions, 
and the mean free path / appears in both calculations. 

If, in Eq. (43.25), we substitute l = vt and r = jum, we have 


t i 2 dn a 

J x= -hnw^. 

(43.28) 

But mv 2 depends only on the temperature. We recall that 


\mv 2 = f£T, 

(43.29) 

so 



(43.30) 


We find that D , the diffusion coefficient, is just kT times ju, the mobility coefficient: 

D = fikT. (43.31) 

And it turns out that the numerical coefficient in (43.31) is exactly right — no extra 
factors have to be thrown in to adjust for our rough assumptions. We can show, 
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in fact, that (43.31) must always be correct — even in complicated situations 
(for example, the case of a suspension in a liquid) where the details of our simple 
calculations would not apply at all. 

To show that (43.31) must be correct in general, we shall derive it in a different 
way, using only our basic principles of statistical mechanics. Imagine a situation 
in which there is a gradient of “special” molecules, and we have a diffusion current 
proportional to the density gradient, according to Eq. (43.26). We now apply a 
force field in the x-direction, so that each special molecule feels the force F. Accord- 
ing to the definition of the mobility ju there will be a drift velocity given by 

Adrift = nF. (43.32) 

By our usual arguments, the drift current (the net number of molecules which pass 
a unit of area in a unit of time) will be 


drift ^a^drift) (43.33) 

or 

/drift = n a tiF. (43.34) 


We now adjust the force F so that the drift current due to F just balances the diffu- 
sion, so that there is no net flow of our special molecules. We have J x + /drift = 0, 
or 

D ^ = n ai xF. (43.35) 


Under the “balance” conditions we find a steady (with time) gradient of density 
given by 


dn a = n a \xF 
dx ~D 


(43.36) 


But notice! We are describing an equilibrium condition, so our equilibrium 
laws of statistical mechanics apply. According to these laws the probability of 
finding a molecule at the coordinate x is proportional to e ~ ulkT , where U is the 
potential energy. In terms of the number density n a , this means that 

/la = n 0 e- u i kT . (43.37) 


If we differentiate (43.37) with respect to x, we find 


or 


dn a 

dx 


- n 0 e 


—U/kT 


J_ dU 
kT dx 


dn a _ _ n a dU_ 
dx kT dx 


(43.38) 

(43.39) 


In our situation, since the force F is in the x-direction, the potential energy U is 
just — Fx, and —dU/dx = F. Equation (43.39) then gives 

^ . (43.40) 

dx kT 

[This is just exactly Eq. (40.2), from which we deduced e~ u ^ kT in the first place, 
so we have come in a circle]. Comparing (43.40) with (43.36), we get exactly Eq. 
(43.31). We have shown that Eq. (43.31), which gives the diffusion current in terms 
of the mobility, has the correct coefficient and is very generally true. Mobility and 
diffusion are intimately connected. This relation was first deduced by Einstein. 


43-6 Thermal conductivity 

The methods of the kinetic theory that we have been using above can be used 
also to compute the thermal conductivity of a gas. If the gas at the top of a con- 
tainer is hotter than the gas at the bottom, heat will flow from the top to the bottom. 
(We think of the top being hotter because otherwise convection currents would be 
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set up and the problem would no longer be one of heat conduction .) The transfer 
of heat from the hotter gas to the colder gas is by the diffusion of the “hot” mole- 
cules — those with more energy — downward and the diffusion of the “cold” mole- 
cules upward. To compute the flow of thermal energy we can ask about the energy 
carried downward across an element of area by the downward-moving molecules, 
and about the energy carried upward across the surface by the upward-moving 
molecules. The difference will give us the net downward flow of energy. 

The thermal conductivity k is defined as the ratio of the rate at which thermal 
energy is carried across a unit surface area, to the temperature gradient: 


}_dQ = _ dT 
A dt K dz 


( 43 . 41 ) 


Since the details of the calculations are quite similar to those we have done above 
in considering the flow of electric current in an ionized gas, we shall leave it as an 
exercise for the reader to show that 


K 


knlv 


( 43 . 42 ) 


where (7 — 1 )kT is the average energy of a molecule at the temperature T. 
If we use our relation nla c = 1 , the heat conductivity can be written as 


1 kv 
7—1 <J C 


( 43 . 43 ) 


We have a rather surprising result. We know that the average velocity of gas 
molecules depends on the temperature but not on the density . We expect <r c to 
depend only on the size of the molecules. So our simple result says that the thermal 
conductivity k (and therefore the rate of flow of heat in any particular circumstance) 
is independent of the density of the gas! The change in the number of “carriers” 
of energy with a change in density is just compensated by the larger distance the 
“carriers” can go between collisions. 

One may ask: “Is the heat flow independent of the gas density in the limit as 
the density goes to zero? When there is no gas at all?” Certainly not! The formula 
(43.43) W as derived, as were all the others in this chapter, under the assumption 
that the mean free path between collisions is much smaller than any of the dimen- 
sions of the container. Whenever the gas density is so low that a molecule has a fair 
chance of crossing from one wall of its container to the other without having a 
collision, none of the calculations of this chapter apply. We must in such cases go 
back to kinetic theory and calculate again the details of what will occur. 
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44 


The JLairs of Thermodynamics 


44-1 Heat engines; the first law 

So far we have been discussing the properties of matter from the atomic point 
of view, trying to understand roughly what will happen if we suppose that things 
are made of atoms obeying certain laws. However, there are a number of relation- 
ships among the properties of substances which can be worked out without con- 
sideration of the detailed structure of the materials. The determination of the 
relationships among the various properties of materials, without knowing their 
internal structure, is the subject of thermodynamics. Historically, thermodynamics 
was developed before an understanding of the internal structure of matter was 
achieved. 

To give an example: we know from the kinetic theory that the pressure of a gas 
is caused by molecular bombardment, and we know that if we heat a gas, so that 
the bombardment increases, the pressure must increase. Conversely, if the piston 
in a container of the gas is moved inward against the force of bombardment, the 
energy of the molecules bombarding the piston will increase, and consequently 
the temperature will increase. So, on the one hand, if we increase the temperature 
at a given volume, we increase the pressure. On the other hand, if we compress 
the gas, we will find that the temperature will rise. From the kinetic theory, one 
can derive a quantitative relationship between these two effects, but instinctively 
one might guess that they are related in some necessary fashion which is independ- 
ent of the details of the collisions. 

Let us consider another example. Many people are familiar with this interest- 
ing property of rubber: If we take a rubber band and pull it, it gets warm. If one 
puts it between his lips, for example, and pulls it out, he can feel a distinct 
warming, and this warming is reversible in the sense that if he relaxes the rubber 
band quickly while it is between his lips, it is distinctly cooled. That means that 
when we stretch a rubber band it heats, and when we release the tension of the band 
it cools. Now our instincts might suggest that if we heated a band, it might pull: 
that the fact that pulling a band heats it might imply that heating a band should 
cause it to contract. And, in fact, if we apply a gas flame to a rubber band holding 
a weight, we will see that the band contracts abruptly (Fig. 44-1). So it is true that 
when we heat a rubber band it pulls, and this fact is definitely related to the fact 
that when we release the tension of it, it cools. 

The internal machinery of rubber that causes these effects is quite complicated. 
We will describe it from a molecular point of view to some extent, although our 
main purpose in this chapter is to understand the relationship of these effects 
independently of the molecular model. Nevertheless, we can show from the molec- 
ular model that the effects are closely related. One way to understand the behavior 
of rubber is to recognize that this substance consists of an enormous tangle of 
long chains of molecules, a kind of “molecular spaghetti,” with one extra compli- 
cation: between the chains there are cross-links— like spaghetti that is sometimes 
welded together where it crosses another piece of spaghetti — a grand tangle. 
When we pull out such a tangle, some of the chains tend to line up along the direc- 
tion of the pull. At the same time, the chains are in thermal motion, so they hit 
each other continually. It follows that such a chain, if stretched, would not by 
itself remain stretched, because it would be hit from the sides by the other chains 
and other molecules, and would tend to kink up again. So the real reason why a 
rubber band tends to contract is this: when one pulls it out, the chains are length- 
wise, and the thermal agitations of the molecules on the sides of the chains tend 
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Fig. 44-1. The heated rubber band. 
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Fig. 44-2. The rubber-band heat 
engine. 


to kink the chains up, and make them shorten. One can then appreciate that if the 
chains are held stretched and the temperature is increased, so that the vigor of the 
bombardment on the sides of the chains is also increased, the chains tend to pull 
in, and they are able to pull a stronger weight when heated. If, after being stretched 
for a time, a rubber band is allowed to relax, each chain becomes soft, and the 
molecules striking it lose energy as they pound into the relaxing chain. So the 
temperature falls. 

We havp seen how these two process, contraction when heated and cooling 
during relaxation, can be related by the kinetic theory, but it would be a tremendous 
challenge to determine from the theory the precise relationship between the two. 
We would have to know how many collisions there were each second and what the 
chains look like, and we would have to take account of all kinds of other complica- 
tions. The detailed mechanism is so complex that we cannot, by kinetic theory, 
really determine exactly what happens; still, a definite relation between the two 
effects we observe can be worked out without knowing anything about the internal 
machinery! 

The whole subject of thermodynamics depends essentially upon the following 
kind of consideration: because a rubber band is “stronger” at higher temperatures 
than it is at lower temperatures, it ought to be possible to lift weights, and to 
move them around, and thus to do work with heat. In fact, we have already seen 
experimentally that a heated rubber band can lift a weight. The study of the way 
that one does work with heat is the beginning of the science of thermodynamics. 
Can we make an engine which uses the heating effect on a rubber band to do work? 
One can make a silly looking engine that does just this. It consists of a bicycle 
wheel in which all the spokes are rubber bands (Fig. 44-2). If one heats the rubber 
bands on one side of the wheel with a pair of heat lamps, they become “stronger” 
than the rubber bands on the other side. The center of gravity of the wheel will 
be pufled to one side, away from the bearing, so that the wheel turns. As it turns, 
cool rubber bands move toward the heat, and the heated bands move away from 
the heat and cool, so that the wheel turns slowly so long as the heat is applied. 
The efficiency of this engine is extremely low. Four hundred watts of power pour 
into the two lamps, but it is just possible to lift a fly with such an engine! An inter- 
esting question, however, is whether we can get heat to do the work in more 
efficient ways. 

In fact, the science of thermodynamics began with an analysis, by the great 
engineer Sadi Carnot, of the problem of how to build the best and most efficient 
engine, and this constitutes one of the few famous cases in which engineering has 
contributed fundamentally to physical theory. Another example that comes 
to mind is the more recent analysis of information theory by Claude Shannon. 
These two analyses, incidentally, turn out to be closely related. 

Now the way a steam engine ordinarily operates is that heat from a fire boils 
some water, and the steam so formed expands and pushes on a piston which makes 
a wheel go around. So the steam pushes the piston — what then? One has to finish 
the job: q stupid way to complete the cycle would be to let the steam escape into 
the air, for then one has to keep supplying water. It is cheaper — more efficient — 
to let the steam go into another box, where it is condensed by cool water, and then 
pump the water back into the boiler, so that it circulates continuously. Heat is 
thus supplied to the engine and converted into work. Now would it be better to 
use alcohol? What property should a substance have so that it makes the best 
possible engine? That was the question to which Carnot addressed himself, and 
one of the by-products was the discovery of the type of relationship that we have 
just explained above. 

The results of thermodynamics are all contained implicitly in certain appar- 
ently simple statements called the laws of thermodynamics. At the time when Carnot 
lived, the first law of thermodynamics, the conservation of energy, was not known. 
Carnot’s arguments were so carefully drawn, however, that they are valid even 
though the first law was not known in his time! Some time afterwards, Clausius 
made a simpler derivation that could be understood more easily than Carnot’s 
very subtle reasoning. But it turned out that Clausius assumed, not the conserva- 
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tion of energy in general, but that heat was conserved according to the caloric 
theory, which was later shown to be false. So it has often been said that Carnot’s 
logic was wrong. But his logic was quite correct. Only Clausius’ simplified ver- 
sion, that everybody read, was incorrect. 

The so-called second law of thermodynamics was thus discovered by Carnot 
before the first law! It would be interesting to give Carnot’s argument that did not 
use the first law, but we shall not do so because we want to learn physics, not 
history. We shall use the first law from the start, in spite of the fact that a great 
deal can be done without it. 

Let us begin by stating the first law, the conservation of energy: if one has a 
system and puts heat into it, and does work on it, then its energy is increased by 
the heat put in and the work done. We can write this as follows: The heat Q put 
into the system, plus the work W done on the system, is the increase in the energy 
U of the system; the latter energy is sometimes called the internal energy: 

Change in U = Q + W. (44.1) 

The change in U can be represented as adding a little heat A Q and adding a little 
work AW: 

AU = AQ + AW, (44.2) 

which is a differential form of the same law. We know that very well, from an 
earlier chapter. 

44-2 The second ldw 

Now, what about the second law of thermodynamics? We know that if we 
do work against friction, say, the work lost to us is equal to the heat produced. 
If we do work in a room at temperature T , and we do the work slowly enough, the 
room temperature does not change much, and we have converted work into heat 
at a given temperature. What about the reverse possibility? Is it possible to convert 
the heat back into work at a given temperature? The second law of thermo- 
dynamics asserts that it is not. It would be very convenient to be able to convert 
heat into work merely by reversing a process like friction. If we consider only the 
conservation of energy, we might think that heat energy, such as that in the vibra- 
tional motions of molecules, might provide a goodly supply of useful energy. 
But Carnot assumed that it is impossible to extract the energy of heat at a single 
temperature. In other words, if the whole world were at the same temperature, 
one could not convert any of its heat energy into work: while the process of making 
work go into heat can take place at a given temperature, one cannot reverse it to 
get the work back again. Specifically, Carnot assumed that heat cannot be taken 
in at a certain temperature and converted into work with no other change in the 
system or the surroundings. 

That last phrase is very important. Suppose we have a can of compressed 
air at a certain temperature, and we let the air expand. It can do work; it can make 
hammers go, for example. It cools off a little in the expansion, but if we had a big 
sea, like the ocean, at a given temperature — a heat reservoir — we could warm it up 
again. So we have taken the heat out of the sea, and we have done work with the 
compressed air. But Carnot was not wrong, because we did not leave everything as 
it was . If we recompress the air that we let expand, we will find we are doing 
extra work, and when we are finished we will discover that we not only got no 
work out of the system at temperature T, but we actually put some in. We must 
talk only about situations in which the net result of the whole process is to take heat 
away and convert it into work, just as the net result of the process of doing work 
against friction is to take work and convert it into heat. If we move in a circle, we 
can bring the system back precisely to its starting point, with the net result that 
we did work against friction and produced heat. Can we reverse the process? 
Turn a switch, so that everything goes backwards, so the friction does work against 
us, and cools the sea? According to Carnot: no! So let us suppose that this is 
impossible. 


44-3 




If it were possible it would mean, among other things, that we could take 
heat out of a cold body and put it into a hot body at no cost, as it were. Now we 
know it is natural that a hot thing can warm up a cool thing; if we simply put a 
hot body and a cold one together, and change nothing else, our experience assures 
us that it is not going to happen that the hot one gets hotter, and the cold one gets 
colder! But if we could obtain work by extracting the heat out of the ocean, say, 
or from anything else at a single temperature, then that work could be converted 
back into heat by friction at some other temperature. For instance, the other arm 
of a working machine could be rubbing something that is already hot. The net 
result would be to take heat from a “cold” body, the ocean, and to put it into a 
hot body. Now, the hypothesis of Carnot, the second law of thermodynamics, is 
sometimes stated as follows: heat cannot, of itself, flow from a cold to a hot object. 
But, as we have just seen, these two statements are equivalent: first, that one can- 
not devise a process whose only result is to convert heat to work at a single temper- 
ature, and second, that one cannot make heat flow by itself from a cold to a hot 
place. We shall mostly use the first form. 

Carnot’s analysis of heat engines is quite similar to the argument that we 
gave about weight-lifting engines in our discussion of the conservation of energy 
in Chapter 4. In fact, that argument was patterned after Carnot’s argument about 
heat engines, and so the present treatment will sound very much the same. 

Suppose we build a heat engine that has a “boiler” somewhere at a tempera- 
ture TV A certain heat Q x is taken from the boiler, the steam engine does some 
work W, and it then delivers some heat Q 2 into a “condenser” at another tempera- 
ture T 2 (Fig. 44-3). Carnot did not say how much heat, because he did not know 
the first law, and he did not use the law that Q 2 was equal to Q x because he did 
not believe it. Although everybody thought that, according to the caloric theory, 
the heats Q x and Q 2 would have to be the same, Carnot did not say they were the 
same — that is part of the cleverness of his argument. If we do use the first law, we 
find that the heat delivered, Q 2 , is the heat Q x that was put in minus the work W 
that was done: 

Q 2 = Qi - w. (44.3) 

(If we have some kind of cyclic process where water is pumped back into the 
boiler after it is condensed, we will say that we have heat Q x absorbed and work 
W done, during each cycle, for a certain amount of water that goes around the 
cycle.) 

Now we shall build another engine, and see if we cannot get more y/ork from 
the same amount of heat being delivered at the temperature T i, with the condenser 
still at the temperature T 2 . We shall use the same amount of heat Q x from the 
boiler, and we shall try to get more work than we did out of the steam engine, 
perhaps by using another fluid, such as alcohol. 



Fig. 44-4. Reversible heat transfer. 


44-3 Reversible engines 

Now we must analyze our engines. One thing is clear: we will lose something 
if the engines contain devices in which there is friction. The best engine will be a 
frictionless engine. We assume, then, the same idealization that we did when we 
studied the conservation of energy; that is, a perfectly frictionless engine. 

We must also consider the analog of frictionless motion, “frictionless” heat 
transfer. If we put a hot object at a high temperature against a cold object, so that 
the heat flows, then it is not possible to make that heat flow in a reverse direction 
by a very small change in the temperature of either object. But when we have a 
practically frictionless machine, if we push it with a little force one way, it goes 
that way, and if we push it with a little force the other way, it goes the other way. 
We need to find the analog of frictionless motion: heat transfer whose direction we 
can reverse with only a tiny change. If the difference in temperature is finite, that is 
impossible, but if one makes sure that heat flows always between two things at 
essentially the same temperature, with just an infinitesimal difference to make it 
flow in the desired direction, the flow is said to be reversible (Fig. 44-4). If we heat 
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the object on the left a little, heat will flow to the right; if we cool it a little, heat 
will flow to the left. So we find that the ideal engine is a so-called reversible engine, 
in which every process is reversible in the sense that, by minor changes, infinitesimal 
changes, we can make the engine go in the opposite direction. That means that 
nowhere in the machine must there be any appreciable friction, and nowhere in 
the machine must there be any place where the heat of the reservoirs, or the flame 
of the boiler, is in direct contact with something definitely cooler or warmer. 

Let us now consider an idealized engine in which all the processes are reversible. 
To show that such a thing is possible in principle, we will give an example of an 
engine cycle which may or may not be practical, but which is at least reversible, 
in the sense of Carnot’s idea. Suppose that we have a gas in a cylinder equipped 
with a frictionless piston. The gas is not necessarily a perfect gas. The fluid does not 
even have to be a gas, but to be specific let us say we do have a perfect gas. Also, 
suppose that we have two heat pads, T x and T 2 — great big things that have definite 
temperatures, T x and T 2 . We will suppose in this case that T i is higher than T 2 . 
Let us first heat the gas and at the same time expand it, while it is in contact with 
the heat pad at 7Y As we do this, pulling the piston out very slowly as the heat 
flows into the gas, we will make sure that the temperature of the gas never gets 
very far from 7Y If we pull the piston out too fast, the temperature of the gas will 
fall too much below 2T and then the process will not be quite reversible, but if 
we pull it out slowly enough, the temperature of the gas will never depart much 
from TV On the other hand, if we push the piston back slowly, the temperature 
would be only infinitesimally higher than T u and the heat would pour back. We 
see that such an isothermal (constant-temperature) expansion, done slowly and 
gently enough, is a reversible process. 

To understand what we are doing, we shall use a plot (Fig. 44-6) of the pressure 
of the gas against its volume. As the gas expands, the pressure falls. The curve 
marked (1) tells us how the pressure and volume change if the temperature is 
kept fixed at the value 7\. For an ideal gas this curve would be PV = NkTi. 
During an isothermal expansion the pressure falls as the volume increases until 
we stop at the point ft. At the same time, a certain heat Q i must flow into the gas 
from the reservoir, for if the gas were expanded without being in contact with the 
reservoir it would cool off, as we already know. Having completed the isothermal 
expansion, stopping at the point ft, let us take the cylinder away from the reservoir 
and continue the expansion. This time we permit no heat to enter the cylinder. 
Again we perform the expansion slowly, so there is no reason why we cannot 
reverse it, and we again assume there is no friction. The gas continues to expand 
and the temperature falls, since there is no longer any heat entering the cylinder. 

We let the gas expand, following the curve marked (2), until the temperature 
falls to r 2 , at the point marked c. This kind of expansion, made without adding 
heat, is called an adiabatic expansion. For an ideal gas, we already know that 
curve (2) has the form PV y = constant, where 7 is a constant greater than 1, so 
that the adiabatic curve has a more negative slope than the isothermal curve. The 
gas cylinder has now reached the temperature r 2 , so that if we put it on the heat 
pad at temperature T 2 there will be no irreversible changes. Now we slowly com- 
press the gas while it is in contact with the reservoir at r 2 , following the curve 
marked (3) (Fig. 44-5, Step 2). Because the cylinder is in contact with the reservoir, 
the temperature does not rise, but heat Q 2 flows from the cylinder into the reservoir 
at the temperature T 2 . Having compressed the gas isothermally along curve (3) 
to the point d , we remove the cylinder from the heat pad at temperature T 2 and 
compress it still further, without letting any heat flow out. The temperature will 
rise, and the pressure will follow the curve marked (4). If we carry out each step 
properly, we can return to the point a at temperature T\ where we started, and 
repeat the cycle. 

We see that on this diagram we have carried the gas around a complete cycle, 
and during one cycle we have put Q i in at temperature T u and have removed Q 2 
at temperature T 2 . Now the point is that this cycle is reversible, so that we could 
represent all the steps the other way around. We could have gone backwards 
instead of forwards: we could have started at point a, at temperature T u expanded 
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Fig. 44-5. Steps in Carnot cycle. 



Fig. 44-6. The Carnot cycle. 




Fig. 44-7. Reversible engine A being 
driven backwards by engine B. 


along the curve (4), expanded further ut the temperature r 2 , absorbing heat Q 2 , 
and so on, going around the cycle backward. If we go around the cycle in one direc- 
tion, we must do work on the gas; if we go in the other direction, the gas does work 
on us. 

Incidentally, it is easy to find out what the total amount of work is, because 
the work during any expansion is the pressure times the change in volume, jP dV. 
On this particular diagram, we have plotted P vertically and V horizontally. So 
if we call the vertical distance y and the horizontal distance x, this is jy dx— in 
other words, the area under the curve. So the area under each of the numbered 
curves is a measure of the work done by or on the gas in the corresponding step. 

It is easy to see that the net work done is the shaded area of the picture. 

Now that we have given a single example of a reversible machine, we shall 
suppose that other such engines are also possible. Let us assume that we have a 
reversible engine A which takes £>i atr 1; does work W, and delivers some heat at 
T 2 . Now let us assume we have any other engine B, made by man, already designed 
or not yet invented, made of rubber bands, steam, or whatever, reversible or not, 
which is designed so that it takes in the same amount of heat Q i at T i, and rejects 
the heat at the lower temperature T 2 (Fig. 44-7). Assume that engine B does 
some work, W'. Now we shall show that W' is not greater than IF— that no engine 
can do more work than a reversible one. Why? Suppose that, indeed, W' were 
bigger than W. Then we could take the heat Q j out of the reservoir at T u and 
with engine B we could do work W' and deliver some heat to the reservoir at J 2 ; 
we do not care how much. That done, we could save some of the work W', which is 
supposed to be greater than W ; we could use a part of it, W , and save the remainder, 
W' — W, for useful work. With the work W we could run engine A backwards 
because it is a reversible engine. It will absorb some heat from the reservoir at 
T 2 and deliver Q 1 back to the reservoir at 7Y After this double cycle, the net 
result would be that we would have put everything back the way it was before, and 
we would have done some excess work, namely W' — W, and all we would have 
done would be to extract energy from the reservoir at T 2 ! We were careful to re- 
store the heat Q i to the reservoir at T x . So that reservoir can be small and “inside” 
our combined machine A + B, whose net effect is therefore to extract a net heat 
ffrr _ jjr f rom t he reservoir at T 2 and convert it into work. But to obtain useful 
work from a reservoir at a single temperature with no other changes is impossible 
according to Carnot’s postulate; it cannot be done. Therefore no engine which 
absorbs a given amount of heat from a higher temperature T i and delivers it at the 
temperature T 2 can do more work than a reversible engine operating under the 
same temperature conditions. 

Now suppose that engine B is also reversible. Then, of course, not only must 
W l be not greater than W , but now we can reverse the argument and show that W 
cannot be greater than W f . So, if both engines are reversible they must both do 
the same amount of work, and we thus come to Carnot’s brilliant conclusion, 
that if an engine is reversible, it makes no difference how it is designed, because the 
amount of work one will obtain if the engine absorbs a given amount of heat at 
temperature T x and delivers heat at some other temperature T 2 does not depend 
on the design of the engine. It is a property of the world, not a property of a particu- 
lar engine. . 

If we could find out what the law is that determines how much work we obtain 
when we absorb the heat Q \ at T\ and deliver heat at T 2i this quantity would be a 
universal thing, independent of the substance. Of course if we knew the properties 
of a particular substance, we could work it out and then say that all other substances 
must give the same amount of work in a reversible engine. That is the key idea, the 
clue by which we can find the relationship between how much, for instance, a rub- 
ber band contracts when we heat it, and how much it cools when we let it contract. 
Imagine that we put that rubber band in a reversible machine, and that we make it 
go around a reversible cycle. The net result, the total amount of work done, is 
that universal function, that great function which is independent of substance. 
So we see that a substance’s properties must be limited in a certain way; one 
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cannot make up anything he wants, or he would be able to invent a substance which 
he could use to produce more than the maximum allowable work when he carried 
it around a reversible cycle. This principle, this limitation, is the only real rule that 
comes out of the thermodynamics. 


44-4 The efficiency of an ideal engine 

Now we shall try to find the law which determines the work W as a function 
of Qu T i, and T 2 . It is clear that W is proportional to Q u for if we consider two 
reversible engines in parallel, both working together and both double engines, the 
combination is also a reversible engine. If each one absorbed heat Q u the two to- 
gether absorb 2Q\ and the work done is 2 W, and so on. So it is not unreasonable 
that W is proportional to Q lm 

Now the next important step is to find this universal law. We can, and will, 
do so by studying a reversible engine with the one particular substance whose 
laws we know, a perfect gas. It is also possible to obtain the rule by a purely 
logical argument, using no particular substance at all. This is one of the very 
beautiful pieces of reasoning in physics and we are reluctant not to show it to 
you, so for those who would like to see it we shall discuss it in just a moment. 
But first we shall use the much less abstract and simpler method of direct calcula- 
tion for a perfect gas. 

We need only obtain formulas for Q x and Q 2 (for W is just Q x — Q 2 \ the 
heats exchanged with the reservoirs during the isothermal expansion or con- 
traction. For example, how much heat Q x is absorbed from the reservoir at 
temperature 7\ during the isothermal expansion [marked (1) in Fig. 44-6] from point 
a , at pressure p a , volume V a , temperature T u to point b with pressure p b , volume 
V b , and the same temperature T 1 ? For a perfect gas each molecule has an energy 
that depends only on the temperature, and since the temperature and the number 
of molecules are the same at a and at b , the internal energy is the same. There is 
no change in U; all the work done by the gas, 

W= [* pdV, 

Ja 


during the expansion is energy Q i taken from the reservoir. During the expansion, 
pV = NkT u or 


or 


P = 


NkTi 

V 


or 



(44.4) 


fii = 


NkTi In 


Vb 

V a 


is the heat taken from the reservoir at T x . In the same way, for the compression at 
T 2 [curve (3) of Fig. 44-6] the heat delivered to the reservoir at T 2 is 


02 = NkT 2 ln|r ' (44.5) 

To finish our analysis we need only find a relation between V c /V d and V b /V a . 
This we do by noting that (2) is an adiabatic expansion from b to c, during which 
pV y is a constant. Since pV = NkT, we can write this as (pV)V y ~ l = const or, 
in terms of T and V, as TV y ~ 1 — const, or 

T x Vl~ l = T 2 Vl~\ (44.6) 

Likewise, since (4), the expansion from d to a , is also adiabatic, we find 

= T 2 V y r l . (44.6a) 
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If we divide this equation by the previous one, we find that Vb/V a must equal 
V c /V d> so the In’s in (44.4) and (44.5) are equal, and that 


0 1 _ 02 
T x T 2 


(44.7) 



Fig. 44-8. Engines 1 and 2 together 
are equivalent to engine 3. 


This is the relation we were seeking. Although proved for a perfect gas engine, 
we know it must be true for any reversible engine at all 

Now we shall see how this universal law could also be obtained by logical 
argument, without knowing the properties of any specific substances, as follows. 
Suppose that we have three engines and three temperatures, let us say T u T 2) 
and TV Let one engine absorb heat Q x from the temperature T x and do a certain 
amount of work W 13 , and let it deliver heat Q 3 to the temperature T 3 (Fig. 44-8). 
Let another engine run backwards between T 2 and T 3 . Suppose that we let the 
second engine be of such a size that it will absorb the same heat Q 3i and deliver 
the heat Q 2 . We will have to put a certain amount of work, W 32i into it — negative 
because the engine is running backwards. When the first machine goes through a 
cycle, it absorbs heat Q x and delivers Q 3 at the temperature T 3 \ then the second 
machine takes the same heat Q 3 out of the reservoir at the temperature T 3 and 
delivers it into the reservoir at temperature T 2 . Therefore the net result of the 
two machines in tandem is to take the heat Q x from T u and deliver Q 2 at T 2 . 
The two machines are thus equivalent to a third one, which absorbs 0 X at T u 
does work W X2 , and delivers heat Q 2 at T 2i because W 12 — W± 3 — W 32 , 
as one can immediately show from the first law, as follows: 

*13 - ^32 = (01 - 03) ” (02 - 03) = 01 - 02 = JVi2. (44.8) 


We can now obtain the laws which relate the efficiencies of the engines, because 
there clearly must be some kind of relationship between the efficiencies of engines 
running between the temperatures T x and T 3i and between T 2 and T 3i and be- 
tween Ti and T 2 . 

We can make the argument very clear in the following way: We have just 
seen that we can always relate the heat absorbed at T x to the heat delivered at T 2 
by finding the heat delivered at some other temperature T 3 . Therefore we can 
get all the engines’ properties if we introduce a standard temperature, analyzing 
everything with that standard temperature. In other words, if we knew the effi- 
ciency of an engine running between a certain temperature T and a certain arbi- 
trary standard temperature, then we could work out the efficiency for any other 
difference in temperature. Because we assume we are using only reversible engines, 
we can work from the initial temperature down to the standard temperature and 
back up to the final temperature again. We shall define the standard temperature 
arbitrarily as one degree . We shall also adopt a special symbol for the heat which 
is delivered at this standard temperature: we shall call it Q s . In other words, when 
a reversible engine absorbs the heat Q at temperature T , it will deliver, at the unit 
temperature, a heat Q s . If one engine , absorbing heat Qi at T u delivers the heat 
Qs at one degree , and if an engine absorbing heat 0 2 at temperature T 2 will also 
deliver the same heat Qs at one degree , then it follows that an engine which absorbs 
heat Q\ at the temperature T x will deliver heat Q 2 if it runs between T x and T 2 , as 
we have already proved by considering engines running between three tempera- 
tures. So all we really have to do is to find how much heat Q x we need to put in 
at the temperature T 1 in order to deliver a certain amount of heat Q s at the unit 
temperature. If we discover that, we have everything. The heat £7, of course, is 
a function of the temperature T. It is easy to see that the heat must increase as the 
temperature increases, for we know that it takes work to run an engine backwards 
and deliver heat at a higher temperature. It is also easy to see that the heat Q x must 
be proportional to Qs . So the great law is something like this: for a given amount 
of heat Qs delivered at one degree from an engine running at temperature T 
degrees, the heat Q absorbed must be that amount Qs times some increasing func- 
tion of the temperature: 

0 = Qsf(n (44.9) 
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44-5 The thermodynamic temperature 

At this stage we are not going to try to find the formula for the above in- 
creasing function of the temperature in terms of our familiar mercury temperature 
scale, but instead we shall define temperature by a new scale. At one time “the 
temperature” was defined arbitrarily by dividing the expansion of water into even 
degrees of a certain size. But when one then measures temperature with a mercury 
thermometer, one finds that the degrees are no longer even. But now we can 
make a definition of temperature which is independent of any particular substance. 
We can use that function /(T), which does not depend on what device we use, be- 
cause the efficiency of these reversible engines is independent of their working sub- 
stances. Since the function we found is rising with temperature, we will define the 
function itself as the temperature, measured in units of the standard one-degree 
temperature, as follows: 

Q = ST, (44.10) 

where 

2s = 5'1°. (44.11) 

This means that we can tell how hot an object is by finding out how much heat is 
absorbed by a reversible engine working between the temperature of the object 
and the unit temperature (Fig. 44-9). If seven times more heat is taken out of a 
boiler than is delivered at a one-degree condenser, the temperature of the boiler 
will be called seven degrees, and so forth. So, by measuring how much heat is 
absorbed at different temperatures, we determine the temperature. The tempera- 
ture defined in this way is called the absolute thermodynamic temperature, and it 
is independent of the substance. We shall use this definition exclusively from now 
on.* 

Now we see that when we have two engines, one working between Ti and 
one degree, the other working between T 2 and one degree, delivering the same 
heat at unit temperature, then the heats absorbed must be related by 


/ / / / //^ / / / / / T 


Rcvtrtibl* 

Engine 


- W-O-S-l 



Fig. 44-9. 
temperature. 


Absolute thermodynamic 


Qi o Q2 
T t S T 2 


(44.12) 


But that means that if we have a single engine running between T x and T 2i then 
the result of the whole analysis, the grand finale, is that Q x is to T 1 as Q 2 is to T 2t 
if the engine absorbs energy Q t at temperature T x and delivers heat Q 2 at temper- 
ature 7Y Whenever the engine is reversible, this relationship between the heats 
must follow. That is all there is to it: that is the center of the universe of thermo- 
dynamics. 

If this is all there is to thermodynamics, why is it considered such a difficult 
subject? In doing a problem involving a given mass of some substance, the condi- 
tion of the substance at any moment can be described by telling what its tempera- 
ture is and what its volume is. If we know the temperature and volume of a sub- 
stance, and that the pressure is some function of the temperature and volume, 
then we know the internal energy. One could say, “I do not want to do it that way. 
Tell me the temperature and the pressure, and I will tell you the volume. I can 
think of the volume as a function of temperature and pressure, and the internal 
energy as a function of temperature and pressure, and so on.” That is why thermo- 
dynamics is hard, because everyone uses a different approach. If we could only sit 
down once and decide on our variables, and stick to them, it would be fairly easy. 

Now we start to make deductions. Just as F =7 ma is the center of the uni- 
verse in mechanics, and it goes on and on and on after that, in the same way the 
principle just found is all there is to thermodynamics. But can one make conclu- 
sions out of it? 


* We have previously defined our scale of temperature in a different way, namely by 
stating that the mean kinetic energy of a molecule in a perfect gas is proportional to the 
temperature, or that the perfect gas law says pV is proportional to T. Is this new definition 
equivalent? Yes, since the final result (44.7) derived from the gas law is the same a^ that 
derived here. We shall discuss this point again in the next chapter. 
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We begin. To obtain our first conclusion, we shall combine both laws, the 
law of conservation of energy and this law which relates the heats Q 2 and Q u 
and we can easily obtain the efficiency of a reversible engine. From the first law, 
we have W — Q x — Q 2 . According to our new principle, 

Q2 = jT Qi, 
j 1 

so the work becomes 

w = Qi (l - Ijr) = Qi Tl f i T ' 2 > (44.13) 

which tells us the efficiency of the engine — how much work we get out of so much 
heat. The efficiency of an engine is proportional to the difference in the tempera- 
tures between which the engine runs, divided by the higher temperature: 

Efficiency = = Tl ~ Ti • (44.14) 

Hi 1 1 

The efficiency cannot be greater than unity and the absolute temperature cannot 
be less than zero, absolute zero. So, since T 2 must be positive, the efficiency is 
always less than unity. That is our first conclusion. 

44-6 Entropy 

Equation (44.7) or (44.12) can be interpreted in a special way. Working 
always with reversible engines, a heat Q\ at temperature T i is “equivalent” to 
Q 2 at T 2 if Qi/Ti = £> 2 /^ 2 > in the sense that as one is absorbed the other is 
delivered. This suggests that if we call Q/T something, we can say: in a reversible 
process as much Q/T is absorbed as is liberated; there is no gain or loss of Q/T . 
This Q/T is called entropy , and we say “there is no net change in entropy in a 
reversible cycle.” If T is 1°, then the entropy is 2/1° or, as we symbolized it, 
Qs/ 1° = S. Actually, S is the letter usually used for entropy, and it is numerically 
equal to the heat (which we have called Qs) delivered to a l°-reservoir (entropy is 
not itself a heat, it is heat divided by a temperature, hence it is measured in joules 
per degree ). 

Now it is interesting that besides the pressure, which is a function of the 
temperature and the volume, and the internal energy, which is a function of 
temperature and volume, we have found another quantity which is a function of 
the condition, i.e., the entropy of the substance. Let us try to explain how we 
compute it, and what we mean when we call it a “function of the condition.” 
Consider the system in two different conditions, much as we had in the experiment 
where we did the adiabatic and isothermal expansions. (Incidentally, there is no 
need that a heat engine have only two reservoirs, it could have three or four differ- 
ent temperatures at which it takes in and delivers heats, and so on.) We can 
move around on a pV diagram all over the place, and go from one condition to 
another. In other words, we could say the gas is in a certain condition a , and then 
it goes over to some other condition, Z>, and we will require that this transition, 
made from a to b, be reversible. Now suppose that all along the path from a to b 
we have little reservoirs at different temperatures, so that the heat dQ removed 
from the substance at each little step is delivered to each reservoir at the tempera- 
ture corresponding to that point on the path. Then let us connect all these 
reservoirs, by reversible heat engines, to a single reservoir at the unit temperature. 
When we are finished carrying the substance from a to b, we shall bring all the 
reservoirs back to their original condition. Any heat dQ that has been absorbed 
from the substance at temperature T has now been converted by a reversible 
machine, and a certain amount of entropy dS has been delivered at the unit 
temperature as follows : 

dS = dQ/T ; (44.15) 

Let us compute the total amount of entropy which has been delivered. The 
entropy difference, or the entropy needed to go fron Ta to b by this particular 
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the gas until it is brought to a solid (or for helium, a liquid) at 0°, by integrating 
/ dQ/T. It can also be determined theoretically by means of Planck’s constant 
and quantum mechanics, but we shall not study it in this course.] 

Now we shall remark on some of the properties of the entropy of things. We 
first remember that if we go along a reversible cycle from a to b , then the entropy 
of the substance will change by — S a . And we remember that as we go along 
the path, the entropy — the heat delivered at unit temperature — increases accord- 
ing to the rule dS = dQ/T, where dQ is the heat we remove from the substance 
when its temperature is T . 

We already know that if we have a reversible cycle, the total entropy of 
everything is not changed, because the heat Q x absorbed at T x and the heat Q 2 
delivered at T 2 correspond to equal and opposite changes in entropy, so that the 
net change in the entropy is zero. So for a reversible cycle there is no change in 
the entropy of anything, including the reservoirs. This rule may look like the 
conservation of energy again, but it is not; it applies only to reversible cycles. 
If we include irreversible cycles there is no law of conservation of entropy. 

We shall give two examples. First, suppose that we do irreversible work on 
an object by friction, generating a heat Q on some object at temperature T. The 
entropy is increased by Q/T. The heat Q is equal to the work, and thus when 
we do a certain amount of work by friction against an object whose temperature 
is T, the entropy of the whole world increases by W/T. 

Another example of irreversibility is this: If we put together two objects 
that are at different temperatures, say T x and T 2 , a certain amount of heat will flow 
from one to the other by itself. Suppose, for instance, we put a hot stone in cold 
water. Then when a certain heat AQ is transferred from T x to T 2 , how much does 
the entropy of the hot stone change? It decreases by A Q/T x . How much does the 
water entropy change? It increases by A Q/T 2 . The heat will, of course, flow only 
from the higher temperature T x to the lower temperature T 2 , so that AQ is positive 
if r x is greater than T 2 . So the change in entropy of the whole world is positive, 
and it is the difference of the two fractions: 



AQ 

T x 


(44.19) 


So the following proposition is true: in any process that is irreversible, the 
entropy of the whole world is increased. Only in reversible processes does the 
entropy remain constant. Since no process is absolutely reversible, there is always 
at least a small gain in the entropy; a reversible process is an idealization in which 
we have made the gain of entropy minimal. 

Unfortunately, we are not going to enter into the field of thermodynamics 
very far. Our purpose is only to illustrate the principal ideas involved and the 
reasons why it is possible to make such arguments, but we will not use thermo- 
dynamics very much in this course. Thermodynamics is used very often by engi- 
neers and, particularly, by chemists. So we must learn our thermodynamics in 
practice in chemistry or engineering. Because it is not worth while duplicating 
everything, we shall just give some discussion of the origin of the theory, rather 
than much detail for special applications. 

The two laws of thermodynamics are often stated this way: 

First law: the energy of the universe is always constant. 

Second law: the entropy of the universe is always increasing. 

That is not a very good statement of the second law; it does not say, for example, 
that in a reversible cycle the entropy stays the same, and it does not say exactly 
what the entropy is. It is just a clever way of remembering the two laws, but it 
does not really tell us exactly where we stand. We have summarized the laws dis- 
cussed in this chapter in Table 44-1. In the next chapter we shall apply these laws 
to discover the relationship between the heat generated in the expansion of a 
rubber band, and the extra tension when it is heated. 
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Table 44-1 

Summary of the laws of thermodynamics 

First law: 

Heat put into a system + Work done on a system = Increase in internal energy of 
the system : 

dQ + dW - dU. 

Second law: 

A process whose only net result is to take heat from a reservoir and convert it to work 
is impossible. 

No heat engine taking heat Q\ from T i and delivering heat Q 2 at T 2 can do more work 
than a reversible engine, for which 

w = Qi - Q 2 = Qi ~ ra ) • 

The entropy of a system is defined this way: 

(a) If heat A Q is added reversibly to a system at temperature T, the increase in entropy 
of the system is AS = A Q/T. 

(b) At T = 0, S = 0 {third law). 

In a reversible change , the total entropy of all parts of the system (including reservoirs) 
does not change. 

In irreversible change , the total entropy of the system always increases. 
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45 


Illustrations of Thermodynamics 


45-1 Internal energy 

Thermodynamics is a rather difficult and complex subject when we come to 
apply it, and it is not appropriate for us to go very far into the applications in this 
course. The subject is of very great importance, of course, to engineers and chem- 
ists, and those who are interested in the subject can learn about the applications 
in physical chemistry or in engineering thermodynamics. There are also good 
reference books, such as Zemansky’s Heat and Thermodynamics , where one can 
learn more about the subject. In the Encyclopedia Britannica, fourteenth edition, 
one can find excellent articles on thermodynamics and thermochemistry, and in 
the article on chemistry, the sections on physical chemistry, vaporization, liquefi- 
cation of gases, and so on. 

The subject of thermodynamics is complicated because there are so many 
different ways of describing the same thing. If we wish to describe the behavior of 
a gas, we can say that the pressure depends on the temperature and on the volume, 
or we can say that the volume depends on the temperature and the pressure. 
Or with respect to the internal energy U, we might say that it depends on the 
temperature and volume, if those are the variables we have chosen — but we might 
also say that it depends on the temperature and the pressure, or the pressure and 
the volume, and so on. In the last chapter we discussed another function of 
temperature and volume, called the entropy S , and we can of course construct as 
many other functions of these variables as we like: U — TS is a function of tem- 
perature and volume. So we have a large number of different quantities which can 
be functions of many different combinations of variables. 

To keep the subject simple in this chapter, we shall decide at the start to use 
temperature and volume as the independent variables. Chemists use temperature 
and pressure, because they are easier to measure and control in chemical experi- 
ments, but we shall use temperature and volume throughout this chapter, except 
in one place where we shall see how to make the transformation into the chemists’ 
system of variables. 

We shall first, then, consider only one system of independent variables: 
temperature and volume. Secondly, we shall discuss only two dependent functions: 
the internal energy and the pressure. All the other functions can be derived from 
these, so it is not necessary to discuss them. With these limitations, thermo- 
dynamics is still a fairly difficult subject, but it is not quite so impossible! 

First we shall review some mathematics. If a quantity is a function of two 
variables, the idea of the derivative of the quantity requires a little more careful 
thought than for the case where there is only one variable. What do we mean by 
the derivative of the pressure with respect to the temperature? The pressure 
change accompanying a change in the temperature depends partly, of course, on 
what happens to the volume while T is changing. We must specify the change in V 
before the concept of a derivative with respect to T has a precise meaning. We 
might ask, for example, for the rate of change of P with respect to T if V is held 
constant. This ratio is just the ordinary derivative that we usually write as dP/dT . 
We customarily use a special symbol, dP/dT , to remind us that P depends on an- 
other variable V as well as on T % and that this other variable is held constant. We 
shall not only use the symbol d to call attention to the fact that the other variable 
is held constant, but we shall also write the variable that is held constant as a 
subscript, (dP/dT)y. Since we have only two independent variables, this notation 
is redundant, but it will help us keep our wits about us in the thermodynamic 
jungle of partial derivatives. 


45-1 Internal energy 

45-2 Applications 

45-3 The Clausius-Clapeyron 
equation 
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Let us suppose that the function /(x , y ) depends on the two independent vari- 
ables x and y. By (df/dx) y we mean simply the ordinary derivative, obtained in the 
usual way, if we treat y as a constant: 

AA = limit 

\dx/ y A*— >0 AX 

Similarly, we define 

= ii m i t f(x, y + Ay)-f(x,y) 

\dy/ x a^o Ay 

For example, if f(x,y) = x 2 + yx, then (df/dx) y = 2 x + y, and (df/dy) x = x. 
We can extend this idea to higher derivatives: d 2 f/dy 2 or d 2 f/dydx. The latter 
symbol indicates that we first differentiate / with respect to x, treating y as a con- 
stant, then differentiate the result with respect to y , treating x as a constant. The 
actual order of differentiation is immaterial: d 2 f/dxdy = d 2 f/dydx. 

We will need to compute the change A f in /(x , y) when x changes to x + Ax 
and y changes to y + Ay . We assume throughout the following that Ax and Ay 
are infinitesimally small: 


Af = f(x + Ax, y + Ay) - f(x, y) 

= fix + Ax, y + Ay) - f(x, y + Ay) + f(x, y + tiy) - f(x, y) 



The last equation is the fundamental relation that expresses Af in terms of Ax 
and Ay. 

As an example of the use of this relation, let us calculate the change in the 
internal energy U(T, V) when the temperature changes from T to T + AT and the 
volume changes from F to F + AV. Using Eq. (45.1), we write 



+ AF 



(45.2) 


In our last chapter we found another expression for the change A U in the internal 
energy when a quantity of heat A Q was added to the gas: 


AU = AQ ~ P AV. 


(45.3) 


In comparing Eqs. (45.2) and (45.3) one might at first be inclined to think that 
P = ( dU/dV)T , but this is not correct. To obtain the correct relation, let us first 
suppose that we add a quantity of heat AQ to the gas while keeping the volume 
constant, so that AF = 0. With AF = 0, Eq. (45.3) tells us that AU — AQ, 
and Eq. (45.2) tells us that AU - (dU/dT) v AT, so that (dU/dT) v - AQ/AT. 
The ratio AQ/AT, the amount of heat one must put into a substance in order to 
change its temperature by one degree with the volume held constant, is called the 
specific heat at constant volume and is designated by the symbol C v . By this argu- 
ment we have shown that 



(45.4) 


Now let us again add a quantity of heat AQ to the gas, but this time we will 
hold T constant and allow the volume to change by AF. The analysis in this case 
is more complex, but we can calculate AU by the argument of Carnot, making use 
of the Carnot cycle we introduced in the last chapter. 

The pressure-volume diagram for the Carnot cycle is shown in Fig. 45-1. 
As we have already shown, the total amount of work done by the gas in a reversible 
cycle is AQ(AT/T), where AQ is the amount of heat energy added to the gas as it 
expands isothermally at temperature T from volume F to F + AF, and T — AT 
is the final temperature reached by the gas as it expands adiabatically on the 
second leg of the cycle. Now we will show that this work done is also given by 
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the shaded area in Fig. 45-1. In any circumstances, the work done by the gas is 
\P dV, and is positive when the gas expands and negative when the gas is com- 
pressed. If we plot P vs. V, the variation of P and V is represented by a curve which 
gives the value of P corresponding to a particular value of V. As the volume 
changes from one value to another, the work done by the gas, the integral jP dV, 
is the area under the curve connecting the initial and final values of V. When we 
apply this idea to the Carnot cycle, we see that as we go around the cycle, paying 
attention to the sign of the work done by the gas, the net work done by the gas 
is just the shaded area in Fig. 45-1. 

Now we want to evaluate the shaded area geometrically. The cycle we have 
used in Fig. 45-1 differs from that used in the previous chapter in that we now 
suppose that AT and A Q are infinitesimally small. We are working between adia- 
batic lines and isothermal lines that are very close together, and the figure de- 
scribed by the heavy lines in Fig. 45-1 will approach a parallelogram as the in- 
crements AT and A Q approach zero. The area of this parallelogram is just AV A P, 
where AV is the change in volume as energy A Q is added to the gas at constant 
temperature, and A P is the change in pressure as the temperature changes by AT 
at constant volume. One can easily show that the shaded area in Fig. 45-1 is 
given by AV A P by recognizing that the shaded area is equal to the area enclosed 
by the dotted lines in Fig. 45-2, which in turn differs from the rectangle bounded 
by A P and AV only by the addition and subtraction of the equal triangular areas 
in Fig. 45-2. 

Now let us summarize the results of the arguments we have developed so far: 


or 


Work done by the gas = shaded area = AV AP = AQ 



— • (heat needed to change Fby AF) conB tantr 
T > 

= AV • (change in P when T changes by AT) con8tant 7 

— • (heat needed to change Vby AV) T = T{dP/dT ) v . 
AV 


(45.5) 


Equation (45.5) expresses the essential result of Carnot’s argument. The whole of 
thermodynamics can be deduced from Eq. (45.5) and the First Law, which is 
stated in Eq. (45.3). Equation (45.5) is essentially the Second Law, although it 
was originally deduced by Carnot in a slightly different form, since he did not 
use our definition of temperature. 

Now we can proceed to calculate (dU/dV)T> By how much would the internal 
energy U change if we changed the volume by AVI First, U changes because heat 
is put in, and second, U changes because work is done. The heat put in is 

A °- T {w)v iV ’ 


according to Eq. (45.5), and the work done on the substance is — P AV. There- 
fore the change AU in internal energy has two pieces: 


AU 



AV - P AV. 


(45.6) 


Dividing both sides by AV, we find for the rate of change of U with V at constant T 



(45.7) 


In our thermodynamics, in which T and V are the only variables and P and U are 
the only functions, Eqs. (45.3) and (45.7) are the basic equations from which all 
the result of the subject can be deduced. 



Fig. 45-1. Pressure-volume diagram 
for a Carnot cycle. The curves marked 
I and T — AT are isothermal lines; the 
steeper curves are adiabatic lines. AV 
is the volume change as heat AQ is 
added to the gas at constant temperature 
T. AP is the pressure change at constant 
volume as the gas temperature is changed 
from T to T — AT. 



Fig. 45-2. Shaded area = area 
enclosed by dashed lines = area of 
rectangle = AP AV. 
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45-2 Applications 

Now let us discuss the meaning of Eq. (45.7) and see why it answers the 
questions which we proposed in our last chapter. We considered the following 
problem : in kinetic theory it is obvious that an increase in temperature leads to 
an increase in pressure, because of the bombardments of the atoms on a piston. 
For the same physical reason, when we let the piston move back, heat is taken 
out of the gas and, in order to keep the temperature constant, heat will have to be 
put back in. The gas cools when it expands, and the pressure rises when it is heated. 
There must be some connection between these two phenomena, and this con- 
nection is given explicitly in Eq. (45.7). If we hold the volume fixed and increase 
the temperature, the pressure rises at a rate (3P/dT) v . Related to that fact is this: 
if we increase the volume, the gas will cool unless we pour some heat in to main- 
tain the temperature constant, and {dU/dV) T tells us the amount of heat needed 
to maintain the temperature. Equation (45.7) expresses the fundamental inter- 
relationship between these two effects. That is what we promised we would find 
when we came to the laws of thermodynamics. Without knowing the internal 
mechanism of the gas, and knowing only that we cannot make perpetual motion 
of the second type, we can deduce the relationship between the amount of heat 
needed to maintain a constant temperature when the gas expands, and the pressure 
change when the gas is heated! 

Now that we have the result we wanted for a gas, let us consider the rubber 
band. When we stretch a rubber band, we find that its temperature falls, and when 
we heat a rubber band, we find that it pulls itself in. What is the equation that 
gives the same relation for a rubber band as Eq. (45.3) gives for gas? For a 
rubber band the situation will be something like this: when heat A Q is put in, the 
internal energy is changed by A/7 and some work is done. The only difference will 
be that the work done by the rubber band is —FAL instead of P AV, where F 
is the force on the band, and L is the length of the band. The force F is a function 
of temperature and of length of the band. Replacing PAV in Eq. (45.3) by — F A L, 
we get 

AU = AQ + FAL (45.8) 

Comparing Eqs. (45.3) and (45.8), we see that the rubber band equation is ob- 
tained by a mere substitution of one letter for another. Furthermore, if we sub- 
stitute L for V , and — F for P, all of our discussion of the Carnot cycle applies to 
the rubber band. We can immediately deduce, for instance, that the heat AQ 
needed to change the length by A L is given by the analog to Eq. (45.5): AQ = 
— T(dF/dT) L AL. This equation tells us that if we keep the length of a rubber 
band fixed and heat the band, we can calculate how much the force will increase 
in terms of the heat needed to keep the temperature constant when the band is 
stretched a little bit. So we see that the same equation applies to both gas and a 
rubber band. In fact, if one can write AU = AQ + AAB , where A and B repre- 
sent different quantities, force and length, pressure and volume, etc., one can apply 
the results obtained for a gas by substituting A and B for P and V . For example, 
consider the electric potential difference, or “voltage,” E in a battery and the 
charge A Z that moves through the battery. We know that the work done in a 
reversible electric cell, like a storage battery, is EAZ. (Since we include no PAV 
term in the work, we require that our battery maintain a constant volume.) Let 
us see what thermodynamics can tell us about the performance of a battery. If 
we substitute E for P and Z for V in Eq. (45.6), we obtain 



Equation (45.9) says that the internal energy U is changed when a charge A Z moves 
through the cell. Why is AU/AZ not simply the voltage E of the battery? The 
answer is that a real battery gets warm when charge moves through the cell. 
The internal energy of the battery is changed, first, because the battery did some 
work on the outside circuit, and second, because the battery is heated. The re- 
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markable thing is that the second part can again be expressed in terms of the 
way in which the battery voltage changes with temperature. Incidentally, when 
the charge moves through the cell, chemical reactions occur, and Eq. (45.9) sug- 
gests a nifty way of measuring the amount of energy required to produce a chemical 
reaction. All we need do is construct a cell that works on the reaction, measure 
the voltage, and measure how much the voltage changes with temperature when 
we draw no charge from the battery! 

Now we have assumed that the volume of the battery can be maintained con- 
stant, since we have omitted the PAF term when we set the work done by the 
battery equal to £AZ. It turns out that it is technically quite difficult to keep the 
volume constant. It is much easier to keep the cell at constant atmospheric 
pressure. For that reason, the chemists do not like any of the equations we have 
written above: they prefer equations which describe performance under constant 
pressure . We chose at the beginning of this chapter to use F and T as independent 
variables. The chemists prefer P and P, and we will now consider how the results 
we have obtained so far can be transformed into the chemists’ system of variables. 
Remember that in the following treatment confusion can easily set in because we 
are shifting gears from T and F to T and P. 

We started in Eq. (45.3) with A U = A Q — PAF; PAF may be replaced by 
EAZ or AAB. If we could somehow replace the last term, PAF, by VA P, then 
we would have interchanged F and P, and the chemists would be happy. Well, 
a clever man noticed that the differential of the product PV is d(PV) = PdV + 
V dP , and if he added this equality to Eq. (45.3), he obtained 

A(PF) = PAF + VA P 
A U = AQ - PAF 
A (U + PV) - A Q + VA P 

In order that the result look like Eq. (45.3), we define U + PV to be something 
new, called the enthalpy , H , and we write AH = AQ + FAP. 

Now we are ready to transform our results into chemists’ language with 
the following rules: £/ — > PT, P — > — F, V — > P. For example, the fundamental 
relationship that chemists would use instead of Eq. (45.7) is 



It should now be clear how one transforms to the chemists’ variables T and P. 
We now go back to our original variables: for the remainder of this chapter, T 
and V are the independent variables. 

Now let us apply the results we have obtained to a number of physical situa- 
tions. Consider first the ideal gas. From kinetic theory we know that the internal 
energy of a gas depends only on the motion of the molecules and the number of 
molecules. The internal energy depends on P, but not on V. If we change V, but 
keep T constant, U is not changed. Therefore (dU/dV) T = 0, and Eq. (45.7) tells 
us that for an ideal gas 

-P- 0. (45.10) 

Equation (45.10) is a differential equation that can tell us something about P. 
We take account of the partial derivatives in the following way: Since the partial 
derivative is at constant V , we will replace the partial derivative by an ordinary 
derivative and write explicitly, to remind us, “constant F.” Equation (45.10) then 
becomes 

- P = 0; const V, (45.11) 

which we can integrate to get 

InP = InP + const; const F, 

P = const X P; const F. (45.12) 
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We know that for an ideal gas the pressure is equal to 



Fig. 45-3. Isothermal lines for a 
condensable vapor compressed in a 
cylinder. At the left, the substance is in 
the liquid phase. At the right, the sub- 
stance is vaporized. In the center, both 
liquid and vapor are present in the 
cylinder. 



Fig. 45-4. Pressure-volume diagram 
for a Carnot cycle with a condensable 
vapor in the cylinder. At the left, the 
substance is in the liquid state. A 
quantity of heat L is added at tempera- 
ture T to vaporize the liquid. The vapor 
expands adiabatically as T changes to 

r - at. 



which is consistent with (45.12), since V and R are constants. Why did we bother 
to go through this calculation if we already knew the results? Because we have 
been using two independent definitions of temperature l At one stage we assumed 
that the kinetic energy of the molecules was proportional to the temperature, an 
assumption that defines one scale of temperature which we will call the ideal gas 
scale. The T in Eq. (45.13) is based on the gas scale. We also call temperatures 
measured on the gas scale kinetic temperatures. Later, we defined the temperature 
in a second way which was completely independent of any substance. From argu- 
ments based on the Second Law we defined what we might call the “grand thermo- 
dynamic absolute temperature” T, the T that appears in Eq. (45.12). What we 
proved here is that the pressure of an ideal gas (defined as one for which the 
internal energy does not depend on the volume) is proportional to the grand 
thermodynamic absolute temperature. We also know that the pressure is pro- 
portional to the temperature measured on the gas scale. Therefore we can deduce 
that the kinetic temperature is proportional to the “grand thermodynamic ab- 
solute temperature. That means, of course, that if we were sensible we could 
make two scales agree. In this instance, at least, the two scales have been chosen 
so that they coincide; the proportionality constant has been chosen to be 1. Most 
of the time man chooses trouble for himself, but in this case he made them equal! 


45-3 The Clausius-CIapeyron equation 

The vaporization of a liquid is another application of the results we have 
derived. Suppose we have some liquid in a cylinder, such that we can compress it 
by pushing on the piston, and we ask ourselves, “If we keep the temperature 
constant, how does the pressure vary with volume?” In other words, we want to 
draw an isothermal line on the P-V diagram. The substance in the cylinder is not 
the ideal gas that we considered earlier; now it may be in the liquid or the vapor 
phase, or both may be present. If we apply sufficient pressure, the substance 
will condense to a liquid. Now if we squeeze still harder, the volume changes 
very little, and our isothermal line rises rapidly with decreasing volume, as shown 
at the left in Fig. 45-3. 

If we increase the volume by pulling the piston out, the pressure drops until 
we reach the point at which the liquid starts to boil, and then vapor starts to form. 
If we pull the piston out farther, all that happens is that more liquid vaporizes. 
When there is part liquid and part vapor in the cylinder, the two phases are in 
equilibrium liquid is evaporating and vapor is condensing at the same rate. If 
we make more room for the vapor, more vapor is needed to maintain the pressure, 
so a little more liquid evaporates, but the pressure remains constant. On the 
flat part of the curve in Fig. 45-3 the pressure does not change, and the value of 
the pressure here is called the vapor pressure at temperature T. As we continue to 
increase the volume, there comes a time when there is no more liquid to evaporate. 
At this juncture, if we expand the volume further, the pressure will fall as for an 
ordinary gas, as shown at the right of the P-V diagram. The lower curve in Fig. 
45-3 is the isothermal line at a slightly lower temperature T — AT. The pressure 
in the liquid phase is slightly reduced because liquid expands with an increase in 
temperature (for most substances, but not for water near the freezing point) 
and, of course, the vapor pressure is lower at the lower temperature. 

We will now make a cycle out of the two isothermal lines by connecting them 
(say by adiabatic lines) at the ends of the flat sections, as shown in Fig. 45-4. 
The little jiggle in the lower right-hand corner of the figure will make little differ- 
ence and we will neglect it. We are going to use the argument of Carnot, which 
tells us that the heat added to the substance in changing it from a liquid to a vapor 
is related to the work done by the substance as it goes around the cycle. Let 
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us call L the heat needed to vaporize the substance in the cylinder. As in the 
argument immediately preceding Eq. (45.5), we know that L(AT/T) = work 
done by the substance. As before, the work done by the substance is the shaded 
area, which is approximately A P(V G — V L ), where A P is the difference in vapor 
pressure at the two temperatures T and T — AT, V G is the volume of the gas, and 
V L is the volume of the liquid, both volumes measured at the vapor pressure. 
Setting these two expressions for the area equal, we get LAT/T = AP(V G — V L ), 
or 

<4514) 


Equation (45.14) gives the relationship between the rate of change of vapor pres- 
sure with temperature and the amount of heat required to evaporate the liquid. 
This relationship was deduced by Carnot, but it is called the Clausius-Clapeyron 
equation. 

Now let us compare Eq. (45.14) with the results deduced from kinetic theory. 
Usually V G is very much larger than V L . So V G - V L « V G = RT/P per mole. 
If we further assume that L is a constant, independent of temperature — not a very 
good approximation — then we would have 6P/dT = L/(RT 2 P). The solution 
of this differential equation is 

P = const e~ LIIiT . (45.15) 


Let us compare this with the pressure variation with temperature that we deduced 
earlier from kinetic theory. Kinetic theory indicated the possibility, at least 
roughly, that the number of molecules of vapor above a liquid would be 


n 



-(Uo-U L )IRT 


(45.16) 


where U G — U L is the internal energy per mole in the liquid minus the internal 
energy per mole in the gas, i.e., the energy needed to vaporize a mole of liquid. 
Equation (45.15) from thermodynamics and Eq. (45.16) from kinetic theory are 
very closely related because the pressure is nkT , but they are not exactly the same. 
However, they will turn out to be exactly the same if we assume L — U G = const, 
instead of L — const. If we assume L — U G — const, independent of tempera- 
ture, then the argument leading to Eq. (45.15) will produce Eq. (45.16). 

This comparison shows the advantages and disadvantages of thermodynamics 
over kinetic theory: First of all, Eq. (45.14) obtained by thermodynamics is exact, 
while Eq. (45.16) can only be approximated, for instance, if U is nearly constant, 
and if the model is right. Second, we may not understand correctly how the gas 
goes into the liquid; nevertheless, Eq. (45.14) is right, while (45.16) is only approxi- 
mate. Third, although our treatment applies to a gas condensing into a liquid, the 
argument is true for any other change of state. For instance, the solid-to-liquid 
transition has the same kind of curve as that shown in Figs. 45-3 and 45-4. Intro- 
ducing the latent heat for melting, M/mole, the formula analogous to Eq. (45.14) 
then is (dP mcH /dT) v = M/[T(V Uq - K so iid)]. Although we may not understand 
the kinetic theory of the melting process, we nevertheless have a correct equation. 
However, when we can understand the kinetic theory, we have another advantage. 
Equation (45.14) is only a differential relationship, and we have no way of obtain- 
ing the constants of integration. In the kinetic theory we can obtain the constants 
also if we have a good model that describes the phenomenon completely. So there 
are advantages and disadvantages to each. When knowledge is weak and the situa- 
tion is complicated, thermodynamic relations are really the most powerful. When 
the situation is very simple and a theoretical analysis can be made, then it is better 
to try to get more information from theoretical analysis. 

One more example : blackbody radiation. We have discussed a box containing 
radiation and nothing else. We have talked about the equilibrium between the 
oscillator and the radiation. We also found that the photons hitting the wall of 
the box would exert the pressure P , and we found PV = U/ 3, where U is the 
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total energy of all the photons and V is the volume of the box. If we substitute 
U — 3PV in the basic Eq. (45.7), we find 


(IfX - - t (£\ 


(45.17) 


Since the volume of our box is constant, we can replace (dP/dT) v by dP/dT to 
obtain an ordinary differential equation we can integrate: In P = 4 In T + const, 
or P = const X T 4 . The pressure of radiation varies as the fourth power of the 
temperature, and the energy content of the radiation, U/V = P/3, also varies as 
r 4 . It is usual to write U/V = (Aa/c)T 4 , where c is the speed of light and <r is a 
constant. It is not possible to get a from thermodynamics alone. Here is a good 
example of its power, and its limitations. To know that U/V goes as T* is a great 
deal, but to know how big U/ V actually is at any temperature requires that we go 
into the kind of detail that only a complete theory can supply. For blackbody 
radiation we have such a theory and we can derive an expression for the constant 
a in the following manner. 

Let 7(w) du be the intensity distribution, the energy flow through 1 m 2 in one 
second with frequency between co and co + du>. The energy density distribution = 
energy /volume = /(to) du/c is 

p = total energy density 

roo 

= I energy density co and co + du 

J w=0 

r /(to) da 

J o c 

From our earlier discussions, we know that 


/(co) = 


Tr 2 c\e^ jkT - 1 ) 


Substituting this expression for /(co) in our equation for U/ V, we get 


£/ 1 f hco 3 do) 

V ~ ^3 J Q ^IkT _ j * 

If we substitute x = hw/kT, the expression becomes 

U = 0 kTf r X s dx 
V ^ a 7T 2 c 3 Jo e x — 1 

This integral is just some number that we can get, approximately, by drawing a 
curve and taking the area by counting squares. It is roughly 6.5. The mathemati- 
cians among us can show that the integral is exactly 7r 4 /15.* Comparing this 
expression with U/V = (4cr/c)r 4 , we find 


60ft 3 c 2 


= 5.6 7 X 10- 


watts 

(meter) 2 (degree) 4 


* Since (e* — l)~ x = e 


+ . . . , the integral is 


® ut Jo e-™ dx — l//j, and differentiating with respect to n three times gives J” x 3 e~ nx 
dx ~ 6//i 4 , so the integral is 6(1 + -^ + ^- + . . .) and a good estimate comes from 
adding the first few terms. In Chapter 50 we will find a way to show that the sum of the 
reciprocal fourth powers of the integers is, in fact, 7 t 4 / 90. 
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If we make a small hole in our box, how much energy will flow per second 
through the hole of unit area? To go from energy density to energy flow, we multi- 
ply the energy density U/V by c. We also multiply by which arises as follows: 
first, a factor of because only the energy which is flowing out escapes; and second, 
another factor £, because energy which approaches the hole at an angle to the 
normal is less effective in getting through the hole by a cosine factor. The average 
value of the cosine is It is clear now why we write U/V = (4 cr/c)r 4 : so that 
we can ultimately say that the flux from a small hole is 0 T 4 per unit area. 
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Ratchet and pawl 


46-1 How a ratchet works 


In this chapter we discuss the ratchet and pawl, a very simple device which 
allows a shaft to turn only one way. The possibility of having something turn 
only one way requires some detailed and careful analysis, and there are some very 
interesting consequences. 

The plan of the discussion came about in attempting to devise an elementary 
explanation, from the molecular or kinetic point of view, for the fact that there is 
a maximum amount of work which can be extracted from a heat engine. Of 
course we have seen the essence of Carnot’s argument, but it would be nice to find 
an explanation which is elementary in the sense that we can see what is happening 
physically. Now, there are complicated mathematical demonstrations which 
follow from Newton’s laws to demonstrate that we can get only a certain amount 
of work out when heat flows from one place to another, but there is great difficulty 
in converting this into an elementary demonstration. In short, we do not under- 
stand it, although we can follow the mathematics. 

In Carnot’s argument, the fact that more than a certain amount of work 
cannot be extracted in going from one temperature to another is deduced from 
another axiom, which is that if everything is at the same temperature, heat cannot 
be converted to work by means of a cyclic process. First, let us back up and try 
to see, in at least one elementary example, why this simpler statement is true. 

Let us try to invent a device which will violate the Second Law of Thermo- 
dynamics, that is, a gadget which will generate work from a heat reservoir with 
everything at the same temperature. Let us say we have a box of gas at a certain 
temperature, and inside there is an axle with vanes in it. (See Fig. 46-1 but take 
T x — T 2 — T, say.) Because of the bombardments of gas molecules on the vane, 
the vane oscillates and jiggles. All we have to do is to hook onto the other end of 
the axle a wheel which can turn only one way — the ratchet and pawl. Then when 
the shaft tries to jiggle one way, it will not turn, and when it jiggles the other, it 
will turn. Then the wheel will slowly turn, and perhaps we might even tie a flea 
onto a string hanging from a drum on the shaft, and lift the flea! Now let us ask 
if this is possible. According to Carnot’s hypothesis, it is impossible. But if we 
just look at it, we see, prima facie , that it seems quite possible. So we must look 
more closely. Indeed, if we look at the ratchet and pawl, we see a number of 
complications. 

First, our idealized ratchet is as simple as possible, but even so, there is a pawl, 
and there must be a spring in the pawl. The pawl must return after coming off a 
tooth, so the spring is necessary. 

Another feature of this ratchet and pawl, not shown in the figure, is quite 
essential. Suppose the device were made of perfectly elastic parts. After the pawl 
is lifted off the end of the tooth and is pushed back by the spring, it will bounce 
against the wheel and continue to bounce. Then, when another fluctuation came, 
the wheel could turn the other way, because the tooth could get underneath during 
the moment when the pawl was up! Therefore an essential part of the irreversibility 
of our wheel is a damping or deadening mechanism which stops the bouncing. 
When the damping happens, of course, the energy that was in the pawl goes into 
the wheel and shows up as heat. So, as it turns, the wheel will get hotter and hotter. 
To make the thing simpler, we can put a gas around the wheel to take up some of 
the heat. Anyway, let us say the gas keeps rising in temperature, along with the 
wheel. Will it go on forever? No! The pawl and wheel, both at some temperature 
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r, also have Brownian motion. This motion is such that, every once in a while, by 
accident, the pawl lifts itself up and over a tooth just at the moment when the 
Brownian motion on the vanes is trying to turn the axle backwards. And as 
things get hotter, this happens more often. 

So, this is the reason this device does not work in perpetual motion. When 
the vanes get kicked, sometimes the pawl lifts up and goes over the end. But some- 
times, when it tries to turn the other way, the pawl has already lifted due to the 
fluctuations of the motions on the wheel side, and the wheel goes back the other 
way! The net result is nothing. It is not hard to demonstrate that when the 
temperature on both sides is equal, there will be no net average motion of the 
wheel. Of course the wheel will do a lot of jiggling this way and that way, but it 
will not do what we would like, which is to turn just one way. 

Let us look at the reason. It is necessary to do work against the spring in 
order to lift the pawl to the top of a tooth. Let us call this energy e, and let B be 
the angle between the teeth. The chance that the system can accumulate enough 
energy, e, to get the pawl over the top of the tooth, is e~* lkT . But the probability 
that the pawl will accidentally be up is also e~ €lkT . So the number of times that 
the pawl is up and the wheel can turn backwards freely is equal to the number 
of times that we have enough energy to turn it forward when the pawl is down. 
We thus get a “balance,” and the wheel will not go around. 

46-2 The ratchet as an engine 

Let us now go further. Take the example where the temperature of the 
vanes is T x and the temperature of the wheel, or ratchet, is T 2t and T 2 is less than 
7\. Because the wheel is cold and the fluctuations of the pawl are relatively in- 
frequent, it will be very hard for the pawl to attain an energy e. Because of the 
high temperature 7^, the vanes will often attain the energy e, so our gadget will 
go in one direction, as designed. 

We would now like to see if it can lift weights. Onto the drum in the middle 
we tie a string, and put a weight, such as our flea, on the string. We let L be the 
torque due to the weight. If L is not too great, our machine will lift the weight 
because the Brownian fluctuations make it more likely to move in one direction 
than the other. We want to find how much weight it can lift, how fast it goes around, 
and so on. 

First we consider a forward motion, the usual way one designs a ratchet to 
run. In order to make one step forward, how much energy has to be borrowed 
from the vane end? We must borrow an energy e to lift the pawl. The wheel turns 
through an angle 8 against a torque L, so we also need the energy L8. The total 
amount of energy that we have to borrow is thus e + L8. The probability that 
we get this energy is proportional to Actually, it is not only a ques- 

tion of getting the energy, but we also would like to know the number of times 
per second it has this energy. The probability per second is proportional to 
e -(€+Le)}kT ^ an( j we ca ji t ^ e proportionality constant 1/r. It will cancel 
out in the end anyway. When a forward step happens, the work done on the 
weight is LB. The energy taken from the vane is e + L6. The spring gets wound 
up with energy e, then it goes clatter, clatter, bang, and this energy goes into heat. 
All the energy taken out goes to lift the weight and to drive the pawl, which then 
falls back and gives heat to the other side. 

Now we look at the opposite case, which is backward motion. What happens 
here? To get the wheel to go backwards all we have to do is supply the energy to 
lift the pawl high enough so that the ratchet will slip. This is still energy e. Our 
probability per second for the pawl to lift this high is now ( \/r)e~ elkT 2 . Our 
proportionality constant is the same, but this time kT 2 shows up because of the 
different temperature. When this happens, the work is released because the wheel 
slips backward. It loses one notch, so it releases work LB. The energy taken from 
the ratchet system is e, and the energy given to the gas at Ti on the vane side is 
LB + e. It takes a little thinking to see the reason for that. Suppose the pawl has 
lifted itself up accidentally by a fluctuation. Then when it falls back and the spring 
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Table 46-1 


Summary of operation of ratchet and pawl. 


Forward: Need energy 

e T LB 

from vane. 

.'.Rate — - e ~^+€)/kT l 
r 

Takes from vane 

LB + e 



Does work 

LB 



Gives to ratchet 

e 



Backward: Needs energy 

€ 

for pawl. . 

Rate = - e ~ i/kT 2 

T 

Takes from ratchet 

e 

\ 


Releases work 

LB 

> same as 

above with sign reversed. 

Gives to vane 

LB *T 6 

J 


If system is reversible, rates are equal, hence 6 ^ ■ 

T 1 

e 

= n' 

Heat to ratchet e 

Heat from vane LB + e 


Hence — ■ 
Qi 

_ T 2 

Ti 


pushes it down against the tooth, there is a force trying to turn the wheel, because 
the tooth is pushing on an inclined plane. This force is doing work, and so is the 
force due to the weights. So both together make up the total force, and all the 
energy which is slowly released appears at the vane end as heat. (Of course it 
must, by conservation of energy, but one must be careful to think the thing 
through!) We notice that all these energies are exactly the same, but reversed. 
So, depending upon which of these two rates is greater, the weight is either slowly 
lifted or slowly released. Of course, it is constantly jiggling around, going up for a 
while and down for a while, but we are talking about the average behavior. 

Suppose that for a particular weight the rates happen to be equal. Then we 
add an infinitesimal weight to the string. The weight will slowly go down, and 
work will be done on the machine. Energy will be taken from the wheel and given 
to the vanes. If instead we take off a little bit of weight, then the imbalance is 
the other way. The weight is lifted, and heat is taken from the vane and put into 
the wheel. So we have the conditions of Carnot’s reversible cycle, provided that 
the weight is just such that these two are equal. This condition is evidently that 
(e + LB)/T X — e/7Y Let us say that the machine is slowly lifting the weight. 
Energy Q x is taken from the vanes and energy Q 2 is delivered to the wheel, and 
these energies are in the ratio (e + LB)/e. If we are lowering the weight, we also 
have Q\!Qi ~ (6 + LB)/e. Thus (Table 46-1) we have 

G1/G2 = t x /t*. 

Furthermore, the work we get out is to the energy taken from the vane as LB 
is to LB + e, hence as (T x — T 2 )/T x . We see that our device cannot extract 
more work than this, operating reversibly. This is the result that we expected from 
Carnot’s argument, and the main result of this lecture. However, we can use our 
device to understand a few other phenomena, even out of equilibrium, and there- 
fore beyond the range of thermodynamics. 

Let us now calculate how fast our one-way device would turn if everything were 
at the same temperature and we hung a weight on the drum. If we pull very, very 
hard, of course, there are all kinds of complications. The pawl slips over the 
ratchet, or the spring breaks, or something. But suppose we pull gently enough 
that everything works nicely. In those circumstances, the above analysis is right 
for the probability of the wheel going forward and backward, if we remember 
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that the two temperatures are equal. In each step an angle 6 is obtained, so the 
angular velocity is d times the probability of one of these jumps per second. It 
goes forward with probability (l/r)e- ((+L,)lkT and backward with probability 
(l/r)e elkT , so that for the angular velocity we have 

co = (e/r)e- (t+mikT - e~ elkT 

= (8/T)e- ilkT ( e - LelkT - 1). (46.1) 

If we plot co against L, we get the curve shown in Fig. 46-2. We see that it makes a 
great difference whether L is positive or negative. If L increases in the positive 
range, which happens when we try to drive the wheel backward, the backward 
velocity approaches a constant. As L becomes negative, co really “takes off” 
forward, since e to a tremendous power is very great! 

The angular velocity that was obtained from different forces is thus very un- 
symmetrical. Going one way it is easy: we get a lot of angular velocity for a little 
force. Going the other way, we can put on a lot of force, and yet the wheel hardly 
goes around. 

Fig. 46-2. Angular velocity of the We find the same thing in an electrical rectifier. Instead of the force, we have 

ratchet as a function of torque. the electric field, and instead of the angular velocity, we have the electric current. 

In the case of a rectifier, the voltage is not proportional to resistance, and the 
situation is unsymmetrical. The same analysis that we made for the mechanical 
rectifier will also work for an electrical rectifier. In fact, the kind of formula we 
obtained above is typical of the current-carrying capacities of rectifiers as a func- 
tion of their voltages. 

Now let us take all the weights away, and look at the original machine. If 
T% were less than T ,, the ratchet would go forward, as anybody would believe. 
But what is hard to believe, at first sight, is the opposite. If T 2 is greater than T u 
the ratchet goes around the opposite way! A dynamic ratchet with lots of heat 
in it runs itself backwards, because the ratchet pawl is bouncing. If the pawl, for 
a moment, is on the incline somewhere, it pushes the inclined plane sideways. 
But it is always pushing on an inclined plane, because if it happens to lift up high 
enough to get past the point of a tooth, then the inclined plane slides by, and it 
comes down again on an inclined plane. So a hot ratchet and pawl is ideally 
built to go around in a direction exactly opposite to that for which it was originally 
designed! 

In spite of all our cleverness of lopsided design, if the two temperatures are 
exactly equal there is no more propensity to turn one way than the other. The 
moment we look at it, it may be turning one way or the other, but in the long run, 
it gets nowhere. The fact that it gets nowhere is really the fundamental deep 
principle on which all of thermodynamics is based. 



46-3 Reversibility in mechanics 

What deeper mechanical principle tells us that, in the long run, if the tempera- 
ture is kept the same everywhere, our gadget will turn neither to the right nor to 
the left? We evidently have a fundamental proposition that there is no way to 
design a machine which, left to itself, will be more likely to be turning one way 
than the other after a long enough time. We must try to see how this follows from 
the laws of mechanics. 

The laws of mechanics go something like this: the mass times the acceleration 
is the force, and the force on each particle is some complicated function of the 
positions of all the other particles. There are other situations in which forces 
depend on velocity, such as in magnetism, but let us not consider that now. We 
take a simpler case, such as gravity, where forces depend only on position. Now 
suppose that we have solved our set of equations and we have a certain motion 
x(t ) for each particle. In a complicated enough system, the solutions are very 
complicated, and what happens with time turns out to be very surprising. If we 
write down any arrangement we please for the particles, we will see this arrange- 
ment actually occur if we wait long enough! If we follow our solution for a long 
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enough time, it tries everything that it can do, so to speak. This is not absolutely 
necessary in the simplest devices, but when systems get complicated enough, with 
enough atoms, it happens. Now there is something else the solution can do. If 
we solve the equations of motion, we may get certain functions such as 
t + t 2 + r 1 . We claim that another solution would be — t + t 2 — / 3 . In other 
words, if we substitute — t everywhere for t throughout the entire solution, we will 
once again get a solution of the same equation. This follows from the fact that 
if we substitute — t for / in the original differential equation, nothing is changed, 
since only second derivatives with respect to t appear. This means that if we have 
a certain motion, then the exact opposite motion is also possible. In the complete 
confusion which comes if we wait long enough, it finds itself going one way some- 
times, and it finds itself going the other way sometimes. There is nothing more 
beautiful about one of the motions than about the other. So it is impossible to 
design a machine which, in the long run, is more likely to be going one way than 
the other, if the machine is sufficiently complicated. 

One might think up an example for which this is obviously untrue. If we take 
a wheel, for instance, and spin it in empty space, it will go the same way forever. 
So there are some conditions, like the conservation of angular momentum, which 
violate the above argument. This just requires that the argument be made with a 
little more care. Perhaps the walls take up the angular momentum, or something 
like that, so that we have no special conservation laws. Then, if the system is 
complicated enough, the argument is true. It is based on the fact that the laws of 
mechanics are reversible. 

For historical interest, we would like to remark on a device invented by 
Maxwell, who first worked out the dynamical theory of gases. He supposed the 
following situation: We have two boxes of gas at the same temperature, with a 
little hole between them. At the hole sits a little demon (who may be a machine 
of course!). There is a door on the hole, which can be opened or closed by the 
demon. He watches the molecules coming from the left. Whenever he sees a fast 
molecule, he opens the door. When he sees a slow one, he leaves it closed. If we 
want him to be an extra special demon, he can have eyes at the back of his head, 
and do the opposite to the molecules from the other side. He lets the slow ones 
through to the left, and the fast through to the right. Pretty soon the left side will 
get cold and the right side hot. Then, are the ideas of thermodynamics violated 
because we could have such a demon? 

It turns out, if we build a finite-sized demon, that the demon himself gets so 
warm that he cannot see very well after a while. The simplest possible demon, as 
an example, would be a trap door held over the hole by a spring. A fast molecule 
comes through, because it is able to lift the trap door. The slow molecule cannot 
get through, and bounces back. But this thing is nothing but our ratchet and pawl 
in another form, and ultimately the mechanism will heat up. If we assume that 
the specific heat of the demon is not infinite, it must heat up. It has but a finite 
number of internal gears and wheels, so it cannot get rid of the extra heat that it 
gets from observing the molecules. Soon it is shaking from Brownian motion so 
much that it cannot tell whether it is coming or going, much less whether the 
molecules are coming or going, so it does not work. 

46-4 Irreversibility 

Are all the laws of physics reversible? Evidently not! Just try to unscramble 
an egg! Run a moving picture backwards, and it takes only a few minutes for 
everybody to start laughing. The most natural characteristic of all phenomena 
is their obvious irreversibility. 

Where does irreversibility come from? It does not come from Newton’s laws. 
If we claim that the behavior of everything is ultimately to be understood in terms 
of the laws of physics, and if it also turns out that all the equations have the 
fantastic property that if we put t = —t we have another solution, then every 
phenomenon is reversible. How then does it come about in nature on a large 
scale that things are not reversible? Obviously there must be some law, some 
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obscure but fundamental equation, perhaps in electricity, maybe in neutrino 
physics, in which it does matter which way time goes. 

Let us discuss that question now. We already know one of those laws, which 
says that the entropy is always increasing. If we have a hot thing and a cold 
thing, the heat goes from hot to cold. So the law of entropy is one such law. But 
we expect to understand the law of entropy from the point of view of mechanics. 
In fact, we have just been successful in obtaining all the consequences of the argu- 
ment that heat cannot flow backwards by itself from just mechanical arguments, 
and we thereby obtained an understanding of the Second Law. Apparently we 
can get irreversibility from reversible equations. But was it only a mechanical 
argument that we used? Let us look into it more closely. 

Since our question has to do with the entropy, our problem is to try to find a 
microscopic description of entropy. If we say we have a certain amount of energy 
in something, like a gas, then we can get a microscopic picture of it, and say 
that every atom has a certain energy. All these energies added together give us 
the total energy. Similarly, maybe every atom has a certain entropy. If we add 
everything up, we would have the total entropy. It does not work so well, but 
let us see what happens. 

As an example, we calculate the entropy difference between a gas at a certain 
temperature at one volume, and a gas at the same temperature at another volume. 
We remember, from Chapter 44, that we had, for the change in entropy, 



In the present case, the energy of the gas is the same before and after expansion, 
since the temperature does not change. So we have to add enough heat to equal 
the work done by the gas or, for each little change in volume, 


dQ = PdV . 

Putting this in for dQ , we get 
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as we obtained in Chapter 44. For instance, if we expand the volume by a factor 
of 2, the entropy change is Nk In 2. 

Let us now consider another interesting example. Suppose we have a box with 
a barrier in the middle. On one side is neon (“black” molecules), and on the other, 
argon (“white” molecules). Now we take out the barrier, and let them mix. How 
much has the entropy changed? It is possible to imagine that instead of the 
barrier we have a piston, with holes in it that let the whites through but not the 
blacks, and another kind of piston which is the other way around. If we move 
one piston to each end, we see that, for each gas, the problem is like the one we 
just solved. So we get an entropy change of Nk In 2, which means that the entropy 
has increased by A; In 2 per molecule. The 2 has to do with the extra room that 
the molecule has, which is rather peculiar. It is not a property of the molecule 
itself, but of how much room the molecule has to run around in. This is a strange 
situation, where entropy increases but where everything has the same temperature 
and the same energy! The only thing that is changed is that the molecules are 
distributed differently. 

We well know that if we just pull the barrier out, everything will get mixed 
up after a long time, due to the collisions, the jiggling, the banging, and so on. 
Every once in a while a white molecule goes toward a black, and a black one goes 
toward a white, and maybe they pass. Gradually the whites worm their way, by 
accident, across into the space of blacks, and the blacks worm their way, by 
accident, into the space of whites. If we wait long enough we get a mixture. 
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Clearly, this is an irreversible process in the real world, and ought to involve an 
increase in the entropy. 

Here we have a simple example of an irreversible process which is completely 
composed of reversible events. Every time there is a collision between any two 
molecules, they go off in certain directions. If we took a moving picture of a colli- 
sion in reverse, there would be nothing wrong with the picture. In fact, one kind 
of collision is just as likely as another. So the mixing is completely reversible, and 
yet it is irreversible. Everyone knows that if we started with white and with black, 
separated, we would get a mixture within a few minutes. If we sat and looked at 
it for several more minutes, it would not separate again but would stay mixed. 
So we have an irreversibility which is based on reversible situations. But we also 
see the reason now. We started with an arrangement which is, in some sense, 
ordered . Due to the chaos of the collisions, it becomes disordered. It is the change 
from an ordered arrangement to a disordered arrangement which is the source of 
the irreversibility . 

It is true that if we took a motion picture of this, and showed it backwards, 
we would see it gradually become ordered. Someone would say, “That is against 
the laws of physics!” So we would run the film over again, and we would look at 
every collision. Every one would be perfect, and every one would be obeying 
the laws of physics. The reason, of course, is that every molecule’s velocities are 
just right, so if the paths are all followed back, they get back to their original condi- 
tion. But that is a very unlikely circumstance to have. If we start with the gas in 
no special arrangement, just whites and blacks, it will never get back. 

46-5 Order and entropy 

So we now have to talk about what we mean by disorder and what we mean 
by order. It is not a question of pleasant order or unpleasant disorder. What is 
different in our mixed and unmixed cases is the following. Suppose we divide the 
space into little volume elements. If we have white and black molecules, how many 
ways could we distribute them among the volume elements so that white is on one 
side, and black on the other? On the other hand, how many ways could we dis- 
tribute them with no restriction on which goes where? Clearly, there are many 
more ways to arrange them in the latter case. We measure “disorder” by the 
number of ways that the insides can be arranged, so that from the outside it looks 
the same. The logarithm of that number of ways is the entropy . The number of 
ways in the separated case is less, so the entropy is less, or the “disorder” is less. 

So with the above technical definition of disorder we can understand the 
proposition. First, the entropy measures the disorder. Second, the universe al- 
ways goes from “order” to “disorder,” so entropy always increases. Order is not 
order in the sense that we like the arrangement, but in the sense that the number 
of different ways we can hook it up, and still have it look the same from the outside, 
is relatively restricted. In the case where we reversed our motion picture of the 
gas mixing, there was not as much disorder as we thought. Every single atom had 
exactly the correct speed and direction to come out right! The entropy was not 
high after all, even though it appeared so. 

What about the reversibility of the other physical laws? When we talked 
about the electric field which comes from an accelerating charge, it was said that 
we must take the retarded field. At a time t and at a distance r from the charge, 
we take the field due to the acceleration at a time t — r/c, not t + r/c. So it 
looks, at first, as if the law of electricity is not reversible. Very strangely, however, 
the laws we used come from a set of equations called Maxwell’s equations, which 
are, in fact, reversible. Furthermore, it is possible to argue that if we were to use 
only the advanced field, the field due to the state of affairs at t + r/c , and do it 
absolutely consistently in a completely enclosed space, everything happens exactly 
the same way as if we use retarded fields! This apparent irreversibility in electricity, 
at least in an enclosure, is thus not an irreversibility at all. We have some feeling 
for that already, because we know that when we have an oscillating charge which 
generates fields which are bounced from the walls of an enclosure we ultimately 
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get to an equilibrium in which there is no one-sidedness. The retarded field ap- 
proach is only a convenience in the method of solution. 

So far as we know, all the fundamental laws of physics, like Newton’s equa- 
tions, are reversible. Then where does irreversibility come from? It comes from 
order going to disorder, but we do not understand this until we know the origin 
of the order. Why is it that the situations we find ourselves in every day are always 
out of equilibrium? One possible explanation is the following. Look again at our 
box of mixed white and black molecules. Now it is possible, if we wait long enough, 
by sheer, grossly improbable, but possible, accident, that the distribution of mole- 
cules gets to be mostly white on one side and mostly black on the other. After 
that, as times goes on and accidents continue, they get more mixed up again. 

Thus one possible explanation of the high degree of order in the present-day 
world is that it is just a question of luck. Perhaps our universe happened to have 
had a fluctuation of some kind in the past, in which things got somewhat separated, 
and now they are running back together again. This kind of theory is not un- 
symmetrical, because we can ask what the separated gas looks like either a little 
in the future or a little in the past. In either case, we see a grey smear at the inter- 
face, because the molecules are mixing again. No matter which way we run time, 
the gas mixes. So this theory would say the irreversibility is just one of the acci- 
dents of life. 

We would like to argue that this is not the case. Suppose we do not look at 
the whole box at once, but only at a piece of the box. Then, at a certain moment, 
suppose we discover a certain amount of order. In this little piece, white and black 
are separate. What should we deduce about the condition in places where we have 
not yet looked? If we really believe that the order arose from complete disorder 
by a fluctuation, we must surely take the most likely fluctuation which could 
produce it, and the most likely condition is not that the rest of it has also become 
disentangled! Therefore, from the hypothesis that the world is a fluctuation, all 
of the predictions are that if we look at a part of the world we have never seen 
before, we will find it mixed up, and not like the piece we just looked at. If our 
order were due to a fluctuation, we would not expect order anywhere but where we 
have just noticed it. 

Now we assume the separation is because the past of the universe was really 
ordered. It is not due to a fluctuation, but the whole thing used to be white and 
black. This theory now predicts that there will be order in other places — the 
order is not due to a fluctuation, but due to a much higher ordering at the beginning 
of time. Then we would expect to find order in places where we have not yet 
looked. 

The astronomers, for example, have only looked at some of the stars. Every 
day they turn their telescopes to other stars, and the new stars are doing the 
same thing as the other stars. We therefore conclude that the universe is not a 
fluctuation, and that the order is a memory of conditions when things started. 
This is not to say that we understand the logic of it. For some reason, the universe 
at one time had a very low entropy for its energy content, and since then the entropy 
has increased. So that is the way toward the future. That is the origin of all ir- 
reversibility, that is what makes the processes of growth and decay, that makes us 
remember the past and not the future, remember the things which are closer to 
that moment in the history of the universe when the order was higher than now, 
and why we are not able to remember things where the disorder is higher than 
now, which we call the future. So, as we commented in an earlier chapter, the 
entire universe is in a glass of wine, if we look at it closely enough. In this case 
the glass of wine is complex, because there is water and glass and light and every- 
thing else. 

Another delight of our subject of physics is that even simple and idealized 
things, like the ratchet and pawl, work only because they are part of the universe. 
The ratchet and pawl works in only one direction because it has some ultimate 
contact with the rest of the universe. If the ratchet and pawl were in a box and 
isolated for some sufficient time, the wheel would be no more likely to go one way 
than the other. But because we pull up the shades and let the light out, because 
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we cool off on the earth and get heat from the sun, the ratchets and pawls that 
we make can turn one way. This one-wayness is interrelated with the fact that the 
ratchet is part of the universe. It is part of the universe not only in the sense that 
it obeys the physical laws of the universe, but its one-way behavior is tied to the 
one-way behavior of the entire universe. It cannot be completely understood until 
the mystery of the beginnings of the history of the universe are reduced still further 
from speculation to scientific understanding. 


46-9 



47 


Sound . The wave equation 


47-1 Waves 


In this chapter we shall discuss the phenomenon of waves. This is a phenom- 
enon which appears in many contexts throughout physics, and therefore our 
attention should be concentrated on it not only because of the particular example 
considered here, which is sound, but also because of the much wider application 
of the ideas in all branches of physics. 

It was pointed out when we studied the harmonic oscillator that there are not 
only mechanical examples of oscillating systems but electrical ones as well. Waves 
are related to oscillating systems, except that wave oscillations appear not only as 
time-oscillations at one place, but propagate in space as well. 

We have really already studied waves. When we studied light, in learning about 
the properties of waves in that subject, we paid particular attention to the interfer- 
ence in space of waves from several sources at different locations and all at the 
same frequency. There are two important wave phenomena that we have not yet 
discussed which occur in light, i.e., electromagnetic waves, as well as in any other 
form of waves. The first of these is the phenomenon of interference in time rather 
than interference in space. If we have two sources of sound which have slightly 
different frequencies and if we listen to both at the same time, then sometimes the 
waves come with the crests together and sometimes with the crest and trough to- 
gether (see Fig. 47-1). The rising and falling of the sound that results is the phenom- 
enon of beats or, in other words, of interference in time. The second phenomenon 
involves the wave patterns which result when the waves are confined within a given 
volume and reflect back and forth from walls. 

These effects could have been discussed, of course, for the case of electro- 
magnetic waves. The reason for not having done this is that by using one example 
we would not generate the feeling that we are actually learning about many different 
subjects at the same time. In order to emphasize the general applicability of waves 
beyond electrodynamics, we consider here a different example, in particular 
sound waves. 

Other examples of waves are water waves consisting of long swells that we 
see coming in to the shore, or the smaller water waves consisting of surface tension 
ripples. As another example, there are two kinds of elastic waves in solids; a 
compressional (or longitudinal) wave in which the particles of the solid oscillate 
back and forth along the direction of propagation of the wave (sound waves in a 
gas are of this kind), and a transverse wave in which the particles of the solid os- 
cillate in a direction perpendicular to the direction of propagation. Earthquake 
waves contain elastic waves of both kinds, generated by a motion at some place in 
the earth’s crust. 

Still another example of waves is found in modern physics. These are waves 
which give the probability amplitude of finding a particle at a given place — the 
“matter waves” which we have already discussed. Their frequency is proportional 
to the energy and their wave number is proportional to the momentum. They are 
the waves of quantum mechanics. 

In this chapter we shall consider only waves for which the velocity is independ- 
ent of the wavelength. This is, for example, the case for light in a vacuum. The 
speed of light is then the same for radiowaves, blue light, green light, or for any 
other wavelength. Because of this behavior, when we began to describe the wave 
phenomenon we did riot notice at first that we had wave propagation. Instead, we 
said that if a charge is moved at one place, the electric field at a distance x was 
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Fig. 47-1. Interference in time of 
two sound sources with slightly different 
frequencies, resulting in beats. 



Fig. 47-2. The solid curve shows 
what the electric field might be like at 
some instant of time and the dashed curve 
shows what the electric field is at a time 
t later. 


proportional to the acceleration, not at the time /, but at the earlier time t — x/c. 
Therefore if we were to picture the electric field in space at some instant of time, 
as in Fig. 47-2, the electric field at a time t later would have moved the distance 
ct 9 as indicated in the figure. Mathematically, we can say that in the one-dimen- 
sional example we are taking, the electric field is a function of x — ct. We see 
that at t = 0, it is some function of x . If we consider a later time, we need only to 
increase x somewhat to get the same value of the electric field. For example, if the 
maximum field occurred at x = 3 at time zero, then to find the new position of 
the maximum field at time t we need 

x — ct — 3 or x = 3 + ct. 

We see that this kind of function represents the propagation of a wave. 

Such a function, f{x — ct), then represents a wave. We may summarize this 
description of a wave by saying simply that 

f(x - ct) = f{x + Ax - c(t + AO), 

when Ax = c At. There is, of course, another possibility, i.e., that instead of a 
source to the left as indicated in Fig. 47-2, we have a source on the right, so that 
the wave propagates toward negative x. Then the wave would be described by 
g(x + ct). 

There is the additional possibility that more than one wave exists in space at 
the same time, and so the electric field is the sum of the two fields, each one propa- 
gating independently. This behavior of electric fields may be described by saying 
that if fi(x — ct) is a wave ? and if f 2 (x — ct) is another wave, then their sum is 
also a wave. This is called the principle of superposition. The same principle is 
valid in sound. 

We are familiar with the fact that if a sound is produced, we hear with complete 
fidelity the same sequence of sounds as was generated. If we had high frequencies 
travelling faster than low frequencies, a short, sharp noise would be heard as a 
succession of musical sounds. Similarly, if red light travelled faster than blue 
light, a flash of white light would be seen first as red, then as white, and finally as 
blue. We are familiar with the fact that this is not the case. Both sound and 
light travel with a speed in air which is very nearly independent of frequency. 
Examples of wave propagation 'for which this independence is not true will be 
considered in Chapter 48. 

In the case of light (electromagnetic waves) we gave a rule which determined 
the electric field at a point as a result of the acceleration of a charge. One might 
expect now that what we should do is give a rule whereby some quality of the air, 
say the pressure, is determined at a given distance from a source in terms of the 
source motion, delayed by the travel time of the sound. In the case of light this 
procedure was acceptable because all that we knew was that a charge at one place 
exerts a force on another charge at another place. The details of propagation from 
the one place to the other were not absolutely essential. In the case of sound, 
however, we know that it propagates through the air between the source and the 
hearer, and it is certainly a natural question to ask what, at any given moment, 
the pressure of the air is. We would like, in addition, to know exactly how the air 
moves. In the case of electricity we could accept a rule, since we could say that 
we do not yet know the laws of electricity, but we cannot make the same remark 
with regard to sound. We would not be satisfied with a rule stating how the sound 
pressure moves through the air, because the process ought to be understandable as 
a consequence of the laws of mechanics. In short, sound is a branch of mechanics, 
and so it is to be understood in terms of Newton’s laws. The propagation of 
sound from one place to another is merely a consequence of mechanics and the 
properties of gases, if it propagates in a gas, or of the properties of liquids or solids, 
if it propagates through such mediums. Later we shall derive the properties 
of light and its wave propagation in a similar way from the laws of electrody- 
namics. 
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47-2 The propagation of sound 

We shall give a derivation of the properties of the propagation of sound 
between the source and the receiver as a consequence of Newton’s laws, and we shall 
not consider the interaction with the source and the receiver. Ordinarily we em- 
phasize a result rather than a particular derivation of it. In this chapter we take 
the opposite view. The point here, in a certain sense, is the derivation itself. This 
problem of explaining new phenomena in terms of old ones, when we know the 
laws of the old ones, is perhaps the greatest art of mathematical physics. The 
mathematical physicist has two problems: one is to find solutions, given the equa- 
tions, and the other is to find the equations which describe a new phenomenon. 
The derivation here is an example of the second kind of problem. 

We shall take the simplest example here — the propagation of sound in one 
dimension. To carry out such a derivation it is necessary first to have some kind 
of understanding of what is going on. Fundamentally what is involved is that if 
an object is moved at one place in the air, we observe that there is a disturbance 
which travels through the air. If we ask what kind of disturbance, we would say 
that we would expect that the motion of the object produces a change of pressure. 
Of course, if the object is moved gently, the air merely flows around it, but what we 
are concerned with is a rapid motion, so that there is not sufficient time for such 
a flow. Then, with the motion, the air is compressed and a change of pressure is 
produced which pushes on additional air. This air is in turn compressed, which 
leads again to an extra pressure, and a wave is propagated. 

We now want to formulate such a process. We have to decide what variables 
we need. In our particular problem we would need to know how much the air 
has moved, so that the air displacement in the sound wave is certainly one relevant 
variable. In addition we would like to describe how the air density changes as it is 
displaced. The air pressure also changes, so this is another variable of interest. 
Then, of course, the air has a velocity , so that we shall have to describe the velocity 
of the air particles. The air particles also have accelerations — but as we list these 
many variables we soon realize that the velocity and acceleration would be known 
if we knew how the air displacement varies with time. 

As we said, we shall consider the wave in one dimension. We can do this if 
we are sufficiently far from the source that what we call the wavefronts are very 
nearly planes. We thus make our argument simpler by taking the least complicated 
example. We shall then be able to say that the displacement, x, depends only on 
x and /, and not on y and z. Therefore the description of the air is given by x(x, t). 

Is this description complete? It would appear to be far from complete, for 
we know none of the details of how the air molecules are moving. They are moving 
in all directions, and this state of affairs is certainly not described by means of this 
function X(x, t). From the point of view of kinetic theory, if we have a higher 
density of molecules at one place and a lower density adjacent to that place, the 
molecules would move away from the region of higher density to the one of lower 
density, so as to equalize this difference. Apparently we would not get an oscillation 
and there would be no sound. What is necessary to get the sound wave is this 
situation: as the molecules rush out of the region of higher density and higher 
pressure, they give momentum to the molecules in the adjacent region of lower 
density. For sound to be generated, the regions over which the density and pressure 
change must be much larger than the distance the molecules travel before colliding 
with other molecules. This distance is the mean free path, and the distance between 
pressure crests and troughs must be much larger than this. Otherwise the molecules 
would move freely from the crest to the trough and immediately smear out the wave. 

It is clear that we are going to describe the gas behavior on a scale large 
compared with the mean free path, and so the properties of the gas will not be 
described in terms of the individual molecules. The displacement, for example, 
will be the displacement of the center of mass of a small element of the gas, and 
the pressure or density will be the pressure or density in this region. We shall 
call the pressure P and the density p, and they will be functions of x and t. We 
must keep in mind that this description is an approximation which is valid only 
when these gas properties do not vary too rapidly with distance. 
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47-3 The wave equation 

The physics of the phenomenon of sound waves thus involves three features: 

I. The gas moves and changes the density. 

II. The change in density corresponds to a change in pressure. 

III. Pressure inequalities generate gas motion. 

Let us consider II first. For a gas, a liquid, or a solid, the pressure is some function 
of the density. Before the sound wave arrives, we have equilibrium, with a pressure 
P 0 and a corresponding density p 0 . A pressure P in the medium is connected to 
the density by some characteristic relation P — f(p) and, in particular, the equilib- 
rium pressure P 0 is given by P 0 — f(p 0 ). The changes of pressure in sound from 
the equilibrium value are extremely small. A convenient unit for measuring pressure 
is the bar , where 1 bar = 10 5 n/m 2 . The pressure of 1 standard atmosphere is very 
nearly 1 bar: 1 atm = 1.0133 bars. In sound we use a logarithmic scale of in- 
tensities since the sensitivity of the ear is roughly logarithmic. This scale is the 
decibel scale, in which the acoustic pressure level for the pressure amplitude P 
is defined as 

I (acoustic pressure level) = 20 logi 0 (F/P re f) in db, (47.1) 

where the reference pressure P re f = 2 X 10 — 10 bar. A pressure amplitude of 
P = 10 3 P re f = 2 X 10 -7 bar* corresponds to a moderately intense sound of 
60 decibels. We see that the pressure changes in sound are extremely small com- 
pared with the equilibrium, or mean, pressure of 1 atm. The displacements and 
the density changes are correspondingly extremely small. In explosions we do not 
have such small changes; the excess pressures produced can be greater than 
1 atm. These large pressure changes lead to new effects which we shall consider 
later. In sound we do not often consider acoustic intensity levels over 100 db; 
120 db is a level which is painful to the ear. Therefore, for sound, if we write 

P = Pq + P e , P = Po + Pe, (47.2) 

we shall always have the pressure change P e very small compared with P 0 and the 
density change p e very small compared with p 0 . Then 

Pq + Pe ^ f(pQ + Pe) = f(po) + Pe/'(po), (47.3) 
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Fig. 47-3. The displacement of the 
air at x is X(x, f), and at x + Ax it is 
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where P 0 = f(p 0 ) and /'(p 0 ) stands for the derivative of f(p) evaluated at p = p 0 . 
We can take the second step in this equality only because p e is very small. We find in 
this way that the excess pressure P e is proportional to the excess density p e , and we 
may call the proportionality factor k : 

P e = Kp e , where k = /'(p 0 ) = (dP/dp) 0 . (II) (47.4) 

The relation we needed for II is this very simple one. 

Let us now consider I. We shall suppose that the position of a portion of air 
undisturbed by the sound wave is x and the displacement at the time t due to the 
sound is x(x , /), so that its new position is x + x(x, / ), as in Fig. 47-3. Now the 
undisturbed position of a nearby portion of air is x + Ax, and its new position 
is x + Ax + x(x + Ax, /). We can now find the density changes in the following 
way. Since we are limiting ourselves to plane waves, we can take a unit area 
perpendicular to the x-direction, which is the direction of propagation of the sound 
wave. The amount of air, per unit area, in Ax is then p 0 Ax, where p 0 is the un- 
disturbed, or equilibrium, air density. This air, when displaced by the sound wave, 
now lies between x + x(x, t) and x + Ax + x(x + Ax, /), so that we have the 
same matter in this interval that was in Ax when undisturbed. If p is the new 
density, then 

Po Ax = p[x + Ax + x(x + Ax, /) — x — X(x, 0]* (47.5) 


* With this choice of P re f, the P is not the peak pressure in the sound wave but the “root- 
mean-square” pressure, which is 1/(2 ) 112 times the peak pressure. 
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Since A* is small, we can write X(x + Ax, /) - x(x, t) = (dx/dx) Ax. This 
derivative is a partial derivative, since X depends on the time as well as on x. 
Our equation then is 


or 


PoAx = 


p(^Ax + Ax) 


(47.6) 


Po = (Po + P«) fa + Po + Pe- 


(47.7) 


Now in sound waves all changes are small so that p e is small, x is small, and dx/dx 
is also small. Therefore in the relation that we have just found, 


„ dX dX 

pe = - p «Tx- p *Tx' (47.8) 

we can neglect p e dx/dx compared with p 0 dx/dx. Thus we get the relation we 
needed for I : 

dx 

Pe = ~ Po dx ‘ ^ (47.9) 


This equation is what we would expect physically. If the displacements vary with 
x, then there will be density changes. The sign is also right: if the displacement x 
increases with x, so that the air is stretched out, the density must go down. 

We now need the third equation, which is the equation of the motion pro- 
duced by the pressure. If we know the relation between the force and the pressure, 
we can then get the equation of motion. If we take a thin slab of air of length Ax 
and of unit area perpendicular to x, then the mass of air in this slab is p 0 Ax and 
it has the acceleration d 2 x/ dt' 2 , so the mass times the acceleration for this slab of 
matter is po Ax (d 2 x/dt 2 ). (It makes no difference for small Ax whether the accelera- 
tion d x/dt 2 is evaluated at an edge of the slab or at some intermediate position.) 
If now we find the force on this matter for a unit area perpendicular to x, it will 
then be equal to p 0 Ax (5 2 x/5/ 2 ). We have the force in the +x-direction, at x, of 
amount P(x, t) per unit area, and we have the force in the opposite direction, at 
x + Ax, of amount P(x + Ax, t) per unit area (Fig. 47-4): 

P(x, 0 - P(x + Ax, /) = - Ax = - ^ Ax, (47.10) 

since Ax is small and since the only part of P which changes is the excess pressure 
P e . We now have III : 

d 2 X dP e 

Po M = --frc' O 11 ) (47.11) 
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Fig. 47-4. The net force in the 
positive x-direction produced by the pres- 
sure acting on unit area perpendicular 
to x is — (dP/dx) Ax. 


and so we have enough equations to interconnect things and reduce down to one 
variable, say to X. We can eliminate P e from III by using II, so that we get 


dt 2 * dx ’ 


(47.12) 


and then we can use I to eliminate p e . In this way we find that p 0 cancels out and 
that we are left with 


d 2 x _ d 2 x 
d/2 K 5x2 


(47.13) 


We shall call c 2 = k, so that we can write 


d 2 x _ 1 d 2 x 

dx 2 c2 5/2 

S 


(47.14) 


This is the wave equation which describes the behavior of sound in matter. 
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47-4 Solutions of the wave equation 

We now can see whether this equation really does describe the essential 
properties of sound waves in matter. We want to deduce that a sound pulse, or 
disturbance, will move with a constant speed. We want to verify that two different 
pulses can move through each other — the principle of superposition. We also 
want to verify that sound can go either to the right or to the left. All these properties 
should be contained in this one equation. 

We have remarked that any plane-wave disturbance which moves with a 
constant velocity v has the form f(x — vt ). Now we have to see whether x(x, t) = 
f(x — vt) is a solution of the wave equation. When we calculate dx/dx , we get the 
derivative of the function, dx/dx = f f {x — vt). Differentiating once more, we find 

0 = f%x - Vi). (47.15) 

The differentiation of this same function with respect to t gives — v times the 
derivative of the function, or dx/dt = —vf f (x — vt), and the second time de- 
rivative is 

= »7"(* - «*)• (47.16) 

It is evident that /( x — vt) will satisfy the wave equation provided the wave velocity 
v is equal to c a . 

We find, therefore, from the laws of mechanics that any sound disturbance 
propagates with the velocity c 8 , and in addition we find that 

c, = k 112 = (dP/dp) l l2 , 

and so we have related the wave velocity to a property of the medium. 

If we consider a wave travelling in the opposite direction, so that x(x, t) = 
g(x + vt), it is easy to see that such a disturbance also satisfies the wave equation. 
The only difference between such a wave and one travelling from left to right is 
in the sign of v, but whether we have x + vt or x — vt as the variable in the 
function does not affect the sign of d 2 x/dt 2 , since it involves only v 2 . It follows 
that we have a solution for waves propagating in either direction with speed c 8 . 

An extremely interesting question is that of superposition. Suppose one solu- 
tion of the wave equation has been found, say x x . This means that the second de- 
rivative of Xi with respect to x is equal to l/c 8 times the second derivative of x x 
with respect to t. Now any other solution x 2 has this same property. If we super- 
pose these two solutions, we have 

X(x, t) - Xi(x, 0 + X 2 (x, o, (47.17) 


and we wish to verify that x(x, t) is also a wave, i.e., that X satisfies the wave equa- 
tion. We can easily prove this result, since we have 


d 2 x _ a 2 xi a 2 x 2 

dX* dX* ^ dX^ 

and, in addition, 

dbc = Ai 3^X2 
a/2 a/2 ^ a/2 


(47.18) 

(47.19) 


It follows that d 2 x/dx 2 = (l/c 2 ) d 2 x/dt 2 , so we have verified the principle of 
superposition. The proof of the principle of superposition follows from the fact 
that the wave equation is linear in X. 

We can now expect that a plane light wave propagating in the x-direction, 
polarized so that the electric field is in the ^-direction, will satisfy the wave equation 


d 2 Ey _ 2 

dx 2 C 2 dt 2 


(47.20) 


47-6 



where c is the speed of light. This wave equation is one of the consequences of 
Maxwell’s equations. The equations of electrodynamics will lead to the wave 
equation for light just as the equations of mechanics lead to the wave equation 
for sound. 


47-5 The speed of sound 

Our deduction of the wave equation for sound has given us a formula which 
connects the wave speed with the rate of change of pressure with the density at the 
normal pressure: 



In evaluating this rate of change, it is essential to know how the temperature varies. 
In a sound wave, we would expect that in the region of compression the temperature 
would be raised, and that in the region of rarefaction the temperature would be 
lowered. Newton was the first to calculate the rate of change of pressure with 
density, and he supposed that the temperature remained unchanged. He argued 
that the heat was conducted from one region to the other so rapidly that the 
temperature could not rise or fall. This argument gives the isothermal speed of 
sound, and it is wrong. The correct deduction was given later by Laplace, who put 
forward the opposite idea — that the pressure and temperature change adiabatically 
in a sound wave. The heat flow from the compressed region to the rarefied region is 
negligible so long as the wavelength is long compared with the mean free path. 
Under this condition the slight amount of heat flow in a sound wave does not 
affect the speed, although it gives a small absorption of the sound energy. We can 
expect correctly that this absorption increases as the wavelength approaches the 
mean free path, but these wavelengths are smaller by factors of about a million 
than the wavelengths of audible sound. 

The actual variation of pressure with density in a sound wave is the one that 
allows no heat flow. This corresponds to the adiabatic variation, which we found 
to be PV y = const, where V was the volume. Since the density p varies inversely 
with V , the adiabatic connection between P and p is 


P = const p 7 , 


(47.22) 


from which we get dP/dp = 7 P/p. We then have for the speed of sound the 
relation 


7 P 
P 


(47.23) 


We can also write c 2 = 7 PV/pV and make use of the relation PV ~ NkT . 
Further, we see that pV is the mass of gas, which can also be expressed as Nm , 
or as p, where m is the mass of a molecule aud p is the molecular weight. In this 
way we find that 


7 kT _ 7 RT 
m p 


(47.24) 


from which it is evident that the speed of sound depends only on the gas temperature 
and not on the pressure or the density. We also have observed that 


kT = 


(47.25) 


where (v 2 ) is the mean square of the speed of the molecules. 
= (y/3)(v 2 ), or 


c 8 — 



l>av* 


It follows that 


(47.26) 


This equation states that the speed of sound is some number which is roughly 
1/(3) 1/2 times some average speed, v av , of the molecules (the square root of the 

47-7 



mean square velocity). In other words, the speed of sound is of the same order of 
magnitude as the speed of the molecules, and is actually somewhat less than this 
average speed. 

Of course we could expect such a result, because a disturbance like a change 
in pressure is, after all, propagated by the motion of the molecules. However, 
such an argument does not tell us the precise propagation speed ; it could have 
turned out that sound was carried primarily by the fastest molecules, or by the 
slowest molecules. It is reasonable and satisfying that the speed of sound is roughly 
^ of the average molecular speed v av . 
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48 


Beats 


48-1 Adding two waves 

Some time ago we discussed in considerable detail the properties of light 
waves and their interference — that is, the effects of the superposition of two waves 
from different sources. In all these analyses we assumed that the frequencies of the 
sources were all the same. In this chapter we shall discuss some of the phenomena 
which result from the interference of two sources which have different frequencies. 

It is easy to guess what is going to happen. Proceeding in the same way as we 
have done previously, suppose we have two equal oscillating sources of the same 
frequency whose phases are so adjusted, say, that the signals arrive in phase at 
some point P . At that point, if it is light, the light is very strong; if it is sound, it 
is very loud; or if it is electrons, many of them arrive. On the other hand, if the 
arriving signals were 180° out of phase, we would get no signal at P, because the 
net amplitude there is then a minimum. Now suppose that someone twists the 
“phase knob” of one of the sources and changes the phase at P back and forth, say, 
first making it 0° and then 180°, and so on. Of course, we would then find variations 
in the net signal strength. Now we also see that if the phase of one source is slowly 
changing relative to that of the other in a gradual, uniform manner, starting at 
zero, going up to ten, twenty, thirty, forty degrees, and so on, then what we would 
measure at P would be a series of strong and weak “pulsations,” because when the 
phase shifts through 360° the amplitude returns to a maximum. Of course, to 
say that one source is shifting its phase relative to another at a uniform rate is the 
same as saying that the number of oscillations per second is slightly different for 
the two. 

So we know the answer: if we have two sources at slightly different frequencies 
we should find, as a net result, an oscillation with a slowly pulsating intensity. 
That is all there really is to the subject! 

It is very easy to formulate this result mathematically also. Suppose, for ex- 
ample, that we have two waves, and that we do not worry for the moment about 
all the spatial relations, but simply analyze what arrives at P. From one source, let 
us say, we would have cos coi /, and from the other source, cos c o 2 t, where the two 
co’s are not exactly the same. Of course the amplitudes may not be the same, either, 
but we can solve the general problem later; let us first take the case where the 
amplitudes are equal. Then the total amplitude at P is the sum of these two cosines. 
If we plot the amplitudes of the waves against the time, as in Fig. 48-1, we see that 
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Fig. 48-1. The superposition of two 
cosine waves with frequencies in the 
ratio 8:10, The precise repetition of the 
pattern within each “beat 1 ’ is not typical 
of the general case. 
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where the crests coincide we get a strong wave, and where a trough and crest 
coincide we get practically zero, and then when the crests coincide again we get a 
strong wave again. 

Mathematically, we need only to add two cosines and rearrange the result 
somehow. There exist a number of useful relations among cosines which are not 
difficult to derive. Of course we know that 

e i(a+b) = e ia e ib, ( 4 8 . 1 ) 

and that e ta has a real part, cos a, and an imaginary part, sin a. If we take the real 
part of e lia + b \ we get cos (a + ft ). If we multiply out: 

e ia e xh = (cos a + i sin a)(cos b + i sin ft), 

we get cos a cos b — sin a sin b , plus some imaginary parts. But we now need 
only the real part, so we have 

cos (a + ft) = cos a cos b — sin a sin b. (48.2) 

Now if we change the sign of b , since the cosine does not change sign while the 
sine does, the same equation, for negative ft, is 

cos (a — ft) = cos a cos ft + sin a sin ft. (48.3) 

If we add these two equations together, we lose the sines and we learn that the 
product of two cosines is half the cosine of the sum, plus half the cosine of the 
difference: 

cos a cos ft = \ cos (a + ft) + \ cos ( a — ft). (48.4) 

Now we can also reverse the formula and find a formula for cos a + cos 0 if 
we simply let a — a + ft and 0 = a — b. That is, a = \(a + 0) and ft = 
i(a — 0 ), so that 

COS a + COS 0 = 2 COS J(a + 0) COS J(a — 0). (48.5) 

Now we can analyze our problem. The sum of cos coi* and cos c o 2 f is 

coswi t + cos o) 2 t = 2cos J(cox + w 2 )*cos£(o>i — co 2 )f. (48.6) 

Now let us suppose that the two frequencies are nearly the same, so that J(w i + co 2 ) 
is the average frequency, and is more or less the same as either. But a>i — w 2 is 
much smaller than coi or co 2 because, as we suppose, coi and w 2 are nearly equal. 
That means that we can represent the solution by saying that there is a high- 
frequency cosine wave more or less like the ones we started with, but that its “size” 
is slowly changing — its “size” is pulsating with a frequency which appears to be 
i(ooi — co 2 ). But is this the frequency at which the beats are heard? Although 
(48.6) says that the amplitude goes as cos i(o>i — w 2 ), what it is really telling us is 
that the high-frequency oscillations are contained between two opposed cosine 
curves (shown dotted in Fig. 48-1). On this basis one could say that the amplitude 
varies at the frequency 4(coi — w 2 ), but if we are talking about the intensity of 
the wave we must think of it as having twice this frequency. That is, the modulation 
of the amplitude, in the sense of the strength of its intensity, is at frequency coi — 
co 2 , although the formula tells us that we multiply by a cosine wave at half that 
frequency. The technical basis for the difference is that the high frequency-wave 
has a little different phase relationship in the second half-cycle. 

Ignoring this small complication, we may conclude that if we add two waves 
of frequency coi and co 2 , we will get a net resulting wave of average frequency 
i(u'i + w 2 ) which oscillates in strength with a frequency coi — w 2 . 

If the two amplitudes are different, we can do it all over again by multiplying 
the cosines by different amplitudes A x and A 2i and do a lot of mathematics, re- 
arranging, and so on, using equations like (48.2)-(48.5). However, there are other, 
easier ways of doing the same analysis. For example, we know that it is much 
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easier to work with exponentials than with sines and cosines and that we can repre- 
sent A i cos wit as the real part of A x e iw ^. The other wave would similarly be the 
real part of A 2 e' l03 2 t . If we add the two, we get A x e iu ^ + A 2 e i ”* t . If we then 
factor out the average frequency, we have 

A x e io> i* + A 2 e™* t = + A 2 e~ 1!2 ( 4 gj) 

Again we have the high-frequency wave with a modulation at the lower frequency. 


48-2 Beat notes and modulation 

If we are now asked for the intensity of the wave of Eq. (48.7), we can either 
take the absolute square of the left side, or of the right side. Let us take the left 
side. The intensity then is 

/ = Ai + A\ + 2^x^ 2 cos(wi — co 2 )t. (48.8) 

We see that the intensity swells and falls at a frequency co x — co 2 , varying between 
the limits (A x + A 2 ) 2 and (A x — A 2 ) 2 . If A x p* A 2> the minimum intensity 
is not zero. 

One more way to represent this idea is by means of a drawing, like Fig. 48-2. 
We draw a vector of length A x , rotating at a frequency o) X , to represent one of the 
waves in the complex plane. We draw another vector of length A 2) going around 
at a frequency co 2 , to represent the second wave. If the two frequencies are exactly 
equal, their resultant is of fixed length as it keeps revolving, and we get a definite, 
fixed intensity from the two. But if the frequencies are slightly different, the two 
complex vectors go around at different speeds. Figure 48-3 shows what the 
situation looks like relative to the vector A x e io, ^ t . We see that A 2 is turning slowly 
away from A u and so the amplitude that we get by adding the two is first strong, 
and then, as it opens out, when it gets to the 180° relative position the resultant 
gets particularly weak, and so on. As the vectors go around, the amplitude of the 
sum vector gets bigger and smaller, and the intensity thus pulsates. It is a relatively 
simple idea, and there are many different ways of representing the same thing. 

The effect is very easy to observe experimentally. In the case of acoustics, we 
may arrange two loudspeakers driven by two separate oscillators, one for each 
loudspeaker, so that they each make a tone. We thus receive one note from one 
source and a different note from the other source. If we make the frequencies 
exactly the same, the resulting effect will have a definite strength at a given space 
location. If we then de-tune them a little bit, we hear some variations in the 
intensity. The farther they are de-tuned, the more rapid are the variations of 
sound. The ear has some trouble following variations more rapid than ten or so 
per second. 

We may also see the effect on an oscilloscope which simply displays the sum 
of the currents to the two speakers. If the frequency of pulsing is relatively low, we 
simply see a sinusoidal wave train whose amplitude pulsates, but as we make the 
pulsations more rapid we see the kind of wave shown in Fig. 48-1. As we go to 
greater frequency differences, the “bumps” move closer together. Also, if the 
amplitudes are not equal and we make one signal stronger than the other, then we 
get a wave whose amplitude does not ever become zero, just as we expect. Every- 
thing works the way it should, both acoustically and electrically. 

The opposite phenomenon occurs too! In radio transmission using so-called 
amplitude modulation (am), the sound is broadcast by the radio station as follows: 
the radio transmitter has an ac electric oscillation which is at a very high frequency, 
for example 800 kilocycles per second, in the broadcast band. If this carrier signal 
is turned on, the radio station emits a wave which is of uniform amplitude at 800,000 
oscillations a second. The way the “information” is transmitted, the useless kind 
of information about what kind of car to buy, is that when somebody talks into a 
microphone the amplitude of the carrier signal is changed in step with the vibrations 
of sound entering the microphone. 



Fig. 48—2. The resultant of two com- 
plex vectors of equal frequency. 



Fig. 48-3. The resultant of two com- 
plex vectors of unequal frequency, as 
seen in the rotating frame of reference of 
one vector. Nine successive positions of 
the slowly rotating vector are shown. 
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If we take as the simplest mathematical case the situation where a soprano is 
singing a perfect note, with perfect sinusoidal oscillations of her vocal cords, 
then we get a signal whose strength is alternating as shown in Fig. 48-4. The 
audiofrequency alternation is then recovered in the receiver; we get rid of the 
carrier wave and just look at the envelope which represents the oscillations of the 
vocal cords, or the sound of the singer. The loudspeaker then makes corre- 
sponding vibrations at the same frequency in the air, and the listener is then 
essentially unable to tell the difference, so they say. Because of a number of dis- 
tortions and other subtle effects, it is, in fact, possible to tell whether we are listening 
to a radio or to a real soprano; otherwise the idea is as indicated above. 


Fig. 48—4. A modulated carrier wave. 
In this schematic sketch, co c /co m = 5. 
In an actual radiowave, co r /co m ~ 100. 


48-3 Side bands 


as 


Mathematically, the modulated wave described above would be expressed 
S = (1 + b cos o) m t) cos co c t, (48.9) 


where co c represents the frequency of the carrier and to m is the frequency of the 
audio tone. Again we use all those theorems about the cosines, or we can use 
e % 9 1 it makes no difference— it is easier with e l \ but it is the same thing. We 
then get 

S = cos co c t + \b cos (co c + c c m )t + J b cos (co c — o) m )t. (48.10) 


Fig. 48-5. The frequency spectrum 
of a carrier wave co c modulated by a 
single cosine wave co m . 


So, from another point of view, we can say that the output wave of the system 
consists of three waves added in superposition: first, the regular wave at the fre- 
quency co c , that is, at the carrier frequency, and then two new waves at two new 
frequencies. One is the carrier frequency plus the modulation frequency, and the 
other is the carrier frequency minus the modulation frequency. If, therefore, we 
make some kind of plot of the intensity being generated by the generator as a 
function of frequency, we would find a lot of intensity at the frequency of the carrier, 
naturally, but when a singer started to sing, we would suddenly also find intensity 
proportional to the strength of the singer, Z> 2 , at frequency o> c + and a> c — co m , 
as shown in Fig. 48-5. These are called side bands ; when there is a modulated 
signal from the transmitter, there are side bands. If there is more than one note 
at the same time, say to m and o) m > 9 there are two instruments playing; or if there is 
any other complicated cosine wave, then, of course, we can see from the mathematics 
that we get some more waves that correspond to the frequencies c o c =b co m '. 

Therefore, when there is a complicated modulation that can be represented 
as the sum of many cosines,* we find that the actual transmitter is transmitting 
over a range of frequencies, namely the carrier frequency plus or minus the maxi- 
mum frequency that the modulation signal contains. 

Although at first we might believe that a radio transmitter transmits only at 
the nominal frequency of the carrier, since there are big, superstable crystal 
oscillators in there, and everything is adjusted to be at precisely 800 kilocycles, 
the moment someone announces that they are at 800 kilocycles, he modulates the 
800 kilocycles, and so they are no longer precisely at 800 kilocycles! Suppose that 
the amplifiers are so built that they are able to transmit over a good range of the 
ear’s sensitivity (the ear can hear up to 20,000 cycles per second, but usually radio 
transmitters and receivers do not work beyond 10,000, so we do not hear the 
highest parts), then, when the man speaks, his voice may contain frequencies 
ranging up, say, to 10,000 cycles, so the transmitter is transmitting frequencies 


* A slight side remark : In what circumstances can a curve be represented as a sum of 
a lot of cosines? Answer: In all ordinary circumstances, except for certain cases the 
mathematicians can dream up. Of course, the curve must have only one value at a given 
point, and it must not be a crazy curve which jumps an infinite number of times in an 
infinitesimal distance, or something like that. But aside from such restrictions any 
reasonable curve (one that a singer is going to be able to make by shaking her vocal 
cords) can always be compounded by adding cosine waves together. 
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which may range from 790 to 810 kilocycles per second. Now if there were another 
station at 795 kc/sec, there would be a lot of confusion. Also, if we made our 
receiver so sensitive that it picked up only 800, and did not pick up the 10 kilo- 
cycles on either side, we would not hear what the man was saying, because the 
information would be on these other frequencies! Therefore it is absolutely es- 
sential to keep the stations a certain distance apart, so that their side bands do not 
overlap and, also, the receiver must not be so selective that it does not permit 
reception of the side bands as well as of the main nominal frequency. In the case 
of sound, this problem does not really cause much trouble. We can hear over a 
±20 kc/sec range, and we have usually from 500 to 1500 kc/sec in the broadcast 
band, so there is plenty of room for lots of stations. 

The television problem is more difficult. As the electron beam goes across 
the face of the picture tube, there are various little spots of light and dark. That 
“light” and “dark” is the “signal.” Now ordinarily the beam scans over the whole 
picture, 500 lines, approximately, in a thirtieth of a second. Let us consider that the 
resolution of the picture vertically and horizontally is more or less the same, so 
that there are the same number of spots per inch along a scan line. We want to 
be able to distinguish dark from light, dark from light, dark from light, over, say, 
500 lines. In order to be able to do this with cosine waves, the shortest wavelength 
needed thus corresponds to a wavelength, from maximum to maximum, of one 
250th of the screen size. So we have 250 X 500 X 30 pieces of information per 
second. The highest frequency that we are going to carry, therefore, is close to 
4 megacycles per second. Actually, to keep the television stations apart, we have 
to use a little bit more than this, about 6 mc/sec; part of it is used to carry the sound 
signal, and other information. So, television channels are 6 megacycles per second 
wide. It certainly would not be possible to transmit tv on an 800 kc/sec carrier, 
since we cannot modulate at a higher frequency than the carrier. 

At any rate, the television band starts at 54 megacycles. The first transmission 
channel, which is channel 2 (!), has a frequency range from 54 to 60 mc/sec, which 
is 6 mc/sec wide. “But,” one might say, “we have just proved that there were 
side bands on both sides, and therefore it should be twice that wide.” It turns out 
that the radio engineers are rather clever. If we analyze the modulation signal 
using not just cosine terms, but cosine and sine terms, to allow for phase differences, 
we then see that there is a definite, invariant relationship between the side band on 
the high-frequency side and the side band on the low-frequency side. What we 
mean is that there is no new information on that other side band. So what is done 
is to suppress one side band, and the receiver is wired inside such that the informa- 
tion which is missing is reconstituted by looking at the single side band and the 
carrier. Single side-band transmission is a clever scheme for decreasing the band 
widths needed to transmit information. 

48-4 Localized wave trains 

The next subject we shall discuss is the interference of waves in both space and 
time. Suppose that we have two waves travelling in space. We know, of course, 
that we can represent a wave travelling in space by e iitat ~ kx \ This might be, for 
example, the displacement in a sound wave. This is a solution of the wave equation 
provided that w 2 = k 2 c 2 , where c is the speed of propagation of the wave. In 
this case we can write it as e lk(x ~ ct \ which is of the general form f(x — cf). There- 
fore this must be a wave which is travelling at this velocity, c o/k, and that is c 
and everything is all right. 

Now we want to add two such waves together. Suppose we have a wave that 
is travelling with one frequency, and another wave travelling with another fre- 
quency. We leave to the reader to consider the case where the amplitudes are 
different; it makes no real difference. Thus we want to add + e *(a a t~k a x)^ 

We can add these by the same kind of mathematics we used when we added signal 
waves. Of course, if c is the same for both, this is easy, since it is the same as what 
we did before: 

x/c) _|_ e ita 2 (t—x[c ) _ gfoit* _|_ 


(48.11) 
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except that t f = t — x/c is the variable instead of t. So we get the same kind of 
modulations, naturally, but we see, of course, that those modulations are moving 
along with the wave. In other words, if we added two waves, but these waves were 
not just oscillating, but also moving in space, then the resultant wave would move 
along also, at the same speed. 

Now we would like to generalize this to the case of waves in which the relation- 
ship between the frequency and the wave number k is not so simple. Example: 
material having an index of refraction. We have already studied the theory of the 
index of refraction in Chapter 31, where we found that we could write k = /ico/c, 
where n is the index of refraction. As an interesting example, for x-rays we found 
that the index n is 


Nq 2 e 

2e 0 mo) 2 * 


(48.12) 


We actually derived a more complicated formula in Chapter 31, but this one is as 
good as any, as an example. 

Incidentally, we know that even when co and k are not linearly proportional, 
the ratio c o/k is certainly the speed of propagation for the particular frequency and 
wave number. We call this ratio the phase velocity ; it is the speed at which the phase, 
or the nodes of a single wave, would move along: 

Vp = I • (48.13) 


This phase velocity, for the case of x-rays in glass, is greater than the speed of 
light in vacuum (since n in 48.12 is less than 1), and that is a bit bothersome, 
because we do not think we can send signals faster than the speed of light! 

What we are going to discuss now is the interference of two waves in which 
co and k have a definite formula relating them. The above formula for n says that 
k is given as a definite function of co. To be specific, in this particular problem, the 
formula for k in terms of co is 


k = - - — , (48.14) 

c coc v 7 

where a = Nq 2 e /2e 0 m, a constant. At any rate, for each frequency there is a definite 
wave number, and we want to add two such waves together. 

Let us do it just as we did in Eq. (48.7): 


kiX) 


+ e 




e l/2i[(w 1 +« 2 )(-(A: 1 +A: 2 )x] 


| e l/2i[(«i— w 2 )(— (fci— k 2 )x] 


_|_ e -l/2i[(o >1 


' 2 )t—(ki—k 2 )x] 


} ■ 


(48.15) 


So we have a modulated wave again, a wave which travels with the mean frequency 
and the mean wave number, but whose strength is varying with a form which de- 
pends on the difference frequency and the difference wave number. 

Now let us take the case that the difference between the two waves is relatively 
small. Let us suppose that we are adding two waves whose frequencies are nearly 
equal; then (co x + co 2 )/2 is practically the same as either one of the co’s, and 
similarly for ( k x + k 2 )/ 2. Thus the speed of the wave, the fast oscillations, the 
nodes, is still essentially o>/k. But look, the speed of propagation of the modulation 
is not the same! How much do we have to change jc to account for a certain amount 
of tl The speed of this modulation wave is the ratio 


C0 X — C0 2 




k i — k 2 


(48.16) 


The speed of modulation is sometimes called the group velocity . If we take the 
case that the difference in frequency is relatively small, and the difference in wave 
number is then also relatively small, then this expression approaches, in the limit, 
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c/co 

v ° = dk 


(48.17) 



In other words, for the slowest modulation, the slowest beats, there is a definite 
speed at which they travel which is not the same as the phase speed of the waves — 
what a mysterious thing! 

The group velocity is the derivative of o> with respect to k , and the phase velocity 
is co/k. 


Let us see if we can understand why. Consider two waves, again of slightly 
different wavelength, as in Fig. 48-1. They are out of phase, in phase, out of 
phase, and so on. Now these waves represent, really, the waves in space travelling 
with slightly different frequencies also. Now because the phase velocity, the velocity 
of the nodes of these two waves, is not precisely the same, something new happens. 
Suppose we ride along with one of the waves and look at the other one; if they both 
went at the same speed, then the other wave would stay right where it was relative 
to us, as we ride along on this crest. We ride on that crest and right opposite us 
we see a crest; if the two velocities are equal the crests stay on top of each other. 
But it is not so that the two velocities are really equal. There is only a small differ- 
ence in frequency and therefore only a small difference in velocity, but because of 
that difference in velocity, as we ride along the other wave moves slowly forward, 
say, or behind, relative to our wave. So as time goes on, what happens to the node? 
If we move one wave train just a shade forward, the node moves forward (or back- 
ward) a considerable distance. That is, the sum of these two waves has an envelope, 
and as the waves travel along, the envelope rides on them at a different speed. 
The group velocity is the speed at which modulated signals would be transmitted. 

If we made a signal, i.e., some kind of change in the wave that one could recog- 
nize when he listened to it, a kind of modulation, then that modulation would 
travel at the group velocity, provided that the modulations were relatively slow. 
(When they are fast, it is much more difficult to analyze.) 

Now we may show (at long last), that the speed of propagation of x-rays in a 
block of carbon is not greater than the speed of light, although the phase velocity 
is greater than the speed of light. In order to do that, we must find do/dk, which 
we get by differentiating (48.14): dk/dco = l/c + a/co 2 c. The group velocity, 
therefore, is the reciprocal of this, namely, 


__ c 

Vg ~ 1 + a/co 2 ’ 


(48.18) 


which is smaller than c\ So although the phases can travel faster than the speed of 
light, the modulation signals travel slower, and that is the resolution of the apparent 
paradox! Of course, if we have the simple case that co = kc , then dco/dk is also 
c . So when all the phases have the same velocity, naturally the group has the same 
velocity. 


48-5 Probability amplitudes for particles 

Let us now consider one more example of the phase velocity which is extremely 
interesting. It has to do with quantum mechanics. We know that the amplitude 
to find a particle at a place can, in some circumstances, vary in space and time, let 
us say in one dimension, in this manner: 

* = Ae iiut ~ kx \ (48.19) 

where co is the frequency, which is related to the classical idea of the energy through 
E = hco, and k is the wave number, which is related to the momentum through 
p = hk. We would say the particle had a definite momentum p if the wave number 
were exactly k, that is, a perfect wave which goes on with the same amplitude 
everywhere. Equation (48.19) gives the amplitude, and if we take the absolute 
square, we get the relative probability for finding the particle as a function of posi- 
tion and time. This is a constant, which means that the probability is the same to 
find a particle anywhere. Now suppose, instead, that we have a situation where we 
know that the particle is more likely to be at one place than at another. We would 
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Fig. 48-6. A localized wave train. 


represent such a situation by a wave which has a maximum and dies out on either 
side (Fig. 48-6). (It is not quite the same as a wave like (48.1) which has a series of 
maxima, but it is possible, by adding several waves of nearly the same co and k 
together, to get rid of all but one maximum.) 

Now in those circumstances, since the square of (48.19) represents the chance 
of finding a particle somewhere, we know that at a given instant the particle is 
most likely to be near the center of the “lump,” where the amplitude of the wave 
is maximum. If now we wait a few moments, the waves will move, and after some 
time the “lump” will be somewhere else. If we knew that the particle originally was 
situated somewhere, classically, we would expect that it would later be elsewhere 
as a matter of fact, because it has a speed, after all, and a momentum. The quantum 
theory, then, will go into the correct classical theory for the relationship of mo- 
mentum, energy, and velocity only if the group velocity, the velocity of the modula- 
tion, is equal to the velocity that we would obtain classically for a particle of the 
same momentum. 

It is now necessary to demonstrate that this is, or is not, the case. According 
to the classical theory, the energy is related to the velocity through an equation like 


Similarly, the momentum is 


E = 


me 


Vl — v 2 /c 2 


mv 


Vl “ V 2 /c 2 


(48.20) 

(48.21) 


That is the classical theory, and as a consequence of the classical theory, by eliminat- 
ing v, we can show that 

E 2 — p 2 c 2 = m 2 c 4 . 


That is the four-dimensional grand result that we have talked and talked about, 
that Pup? = m 2 ; that is the relation between energy and momentum in the classical 
theory. Now that means, since these E's and p's are going to become co’s and k's, 
by substitution of E — hca and p — hk, that for quantum mechanics it is necessary 
that 

- tfk 2 = m 2 c 2 . (48.22) 


This, then, is the relationship between the frequency and the wave number of a 
quantum-mechanical amplitude wave representing a particle of mass m . From 
this equation we can deduce that co is 


co = cy/k 2 + m 2 c 2 /h 2 . 

The phase velocity, c c/k, is here again faster than the speed of light! 

Now let us look at the group velocity. The group velocity should be do)/dk, 
the speed at which the modulations move. We have to differentiate a square root, 
which is not very difficult. The derivative is 

do) __ kc 

^ V& 2 + m 2 c 2 /h 2 

Now the square root is, after all, co, so we could write this as du/dk = c 2 k/a ;. 
Further, k/o) is p/E, so 



But from (48.20) and (48.21), c 2 p/E = v, the velocity of the particle, according 
to classical mechanics. So we see that whereas the fundamental quantum-mechani- 
cal relationship E — hco and p = hk , for the identification of co and k with the 
classical E and p , only produces the equation co 2 — A; 2 c 2 = m 2 c*/h 2 , now we 
also understand the relationships (48.20) and (48.21) which connected E and p 
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to the velocity. Of course the group velocity must be the velocity of the particle 
if the interpretation is going to make any sense. If we think the particle is over 
here at one time, and then ten minutes later we think it is over there, as the quantum 
mechanics said, the distance traversed by the “lump,” divided by the time interval, 
must be, classically, the velocity of the particle. 


48-6 Waves in three dimensions 

We shall now bring our discussion of waves to a close with a few general 
remarks about the wave equation. These remarks are intended to give some view 
of the future— not that we can understand everything exactly just now, but rather to 
see what things are going to look like when we study waves a little more. First of 
all, the wave equation for sound in one dimension was 

d 2 x _ 1 d 2 x 

dx 2 C 2 dt 2 

where c is the speed of whatever the wave is — in the case of sound, it is the sound 
speed ; in the case of light, it is the speed of light. We showed that for a sound wave 
the displacements would propagate themselves at a certain speed. But the excess 
pressure also propagates at a certain speed, and so does the excess density. So we 
should expect that the pressure would satisfy the same equation, as indeed it does. 
We shall leave it to the reader to prove that it does. Hint: p e is proportional to the 
rate of change of x with respect to x. Therefore if we differentiate the wave equation 
with respect to x, we will immediately discover that dx/dx satisfies the same equa- 
tion. That is to say, p e satisfies the same equation. But P e is proportional to p e , 
and therefore P e does too. So the pressure, the displacements, everything, satisfy 
the same wave equation. 

Usually one sees the wave equation for sound written in terms of pressure 
instead of in terms of displacement, because the pressure is a scalar and has no 
direction. But the displacement is a vector and has direction, and it is thus easier 
to analyze the pressure. 

The next matter we discuss has to do with the wave equation in three dimen- 
sions. We know that the sound wave solution in one dimension is e ^ at ~ kx \ with 
co = kc sy but we also know that in three dimensions a wave would be repre- 
sented by where, in this case, co 2 = k 2 c 2 , which is, of course, 

(k 2 x + kl + k 2 )c 2 . Now what we want to do is to guess what the correct wave equa- 
tion in three dimensions is. Naturally, for the case of sound this can be deduced by 
going through the same dynamic argument in three dimensions that we made in 
one dimension. But we shall not do that; instead we just write down what comes 
out: the equation for the pressure (or displacement, or anything) is 

d 2 P e , d 2 P e , d 2 P e __ 1 d 2 P e 7 ^ 

dx 2 + dy 2 + dz 2 c 2 dt 2 * 1 * } 


That this is true can be verified by substituting in e i(w(_k * r) . Clearly, every time we 
differentiate with respect to x, we multiply by —ik x . If we differentiate twice, it 
is equivalent to multiplying by -k 2 x , so the first term would become - k 2 x P e , for 
that wave. Similarly, the second term becomes — k 2 P e , and the third term becomes 
— k 2 z P e . On the right, we get — (c o 2 /c 2 )P e . Then, if we take away the P e * s and 
change the sign, we see that the relationship between k and co is the one that we want. 

Working backwards again, we cannot resist writing down the grand equation 
which corresponds to the dispersion equation (48.22) for quantum-mechanical 
waves. If represents the amplitude for finding a particle at position x, y, z, at 
the time t, then the great equation of quantum mechanics for free particles is this: 


= mV 

dx 2 ~dy 2 dz 2 c 2 dt 2 h 2 


(48.24) 


First of all, the relativity character of this expression is suggested by the appearance 
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of *, y, z and t in the nice combination relativity usually involves. Second, it is a 
wave equation which, if we try a plane wave, would produce as a consequence that 
— k 2 + o) 2 /c 2 = m 2 c 2 jh 2 y which is the right relationship for quantum mechanics. 
There is still another great thing contained in the wave equation: the fact that any 
superposition of waves is also a solution. So this equation contains all of the 
quantum mechanics and the relativity that we have been discussing so far, at least 
so long as it deals with a single particle in empty space with no external potentials 
or forces on it! 


48-7 Normal modes 

Now we turn to another example of the phenomenon of beats which is rather 
curious and a little different. Imagine two equal pendulums which have, between 
them, a rather weak spring connection. They are made as nearly as possible the 
same length. If we pull one aside and let go, it moves back and forth, and it pulls 
on the connecting spring as it moves back and forth, and so it really is a machine 
for generating a force which has the natural frequency of the other pendulum. 
Therefore, as a consequence of the theory of resonance, which we studied before, 
when we put a force on something at just the right frequency, it will drive it. So, 
sure enough, one pendulum moving back and forth drives the other. However, in 
this circumstance there is a new thing happening, because the total energy of the 
system is finite, so when one pendulum pours its energy into the other to drive it, 
it finds itself gradually losing energy, until, if the timing is just right along with the 
speed, it loses all its energy and is reduced to a stationary condition! Then, of 
course, it is the other pendulum ball that has all the energy and the first one which 
has none, and as time goes on we see that it works also in the opposite direction, 
and that the energy is passed back into the first ball; this is a very interesting and 
amusing phenomenon. We said, however, that this is related to the theory of 
beats, and we must now explain how we can analyze this motion from the point of 
view of the theory of beats. 

We note that the motion of either of the two balls is an oscillation which has 
an amplitude which changes cyclically. Therefore the motion of one of the balls 
is presumably analyzable in a different way, in that it is the sum of two oscillations, 
present at the same time but having two slightly different frequencies. Therefore it 
ought to be possible to find two other motions in this system, and to claim that 
what we saw was a superposition of the two solutions, because this is of course a 
linear system. Indeed, it is easy to find two ways that we could start the motion, 
each one of which is a perfect, single-frequency motion — absolutely periodic. 
The motion that we started with before was not strictly periodic, since it did not 
last; soon one ball was passing energy to the other and so changing its amplitude; 
but there are ways of starting the motion so that nothing changes and, of course, 
as soon as we see it we understand why. For example, if we made both pendulums 
go together, then, since they are of the same length and the spring is not then doing 
anything, they will of course continue to swing like that for all time, assuming no 
friction and that everything is perfect. On the other hand, there is another possible 
motion which also has a definite frequency: that is, if we move the pendulums 
oppositely, pulling them aside exactly equal distances, then again they would be 
in absolutely periodic motion. We can appreciate that the spring just adds a little 
to the restoring force that the gravity supplies, that is all, and the system just keeps 
oscillating at a slightly higher frequency than in the first case. Why higher? 
Because the spring is pulling, in addition to the gravitation, and it makes the system 
a little “stiffer,” so that the frequency of this motion is just a shade higher than that 
of the other. 

Thus this system has two ways in which it can oscillate with unchanging 
amplitude: it can either oscillate in a manner in which both pendulums go the 
same way and oscillate all the time at one frequency, or they could go in opposite 
directions at a slightly higher frequency. 

Now the actual motion of the thing, because the system is linear, can be 
represented as a superposition of the two. (The subject of this chapter, remember, 
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is the effects of adding two motions with different frequencies.) So think what 
would happen if we combined these two solutions. If at t = 0 the two motions 
are started with equal amplitude and in the same phase, the sum of the two motions 
means that one ball, having been impressed one way by the first motion and the 
other way by the second motion, is at zero, while the other ball, having been dis- 
placed the same way in both motions, has a large amplitude. As time goes on, 
however, the two basic motions proceed independently, so the phase of one relative 
to the other is slowly shifting. That means, then, that after a sufficiently long time, 
when the time is enough that one motion could have gone “900^” oscillations, 
while the other went only “900,” the relative phase would be just reversed with 
respect to what it was before. That is, the large-amplitude motion will have fallen 
to zero, and in the meantime, of course, the initially motionless ball will have 
attained full strength! 

So we see that we could analyze this complicated motion either by the idea 
that there is a resonance and that one passes energy to the other, or else by the 
superposition of two constant-amplitude motions at two different frequencies. 
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49 


Modes 


49-1 The reflection of waves 

This chapter will consider some of the remarkable phenomena which are a 
result of confining waves in some finite region. We will be led first to discover a 
few particular facts about vibrating strings, for example, and then the generaliza- 
tion of these facts will give us a principle which is probably the most far-reaching 
principle of mathematical physics. 

Our first example of confining waves will be to confine a wave at one boundary. 
Let us take the simple example of a one-dimensional wave on a string. One could 
equally well consider sound in one dimension against a wall, or other situations 
of a similar nature, but the example of a string will be sufficient for our present 
purposes. Suppose that the string is held at one end, for example by fastening it 
to an “infinitely solid” wall. This can be expressed mathematically by saying that 
the displacement y of the string at the position x — 0 must be zero, because the 
end does not move. Now if it were not for the wall, we know that the general 
solution for the motion is the sum of two functions, F(x — ct) and G(x + ct ), 
the first representing a wave travelling one way in the string, and the second a 
wave travelling the other way in the string: 

y = F(x - ct ) + G(x + ct) (49.1) 

is the general solution for any string. But we have next to satisfy the condition 
that the string does not move at one end. If we put x = 0 in Eq. (49.1) and ex- 
amine y for any value of t , we get y = F(—ct) + G{+ct). Now if this is to be 

zero for all times, it means that the function G{ct) must be — F(— ct). In other 

words, G of anything must be — F of minus that same thing. If this result is put 
back into Eq. (49.1), we find that the solution for the problem is 

y — F(x — ct) — F(— x — ct). (49.2) 

It is easy to check that we will get y = 0 if we set x — 0. 

Figure 49-1 shows a wave travelling in the negative ^-direction near x = 0, 
and a hypothetical wave travelling in the other direction reversed in sign and on 
the other side of the origin. We say hypothetical because, of course, there is no 
string to vibrate on that side of the origin. The total motion of the string is to be 
regarded as the sum of these two waves in the region of positive x. As they reach 
the origin, they will always cancel at x = 0, and finally the second (reflected) 
wave will be the only one to exist for positive x and it will, of course, be travelling 
in the opposite direction. These results are equivalent to the following statement: 
if a wave reaches the clamped end of a string, it will be reflected with a change in 
sign. Such a reflection can always be understood by imagining that what is coming 
to the end of the string comes out upside down from behind the wall. In short, if 
we assume that the string is infinite and that whenever we have a wave going one 
way we have another one going the other way with the stated symmetry, the dis- 
placement at x = 0 will always be zero and it would make no difference if we 
clamped the string there. 

The next point to be discussed is the reflection of a periodic wave. Suppose 
that the wave represented by F(x — ct) is a sine wave and has been reflected ; then 
the reflected wave — F(— x — ct) is also a sine wave of the same frequency, but 
travelling in the opposite direction. This situation can be most simply 
described by using the complex function notation: F(x — ct) — e l0>it ~~ x!c) and 
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Fig. 49-1. Reflection of a wave as a 
superposition of two travelling waves. 
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F{—x — ct) = e lc ° a+x/c) . It can be seen that if these are substituted in (49.2) 
and if x is set equal to 0, then y = 0 for all values of t, so it satisfies the necessary 
condition. Because of the properties of exponentials, this can be written in a 
simpler form: 

y = e itat (e^ x/c - e i(ax/c ) = -2ie i<at sin (cox/c). (49.3) 

There is something interesting and new here, in that this solution tells us that 
if we look at any fixed x, the string oscillates at frequency co. No matter where this 
point is, the frequency is the same! But there are some places, in particular wher- 
ever sin (a )x/c) — 0, where there is no displacement at all. Furthermore, if at 
any time t we take a snapshot of the vibrating string, the picture will be a sine wave. 
However, the displacement of this sine wave will depend upon the time t. From 
inspection of Eq. (49.3) we can see that the length of one cycle of the sine wave is 
equal to the wavelength of either of the superimposed waves : 

X = 2ttc/co. (49.4) 

The points where there is no motion satisfy the condition sin (co x/c) = 0, which 
means that (wx/ c ) = 0, 7r, 27t, . . . , nir . . . These points are called nodes . Between 
any two successive nodes, every point moves up and down sinusoidally, but the 
pattern of motion stays fixed in space. This is the fundamental characteristic of 
what we call a mode . If one can find a pattern of motion which has the property 
that at any point the object moves perfectly sinusoidally, and that all points move 
at the same frequency (though some will move more than others), then we have 
what is called a mode. 


49-2 Confined waves, with natural frequencies 

The next interesting problem is to consider what happens if the string is held 
at both ends, say at x = 0 and x = L. We can begin with the idea of the reflection 
of waves, starting with some kind of a bump moving in one direction. As time goes 
on, we would expect the bump to get near one end, and as time goes still further it 
will become a kind of little wobble, because it is combining with the reversed- 
image bump which is coming from the other side. Finally the original bump will 
disappear and the image bump will move in the other direction to repeat the process 
at the other end. This problem has an easy solution, but an interesting question 
is whether we can have a sinusoidal motion (the solution just described is periodic , 
but of course it is not sinusoidally periodic). Let us try to put a sinusoidally periodic 
wave on a string. If the string is tied at one end, we know it must look like our 
earlier solution (49.3). If it is tied at the other end, it has to look the same at the 
other end. So the only possibility for periodic sinusoidal motion is that the sine 
wave must neatly fit into the string length. If it does not fit into the string length, 
then it is not a natural frequency at which the string can continue to oscillate. In 
short, if the string is started with a sine wave shape that just fits in, then it will 
continue to keep that perfect shape of a sine wave and will oscillate harmonically 
at some frequency. 

Mathematically, we can write sin kx for the shape, where k is equal to the 
factor (co/c) in Eqs. (49.3) and (49.4), and this function will be zero at x = 0. 
However, it must also be zero at the other end. The significance of this is that k 
is no longer arbitrary, as was the case for the half-open string. With the string 
closed at both ends, the only possibility is that sin (kL) = 0, because this is the 
only condition that will keep both ends fixed. Now in order for a sine to be zero, 
the angle must be either 0, 7r, 2t, or some other integral multiple of 7 r. The equation 

kL = mr (49.5) 

will, therefore, give any one of the possible k s, depending on what integer is put in. 
For each of the ks there is a certain frequency co, which, according to (49.3), is 
simply 
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co — kc — mrc/L. 


(49.6) 



So we have found the following: that a string has a property that it can have 
sinusoidal motions, but only at certain frequencies . This is the most important 
characteristic of confined waves. No matter how complicated the system is, it 
always turns out that there are some patterns of motion which have a perfect 
sinusoidal time dependence, but with frequencies that are a property of the particu- 
lar system and the nature of its boundaries. In the case of the string we have many 
different possible frequencies, each one, by definition, corresponding to a mode, 
because a mode is a pattern of motion which repeats itself sinusoidally. Figure 
49-2 shows the first three modes for a string. For the first mode the wavelength 
X is 2L. This can be seen if one continues the wave out to x = 2L to obtain one 
complete cycle of the sine wave. The angular frequency co is 2wc divided by the 
wavelength, in general, and in this case, since X is 2 L, the frequency is ttc/L , which 
is in agreement with (49.6) with n = 1. Let us call the first mode frequency co x . 
Now the next mode shows two loops with one node in the middle. For this mode 
the wavelength, then, is simply L . The corresponding value of k is twice as great 
and the frequency is twice as large; it is 2coi. For the third mode it is 3«i, and so on. 
So all the different frequencies of the string are multiples, 1, 2, 3, 4, and so on, of 
the lowest frequency w x . 

Returning now to the general motion of the string, it turns out that any possible 
motion can always be analyzed by asserting that more than one mode is operating 
at the same time. In fact, for general motion an infinite number of modes must 
be excited at the same time. To get some idea of this, let us illustrate what happens 
when there are two modes oscillating at the same time: Suppose that we have the 
first mode oscillating as shown by the sequence of pictures in Fig. 49-3, which 
illustrates the deflection of the string for equally spaced time intervals extending 
through half a cycle of the lowest frequency. 

Now, at the same time, we suppose that there is an oscillation of the second 
mode also. Figure 49-3 also shows a sequence of pictures of this mode, which at 
the start is 90° out of phase with the first mode. This means that at the start it has 
no displacement, but the two halves of the string have oppositely directed velocities. 
Now we recall a general principle relating to linear systems: if there are any two 
solutions, then their sum is also a solution. Therefore a third possible motion of 
the string would be a displacement obtained by adding the two solutions shown in 
Fig. 49-3. The result, also shown in the figure, begins to suggest the idea of a bump 
running back and forth between the ends of the string, although with only two 
modes we cannot make a very good picture of it; more modes are needed. This 
result is, in fact, a special case of a great principle for linear systems: 




X 
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Fig. 49-2. The first three modes of a 
vibrating string. 



Any motion at all can be analyzed by assuming that it is the sum of the motions pi g 49.3 Two modes combine to 

of all the different modes , combined with appropriate amplitudes and phases. g j ve a travelling wave. 

The importance of the principle derives from the fact that each mode is very 
simple — it is nothing but a sinusoidal motion in time. It is true that even the general 
motion of a string is not really very complicated, but there are other systems, for 
example the whipping of an airplane wing, in which the motion is much more 
complicated. Nevertheless, even with an airplane wing, we find there is a certain 
particular way of twisting which has one frequency and other ways of twisting 
that have other frequencies. If these modes can be found, then the complete 
motion can always be analyzed as a superposition of harmonic oscillations (except 
when the whipping is of such degree that the system can no longer be considered 
as linear). 


49-3 Modes in two dimensions 

The next example to be considered is the interesting situation of modes in two 
dimensions. Up to this point we have talked only about one-dimensional situations 
— a stretched string or sound waves in a tube. Ultimately we should consider three 
dimensions, but an easier step will be that to two dimensions. Consider for definite- 
ness a rectangular rubber drumhead which is confined so as to have no displace- 
ment anywhere on the rectangular edge, and let the dimensions of the rectangle 
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be a and b , as shown in Fig. 49—4. Now the question is, what are the characteristics 
of the possible motion? We can start with the same procedure used for the string. 
If we had no confinement at all, we would expect waves travelling along with some 
kind of wave motion. For example, would represent a sine wave 

travelling in some direction which depends on the relative values of k x and k v . 
Now how can we make the x-axis, that is, the line y = 0, a node? Using the ideas 
developed for the one-dimensional string, we can imagine another wave repre- 
sented by the complex function The superposition of these 

waves will give zero displacement at y = 0 regardless of the values of x and t. 
(Although these functions are defined for negative y where there is no drumhead 
to vibrate, this can be ignored, since the displacement is truly zero at y = 0.) 
In this case we can look upon the second function as the reflected wave. 

However, we want a nodal line at y = b as well as at y = 0. How do we do 
that? The solution is related to something we did when studying reflection from 
crystals. These waves which cancel each other at y = 0 will do the same at y = b 
only if 2b sin 0 is an integral multiple of X, where 0 is the angle shown in Fig. 49-4: 

nik = 2b sin 0, m = 0, 1, 2, . . . (49.7) 

Now in the same way we can make the y-axis a nodal line by adding two more 
functions ~{e^ l )(e +ik ^ +ik y y ) and + (e iat )(e +lk ^ ik y«), each representing a 
reflection of one of the other two waves from the x = 0 line. The condition for a 
nodal line at x = a is similar to the one for y = b. It is that 2 a cos 6 must also be 
an integral multiple of X: 

n\ = 2a cos 6. (49.8) 

Then the final result is that the waves bouncing about in the box produce a standing- 
wave pattern, that is, a definite mode. 

So we must satisfy the above two conditions if we are to have a mode. Let us 
first find the wavelength. This can be obtained by eliminating the angle 0 from 
(49.7) and (49.8) to obtain the wavelength in terms of a, b, n and m. The easiest 
way to do that is to divide both sides of the respective equations by 2b and 2a, 
square them, and add the two equations together. The result is sin 2 0 + cos 2 0 = 

1 = («X/2a) 2 + (m\/2b) 2 , which can be solved for X: 



Fig. 49-4. Vibrating rectangular plate. 


J_ _ m 2 

X2 ~ 4a2 + 4p ' 


(49.9) 


In this way we have determined the wavelength in terms of two integers, and from 
the wavelength we immediately get the frequency w, because, as we know, the 
frequency is equal to 2irc divided by the wavelength. 

This result is interesting and important enough that we should deduce it by a 
purely mathematical analysis instead of by an argument about the reflections. 
Let us represent the vibration by a superposition of four waves chosen so that the 
four lines x = 0, x = a, y = 0, and y = b are all nodes. In addition we shall 
require that all waves have the same frequency, so that the resulting motion will 
represent a mode. From our earlier treatment of light reflection we know that 
(e l0,t )(e~ xk z x+tk y v ) represents a wave travelling in the direction indicated in Fig. 
49-4. Equation (49.6), that is, k = oj/c, still holds, provided 

k 2 = kl + k 2 y . (49.10) 

It is clear from the figure that k x = k cos 0 and k y = k sin 0. 

Now our equation for the displacement, say c/>, of the rectangular drumhead 
takes on the grand form 

0 = [ e iwt ][ e ( — lk x x +i k yy) _ e i + ikx x + ik yV) __ g( ~ifc z x—ik y y) g( + ik x x—ik y y)^ 

(49.11a) 

Although this looks rather a mess, the sum of these things now is not very hard. 
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Table 49-1 



The exponentials can be combined to give sine functions, so that the displacement 
turns out to be 


4> — [— 4 sin k x x sin k y y][e tu>t ]. 


(49.11b) 


In other words, it is a sinusoidal oscillation, all right, with a pattern that is also 
sinusoidal in both the x- and the ^-direction. Our boundary conditions are of 
course satisfied at x = 0 and y = 0. We also want <j> to be zero when x = a and 
when y = b. Therefore we have to put in two other conditions: k x a must be an 
integral multiple of 7 r, and k x b must be another integral multiple of 7 r. Since we 
have seen that k x = k cos d and k y = k sin 6 , we immediately get equations 
(49.7) and (49.8) and from these the final result (49.9). 

Now let us take as an example a rectangle whose width is twice the height. 
If we take a = 2b and use Eqs. (49.4) and (49.9), we can calculate the frequencies 
of all of the modes: 


7tA 2 4m 2 + n 2 
TJ 4 


(49.12) 


Table 49-1 lists a few of the simple modes and also shows their shape in a qualitative 
way. 

The most important point to be emphasized about this particular case is that 
the frequencies are not multiples of each other, nor are they multiples of any num- 
ber. The idea that the natural frequencies are harmonically related is not generally 
true. It is not true for a system of more than one dimension, nor is it true for one- 
dimensional systems which are more complicated than a string with uniform 
density and tension. A simple example of the latter is a hanging chain in which 
the tension is higher at the top than at the bottom. If such a chain is set in harmonic 
oscillation, there are various modes and frequencies, but the frequencies are not 
simple multiples of any number, nor are the mode shapes sinusoidal. 

The modes of more complicated systems are still more elaborate. For example, 
inside the mouth we have a cavity above the vocal cords, and by moving the 
tongue and the lips, and so forth, we make an open-ended pipe or a closed-ended 
pipe of different diameters and shapes; it is a terribly complicated resonator, but 
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it is a resonator nevertheless. Now when one talks with the vocal cords, they are 
made to produce some kind of tone. The tone is rather complicated and there are 
many sounds coming out, but the cavity of the mouth further modifies that tone 
because of the various resonant frequencies of the cavity. For instance, a singer 
can sing various vowels, a, or o, or oo, and so forth, at the same pitch, but they 
sound different because the various harmonics are in resonance in this cavity to 
different degrees. The very great importance of the resonant frequencies of a cavity 
in modifying the voice sounds can be demonstrated by a simple experiment. Since 
the speed of sound goes as the reciprocal of the square root of the density, the 
speed of sound may be varied by using different gases. If one uses helium instead 
of air, so that the density is lower, the speed of sound is much higher, and all the 
frequencies of a cavity will be raised. Consequently if one fills one’s lungs with 
helium before speaking, the character of his voice will be drastically altered even 
though the vocal cords may still be vibrating at the same frequency. 
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Fig. 49-5. Two coupled pendulums, 


49-4 Coupled pendulums 

Finally we should emphasize that not only do modes exist for complicated 
continuous systems, but also for very simple mechanical systems. A good example 
is the system of two coupled pendulums discussed in the preceding chapter. In 
that chapter it was shown that the motion could be analyzed as a superposition of 
two harmonic motions with different frequencies. So even this system can be an- 
alyzed in terms of harmonic motions or modes. The string has an infinite number of 
modes and the two-dimensional surface also has an infinite number of modes. In 
a sense it is a double infinity, if we know how to count infinities. But a simple 
mechanical thing which has only two degrees of freedom, and requires only two 
variables to describe it, has only two modes. 

Let us make a mathematical analysis of these two modes for the case where the 
pendulums are of equal length. Let the displacement of one be x, and the displace- 
ment of the other be y, as shown in Fig. 49-5. Without a spring, the force on the 
first mass is proportional to the displacement of that mass, because of gravity. 
There would be, if there were no spring, a certain natural frequency w 0 for this one 
alone. The equation of motion without a spring would be 


m 


d 2 x 
dt 2 


= — mwox. 


(49.13) 


The other pendulum would swing in the same way if there were no spring. In 
addition to the force of restoration due to gravitation, there is an additional force 
pulling the first mass. That force depends upon the excess distance of x over y and 
is proportional to that difference, so it is some constant which depends on the 
geometry, times (x — y). The same force in reverse sense acts on the second mass. 
The equations of motion that have to be solved are therefore 

m W = ~ mw ° x ~ k ( x ~ m 7i^ = ~ mo >oy ~ k(y - x). (49.14) 


In order to find a motion in which both of the masses move at the same fre- 
quency, we must determine how much each mass moves. In other words, 
pendulum x and pendulum y will oscillate at the same frequency, but their ampli- 
tudes must have certain values, A and B, whose relation is fixed. Let us try this 
solution: 

x = Ae ial , y = Be iut . (49.15) 

If these are substituted in Eqs. (49.14) and similar terms are collected, the results are 
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(49.16) 
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The equations as written have had the common factor e Uat removed and have been 
divided by m. 

Now we see that we have two equations for what looks like two unknowns. 
But there really are not two unknowns, because the whole size of the motion is 
something that we cannot determine from these equations. The above equations 
can determine only the ratio of A to B, but they must both give the same ratio . The 
necessity for both of these equations to be consistent is a requirement that the 
frequency be something very special. 

In this particular case this can be worked out rather easily. If the two equa- 
tions are multiplied together, the result is 
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(49.17) 


The term AB can be removed from both sides unless A and B are zero, which means 
there is no motion at all. If there is motion, then the other terms must be equal, 
giving a quadratic equation to solve. The result is that there are two possible 
frequencies : 
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Furthermore, if these values of frequency are substituted back into Eq. (49.16), 
we find that for the first frequency A = B, and for the second frequency A = — B. 
These are the “mode shapes/’ as can be readily verified by experiment. 

It is clear that in the first mode, where A — B, the spring is never stretched, 
and both masses oscillate at the frequency co 0j as though the spring were absent. 
In the other solution, where A — —B, the spring contributes a restoring force and 
raises the frequency. A more interesting case results if the pendulums have different 
lengths. The analysis is very similar to that given above, and is left as an exercise 
for the reader. 


49-5 Linear systems 

Now let us summarize the ideas discussed above, which are all aspects of what 
is probably the most general and wonderful principle of mathematical physics. If 
we have a linear system whose character is independent of the time, then the motion 
does not have to have any particular simplicity, and in fact may be exceedingly 
complex, but there are very special motions, usually a series of special motions, 
in which the whole pattern of motion varies exponentially with the time. For the 
vibrating systems that we are talking about now, the exponential is imaginary, and 
instead of saying “exponentially” we might prefer to say “sinusoidally” with time. 
However, one can be more general and say that the motions will vary exponentially 
with the time in very special modes, with very special shapes. The most general 
motion of the system can always be represented as a superposition of motions 
involving each of the different exponentials. 

This is worth stating again for the case of sinusoidal motion: a linear system 
need not be moving in a purely sinusoidal motion, i.e., at a definite single frequency, 
but no matter how it does move, this motion can be represented as a superposition 
of pure sinusoidal motions. The frequency of each of these motions is a character- 
istic of the system, and the pattern or waveform of each motion is also a character- 
istic of the system. The general motion in any such system can be characterized 
by giving the strength and the phase of each of these modes, and adding them all 
together. Another way of saying this is that any linear vibrating system is equivalent 
to a set of independent harmonic oscillators, with the natural frequencies corre- 
sponding to the modes. 

We conclude this chapter by remarking on the connection of modes with 
quantum mechanics. In quantum mechanics the vibrating object, or the thing that 
varies in space, is the amplitude of a probability function that gives the probability 
of finding an electron, or system of electrons, in a given configuration. This ampli- 
tude function can vary in space and time, and satisfies, in fact, a linear equation. 
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But in quantum mechanics there is a transformation, in that what we call frequency 
of the probability amplitude is equal, in the classical idea, to energy. Therefore 
we can translate the principle stated above to this case by taking the word frequency 
and replacing it with energy. It becomes something like this: a quantum-mechanical 
system, for example an atom, need not have a definite energy, just as a simple 
mechanical system does not have to have a definite frequency; but no matter how 
the system behaves, its behavior can always be represented as a superposition of 
states of definite energy. The energy of each state is a characteristic of the atom, 
and so is the pattern of amplitude which determines the probability of finding 
particles in different places. The general motion can be described by giving the 
amplitude of each of these different energy states. This is the origin of energy 
levels in quantum mechanics. Since quantum mechanics is represented by waves, 
in the circumstance in which the electron does not have enough energy to ulti- 
mately escape from the proton, they are confined waves. Like the confined waves 
of a string, there are definite frequencies for the solution of the wave equation for 
quantum mechanics. The quantum-mechanical interpretation is that these are 
definite energies. Therefore a quantum-mechanical system, because it is represented 
by waves, can have definite states of fixed energy; examples are the energy levels of 
various atoms. 
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Harmonies 


50-1 Musical tones 

Pythagoras is said to have discovered the fact that two similar strings under 
the same tension and differing only in length, when sounded together give an 
effect that is pleasant to the ear if the lengths of the strings are in the ratio of two 
small integers. If the lengths are as one is to two, they then correspond to the 
octave in music. If the lengths are as two is to three, they correspond to the in- 
terval between C and G, which is called a fifth. These intervals are generally 
accepted as “pleasant” sounding chords. 

Pythagoras was so impressed by this discovery that he made it the basis of a 
school — Pythagoreans they were called — which held mystic beliefs in the great 
powers of numbers. It was believed that something similar would be found out 
about the planets — or “spheres.” We sometimes hear the expression: “the music 
of the spheres.” The idea was that there would be some numerical relationships 
between the orbits of the planets or between other things in nature. People usually 
think that this is just a kind of superstition held by the Greeks. But is it so different 
from our own scientific interest in quantitative relationships? Pythagoras’ dis- 
covery was the first example, outside geometry, of any numerical relationship in 
nature. It must have been very surprising to suddenly discover that there was a 
fact of nature that involved a simple numerical relationship. Simple measurements 
of lengths gave a prediction about something which had no apparent connection to 
geometry — the production of pleasant sounds. This discovery led to the extension 
that perhaps a good tool for understanding nature would be arithmetic and mathe- 
matical analysis. The results of modern science justify that point of view. 

Pythagoras could only have made his discovery by making an experimental 
observation. Yet this important aspect does not seem to have impressed him. 
If it had, physics might have had a much earlier start. (It is always easy to look 
back at what someone else has done and to decide what he should have done!) 

We might remark on a third aspect of this very interesting discovery: that the 
discovery had to do with two notes that sound pleasant to the ear. We may question 
whether we are any better off than Pythagoras in understanding why only certain 
sounds are pleasant to our ear. The general theory of aesthetics is probably no 
further advanced now than in the time of Pythagoras. In this one discovery of the 
Greeks, there are the three aspects: experiment, mathematical relationships, and 
aesthetics. Physics has made great progress on only the first two parts. This 
chapter will deal with our present-day understanding of the discovery of Pythagoras. 

Among the sounds that we hear, there is one kind that we call noise. Noise 
corresponds to a sort of irregular vibration of the eardrum that is produced by the 
irregular vibration of some object in the neighborhood. If we make a diagram to 
indicate the pressure of the air on the eardrum (and, therefore, the displacement 
of the drum) as a function of time, the graph which corresponds to a noise might 
look like that shown in Fig. 50-1 (a). (Such a noise might correspond roughly to 
the sound of a stamped foot.) The sound of music has a different character. Music 
is characterized by the presence of more-or-less sustained tones — or musical 
“notes.” (Musical instruments may make noises as well!) The tone may last for a 
relatively short time, as when a key is pressed on a piano, or it may be sustained 
almost indefinitely, as when a flute player holds a long note. 

What is the special character of a musical note from the point of view of the 
pressure in the air? A musical note differs from a noise in that there is a periodicity 
in its graph. There is some uneven shape to the variation of the air pressure with 
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time, and the shape repeats itself over and over again. An example of a pressure- 
time function that would correspond to a musical note is shown in Fig. 50- 1(b). 

Musicians will usually speak of a musical tone in terms of three character- 
istics: the loudness, the pitch, and the “quality.” The “loudness” is found to 
correspond to the magnitude of the pressure changes. The “pitch” corresponds to 
the period of time for one repetition of the basic pressure function. (“Low” 
notes have longer periods than “high” notes.) The “quality” of a tone has to do 
with the differences we may still be able to hear between two notes of the same 
loudness and pitch. An oboe, a violin, or a soprano are still distinguishable even 
when they sound notes of the same pitch. The quality has to do with the structure 
of the repeating pattern. 

Let us consider, for a moment, the sound produced by a vibrating string. If 
we pluck the string, by pulling it to one side and releasing it, the subsequent motion 
will be determined by the motions of the waves we have produced. We know that 
these waves will travel in both directions, and will be reflected at the ends. They 
will slosh back and forth for a long time. No matter how complicated the wave is, 
however, it will repeat itself. The period of repetition is just the time T required 
for the wave to travel two full lengths of the string. For that is just the time re- 
quired for any wave, once started, to reflect off each end and return to its starting 
position, and be proceeding in the original direction. The time is the same for 
waves which start out in either direction. Each point on the string will, then, return 
to its starting position after one period, and again one period later, etc. The 
sound wave produced must also have the same repetition. We see why a plucked 
string produces a musical tone. 

50-2 The Fourier series 

We have discussed in the preceding chapter another way of looking at 
the motion of a vibrating system. We have seen that a string has various natural 
modes of oscillation, and that any particular kind of vibration that may be set 
up by the starting conditions can be thought of as a combination — in suitable 
proportions — of several of the natural modes, oscillating together. For a string 
we found that the normal modes of oscillation had the frequencies co 0 , 2co 0 , 
3a>o, . . . The most general motion of a plucked string, therefore, is composed of 
the sum of a sinusoidal oscillation at the fundamental frequency co 0) another at the 
second harmonic frequency 2co 0 , another at the third harmonic 3co 0 , etc. Now the 
fundamental mode repeats itself every period T x = 27r/co 0 . The second harmonic 
mode repeats itself every T 2 = 27t/2co 0 . It also repeats itself every T x = 2 T 2 , 
after two of its periods. Similarly, the third harmonic mode repeats itself after a 
time T i which is 3 of its periods. We see again why a plucked string repeats its 
whole pattern with a periodicity of TV It produces a musical tone. 

We have been talking about the motion of the string. But the sound, which is 
the motion of the air, is produced by the motion of the string, so its vibrations too 
must be composed of the same harmonics — though we are no longer thinking about 
the normal modes of the air. Also, the relative strength of the harmonics may be 
different in the air than in the string, particularly if the string is “coupled” to the 
air via a sounding board. The efficiency of the coupling to the air is different for 
different harmonics. 

If we let/(0 represent the air pressure as a function of time for a musical tone 
[such as that in Fig. 50— 1(b)], then we expect that /(/) can be written as the sum of 
a number of simple harmonic functions of time — like cos oot — for each of the various 
harmonic frequencies. If the period of the vibration is T, the fundamental angular 
frequency will be co = 2tt/T , and the harmonics will be 2co, 3co, etc. 

There is one slight complication. For each frequency we may expect that the 
starting phases will not necessarily be the same for all frequencies. We should, 
therefore, use functions like cos (cot + <t>). It is, however, simpler to use instead 
both the sine and cosine functions for each frequency. We recall that 

cos (cot + 0) = (cos <t> cos cot — sin <£ sin co/) (50.1) 
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and since <j> is a constant, any sinusoidal oscillation at the frequency co can be 
written as the sum of a term with cos co/ and another term with sin cot. 

We conclude, then, that any function /(/) that is periodic with the period T 
can be written mathematically as 


m = flo 

+ a i cos cot + bi sin co/ 

+ 02 cos 2 c o/ + & 2 sin 2 co/ 

+ 03 cos 3co/ + b$ sin 3co/ 

+ . . . + . . . (50.2) 

where co = 2w/T and the 0 ’s and b ’ s are numerical constants which tell us how 
much of each component oscillation is present in the oscillation /(/). We have 
added the “zero-frequency” term a 0 so that our formula will be completely general, 
although it is usually zero for a musical tone. It represents a shift of the average 
value (that is, the “zero” level) of the sound pressure. With it our formula can 
take care of any case. The equality of Eq. (50.2) is represented schematically in 
Fig. 50-2. (The amplitudes, a n and fe n , of the harmonic functions must be suitably 
chosen. They are shown schematically and without any particular scale in the 
figure.) The series (50.2) is called the Fourier series for /(/). 

We have said that any periodic function can be made up in this way. We 
should correct that and say that any sound wave, or any function we ordinarily 
encounter in physics, can be made up of such a sum. The mathematicians can 
invent functions which cannot be made up of simple harmonic functions — for 
instance, a function that has a “reverse twist” so that it has two values for some 
values of /! We need not worry about such functions here. 
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Fig. 50-2. Any periodic function f(t) 
is equal to a sum of simple harmonic 
functions. 


50-3 Quality and consonance 

Now we are able to describe what it is that determines the “quality” of a 
musical tone. It is the relative amounts of the various harmonics — the values of 
the 0 ’s and 5’s. A tone with only the first harmonic is a “pure” tone. A tone 
with many strong harmonics is a “rich” tone. A violin produces a different pro- 
portion of harmonics than does an oboe. 

We can “manufacture” various musical tones if we connect several “oscilla- 
tors” to a loudspeaker. (An oscillator usually produces a nearly pure simple 
harmonic function.) We should choose the frequencies of the oscillators to be co, 
2co, 3co, etc. Then by adjusting the volume control on each oscillator, we can add 
in any amount we wish of each harmonic — thereby producing tones of different 
quality. An electric organ works in much this way. The “keys” select the frequency 
of the fundamental oscillator and the “stops” are switches that control the relative 
proportions of the harmonics. By throwing these switches, the organ can be made 
to sound like a flute, or an oboe, or a violin. 

It is interesting that to produce such “artificial” tones we need only one oscilla- 
tor for each frequency — we do not need separate oscillators for the sine and cosine 
components. The ear is not very sensitive to the relative phases of the harmonics. 
It pays attention mainly to the total of the sine and cosine parts of each frequency. 
Our analysis is more accurate than is necessary to explain the subjective aspect of 
music. The response of a microphone or other physical instrument does depend 
on the phases, however, and our complete analysis may be needed to treat such 
cases. 

The “quality” of a spoken sound also determines the vowel sounds that we 
recognize in speech. The shape of the mouth determines the frequencies of the 
natural modes of vibration of the air in the mouth. Some of these modes are set 
into vibration by the sound waves from the vocal chords. In this way, the ampli- 
tudes of some of the harmonics of the sound are increased with respect to others. 
When we change the shape of our mouth, harmonics of different frequencies are 
given preference. These effects account for the difference between an “e-e-e” 
sound and an “a-a-a” sound. 
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We all know that a particular vowel sound — say “e-e-e” — still “sounds like” 
the same vowel whether we say (or sing) it at a high or a low pitch. From the 
mechanism we describe, we would expect that particular frequencies are emphasized 
when we shape our mouth for an “e-e-e,” and that they do not change as we change 
the pitch of our voice. So the relation of the important harmonics to the funda- 
mental — that is, the “quality” — changes as we change pitch. Apparently the mech- 
anism by which we recognize speech is not based on specific harmonic relation- 
ships. 

What should we say now about Pythagoras’ discovery? We understand that 
two similar strings with lengths in the ratio of 2 to 3 will have fundamental fre- 
quencies in the ratio 3 to 2. But why should they “sound pleasant” together? 
Perhaps we should take our clue from the frequencies of the harmonics. The 
second harmonic of the lower shorter string will have the same frequency as the 
third harmonic of the longer string. (It is easy to show — or to believe — that a 
plucked string produces strongly the several lowest harmonics.) 

Perhaps we should make the following rules. Notes sound consonant when 
they have harmonics with the same frequency. Notes sound dissonant if their upper 
harmonics have frequencies near to each other but far enough apart that there are 
rapid beats between the two. Why beats do not sound pleasant, and why unison 
of the upper harmonics does sound pleasant, is something that we do not know 
how to define or describe. We cannot say from this knowledge of what sounds 
good, what ought, for example, to smell good. In other words, our understanding 
of it is not anything more general than the statement that when they are in unison 
they sound good. It does not permit us to deduce anything more than the proper- 
ties of concordance in music. 

It is easy to check on the harmonic relationships we have described by some 
simple experiments with a piano. Let us label the 3 successive C’s near the middle 
of the keyboard by C, C', and C", and the G’s just above by G, G', and G". Then 
the fundamentals will have relative frequencies as follows: 

C -2 G - 3 

C' -4 G' - 6 

C"-8 G"-12 

These harmonic relationships can be demonstrated in the following way: Suppose 
we press C' slowly — so that it does not sound but we cause the damper to be 
lifted. If we then sound C, it will produce its own fundamental and some second 
harmonic. The second harmonic will set the strings of C' into vibration. If we 
now release C (keeping C' pressed) the damper will stop the vibration of the C 
strings, and we can hear (softly) the note C' as it dies away. In a similar way, the 
third harmonic of C can cause a vibration of G'. Or the sixth of C (now getting 
much weaker) can set up a vibration in the fundamental of G ". 

A somewhat different result is obtained if we press G quietly and then sound 
C'. The third harmonic of C' will correspond to the fourth harmonic of G, so 
only the fourth harmonic of G will be excited. We can hear (if we listen closely) 
the sound of G", which is two octaves above the G we have pressed! It is easy to 
think up many more combinations for this game. 

We may remark in passing that the major scale can be defined just by the 
condition that the three major chords (F-A-C); (C-E-G); and (G-B-D) each 
represent tone sequences with the frequency ratio (4: 5: 6). These ratios — plus 
the fact that octave (C-C', B-B', etc.) has the ratio 1 : 2 — determine the whole 
scale for the “ideal” case, or for what is called “just intonation.” Keyboard in- 
struments like the piano are not usually tuned in this manner, but a little “fudging” 
is done so that the frequencies are approximately correct for all possible starting 
tones. For this tuning, which is called “tempered,” the octave (still 1 : 2) is divided 
into 12 equal intervals for which the frequency ratio is (2) i; 12 . A fifth no longer 
has the frequency ratio 3/2, but 2 7/ 12 = 1.499, which is apparently close enough 
for most ears. 
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We have stated a rule for consonance in terms of the coincidence of harmonics. 
Is this coincidence perhaps the reason that two notes are consonant? One worker 
has claimed that two pure tones — tones carefully manufactured to be free of har- 
monics — do not give the sensations of consonance or dissonance as the relative 
frequencies are placed at or near the expected ratios. (Such experiments are difficult 
because it is difficult to manufacture pure tones, for reasons that we shall see later.) 
We cannot still be certain whether the ear is matching harmonics or doing arith- 
metic when we decide that we like a sound. 


50-4 The Fourier coefficients 

Let us return now to the idea that any note — that is, a periodic sound — can be 
represented by a suitable combination of harmonics. We would like to show how 
we can find out what amount of each harmonic is required. It is, of course, easy 
to compute /(0, using Eq. (50.2), if we are given all the coefficients a and b . The 
question now is, if we are given fit) how can we know what the coefficients of the 
various harmonic terms should be? (It is easy to make a cake from a recipe; but 
can we write down the recipe if we are given a cake?) 

Fourier discovered that it was not really very difficult. The term a 0 is certainly 
easy. We have already said that it is just the average value of f(t) over one period 
(from t = 0 to t = T). We can easily see that this is indeed so. The average value 
of a sine or cosine function over one period is zero. Over two, or three, or any whole 
number of periods, it is also zero. So the average value of all of the terms on the 
right-hand side of Eq. (50.2) is zero, except for a 0 . (Recall that we must choose 
co = 2 t t/T.) 

Now the average of a sum is the sum of the averages. So the average of fit) is 
just the average of a 0 . But a 0 is a constant , so its average is just the same as its 
value. Recalling the definition of an average, we have 

f T 

a ° = TJo mdL (50 ' 3) 

The other coefficients are only a little more difficult. To find them we can use 
a trick discovered by Fourier. Suppose we multiply both sides of Eq. (50.2) by 
some harmonic function— say by cos loot. We have then 

fit) * cos loot = a 0 * cos 7c ot 

+ a i cos cot • cos loot + bi sin cot * cos loot 
+ a 2 cos loot • cos loot + b 2 sin loot • cos loot 

+ a 7 cos loot * cos loot + b 7 sin loot • cos loot 
+ • • • + * * * (50.4) 

Now let us average both sides. The average of a o cos loot over the time T is pro- 
portional to the average of a cosine over 7 whole periods. But that is just zero. 
The average of almost all of the rest of the terms is also zero. Let us look at the 
a x term. We know, in general, that 

cos A cos B — J cos iA + B) + \ cos iA — B). (50.5) 

The a i term becomes 

Jo i (cos Scot + cos 6 cot), (50.6) 

We thus have two cosine terms, one with 8 full periods in T and the other with 6. 
They both average to zero. The average of the a\ term is therefore zero. 

For the a 2 term, we would find a 2 cos 9 cot and a 2 cos 5 cot, each of which also 
averages to zero. For the a Q term, we would find cos I6oot and cos (— 2cot). But 
cos (—2 cot) is the same as cos loot , so both of these have zero averages. It is clear 
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that all of the a terms will have a zero average except one. And that one is the 
a 7 term. For this one we have 

£a 7 ( cos 14wf + cos 0). (50.7) 

The cosine of zero is one, and its average, of course, is one. So we have the result 
that the average of all of the a terms of Eq. (50.4) equals %a 7 . 

The b terms are even easier. When we multiply by any cosine term like cos neat , 
we can show by the same method that all of the b terms have the average value zero. 

We see that Fourier’s “trick” has acted like a sieve. When we multiply by 
cos lost and average, all terms drop out except a 7 , and we find that 

Average [f(i ) * cos 7 ost] = a 7 / 2, (50.8) 

or 

f T 

a 7 = f(i) • cos lost dl. (50.9) 

We shall leave it for the reader to show that the coefficient b 7 can be obtained 
by multiplying Eq. (50.2) by sin lot and averaging both sides. The result is 

/ T 

( /(<) • sin lut dt. (50.10) 

Now what is true for 7 we expect is true for any integer. So we can summarize 
our proof and result in the following more elegant mathematical form. If m and 
n are integers other than zero, and if co = 2ir/T , then 


I. 

/ sin nut cos mut dt = 0. 
Jo 


(50.11) 
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II. 

/ cos nost cos mo)t dt ~ 
Jo 

> < 

(0 if n t* m . 

(50.12) 
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\T/2 if n — m. 


III. 

/ sin nwt sin most dt — 
Jo ) 




IV. 

00 

f(t) = flo + On COS nut 

00 

+ ^ b n sin nost. 

(50.13) 
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(50.14) 
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an = T J o ^ COS no)t dt * 


(50.15) 
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bn = T Jq ^ ’ Sin no)i dU 


(50.16) 


In earlier chapters it was convenient to use the exponential notation for repre- 
senting simple harmonic motion. Instead of cos cat we used Re e i<at , the real part 
of the exponential function. We have used cosine and sine functions in this 
chapter because it made the derivations perhaps a little clearer. Our final result of 
Eq. (50.13) can, however, be written in the compact form 

f(t) = Re £ a n e inut , (50.17) 

n— 0 

where a n is the complex number a n — ib n (with b 0 ~ 0). If we wish to use the 
same notation throughout, we can write also 

an = fL /{t)e ~ in “ tdt (n ~ 0- (5°. IS) 
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We now know how to “analyze” a periodic wave into its harmonic compon- 
ents. The procedure is called Fourier analysis, and the separate terms are called 
Fourier components. We have not shown, however, that once we find all of the 
Fourier components and add them together, we do indeed get back our fit). The 
mathematicians have shown, for a wide class of functions, in fact for all that are 
of interest to physicists, that if we can do the integrals we will get back /(/). There 
is one minor exception. If the function f{t) is discontinuous, i.e., if it jumps suddenly 
from one value to another, the Fourier sum will give a value at the breakpoint 
halfway between the upper and lower values at the discontinuity. So if we have the 
strange function fit) = 0, 0 < t < to, and fit) = 1 for t$ < t < T, the Fourier 
sum will give the right value everywhere except at to, where it will have the value ^ 
instead of 1. It is rather unphysical anyway to insist that a function should be 
zero up to to, but 1 right at t 0 . So perhaps we should make the “rule” for physicists 
that any discontinuous function (which can only be a simplification of a real 
physical function) should be defined with halfway values at the discontinuities. 
Then any such function — with any finite number of such jumps — as well as all 
other physically interesting functions, are given correctly by the Fourier sum. 

As an exercise, we suggest that the reader determine the Fourier series for 
the function shown in Fig. 50-3. Since the function cannot be written in an explicit 
algebraic form, you will not be able to do the integrals from zero to T in the usual 
way. The integrals are easy, however, if we separate them into two parts: the 
integral from zero to T/2 (over which f(t) = 1) and the integral from T/2 to T 
(over which /(*) = — 1). The result should be 

4 

fit) = — (sin o)t + J sin 3 cot + £ sin Sot + • •), (50.19) 
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Fig. 50-3. Square- wave function, 

f(t) = + 1 for 0 < f < t/2, 
f(t) = - 1 for T/2 < t < T. 


where o = 2i r/T. We thus find that our square wave (with the particular phase 
chosen) has only odd harmonics, and their amplitudes are in inverse proportion 
to their frequencies. 

Let us check that Eq. (50.19) does indeed give us back fit) for some value of /. 
Let us choose t = 7/4, or cot = 7t/2. We have 


3ir 1 . 5tt . 

sm T 5 sin T + 


r/.\ 4 / . 7T . 1 . 

f(t) = - ^ s,n - 2 - + 3 si 


•) 


(50.20) 

(50.21) 


The series* has the value ir/4, and we find that f(t) = 1. 


50-5 The energy theorem 

The energy in a wave is proportional to the square of its amplitude. For a 
wave of complex shape, the energy in one period will be proportional to J 0 / 2 (0 dt. 
We can also relate this energy to the Fourier coefficients. We write 


fT fT 

/ f\t)dt = 

Jo Jo 


a 0 + ^2 cin cos not + ^ b n sin not 


n = l 


n = l 


dt . 


(50.22) 


When we expand the square of the bracketed term we will get all possible cross 
terms, such as a 6 cos Sot * b 7 cos lot . We have shown above, however, [Eqs. 
(50.11) and (50.12)] that the integrals of all such terms over one period is zero. 


* The series can be evaluated in the following way. First we remark that 
J 0 [dx/{ 1 + x 2 )] ~ tan^ 1 *. Second, we expand the integrand in a series 1/(1 + x 2 ) - 
1 — x 2 + x* — x 6 + . . . We integrate the series term by term (from zero to x) to 
obtain tan" 1 x = 1 — x 3 /3 + x 5 /5 — x 7 /! + . . . Setting x - 1, we have the stated 
result, since tan" 1 1 = tt/4. 
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We have left only the square terms like a\ cos 2 Soot. The integral of any cosine 
squared or sine squared over one period is equal to T/2, so we get 

f T 

J ^ fO) dt — Ta% + 2 ( fl i + a 2 + • ■ • + b\ + b\ + • * •) 

= Ta% + ~ (a* + bl). (50.23) 

n=l 

This equation is called the “energy theorem,” and says that the total energy in a 
wave is just the sum of the energies in all of the Fourier components. For example, 
applying this theorem to the series (50.19), since [f(i) ] 2 = 1 we get 



so we learn that the sum of the squares of the reciprocals of the odd integers is 
7t 2 /8. In a similar way, by first obtaining the Fourier series for the function and 
using the energy theorem, we can prove that 1 + 1/2 4 + 1/3 4 + * • • is 7 t 4 / 90 , 
a result we needed in Chapter 45. 

50-6 Nonlinear responses 

Finally, in the theory of harmonics there is an important phenomenon which 
should be remarked upon because of its practical importance — that of nonlinear 
effects. In all the systems that we have been considering so far, we have supposed 
that everything was linear, that the responses to forces, say the displacements or 
the accelerations, were always proportional to the forces. Or that the currents in 
the circuits were proportional to the voltages, and so on. We now wish to consider 
cases where there is not a strict proportionality. We think, at the moment, of some 
device in which the response, which we will call x out at the time t , is determined 
by the input x in at the time t. For example, x in might be the force and x ou t might 
be the displacement. Or x in might be the current and x ou t the voltage. If the device 
is linear, we would have 

*out(0 = **in(0, (50.24) 

Fig. 50-4. Linear and nonlinear re- where K is a constant independent of t and of X[ n . Suppose, however, that the 
sponses. device is nearly, but not exactly, linear, so that we can write 

*out(0 = K[x ul (t) + «?„(/)], (50.25) 

where e is small in comparison with unity. Such linear and nonlinear responses are 
shown in the graphs of Fig. 50-4. 

Nonlinear responses have several important practical consequences. We 
shall discuss some of them now. First we consider what happens if we apply a 
pure tone at the input. We let x in = cos ait. If we plot x out as a function of time 
we get the solid curve shown in Fig. 50-5. The dashed curve gives, for comparison, 
the response of a linear system. We see that the output is no longer a cosine 
function. It is more peaked at the top and flatter at the bottom. We say that the 
output is distorted. We know, however, that such a wave is no longer a pure 
tone, that it will have harmonics. We can find what the harmonics are. Using 
Xi n = cos ait with Eq. (50.25), we have 

x out = K(cos ait + e cos 2 ait). (50.26) 

From the equality cos 2 0 = J(1 — cos 20), we have 

x ou t = K ^cos ait + - — | cos 2c ot^ * (50.27) 

The output has not only a component at the fundamental frequency, that was 
present at the input, but also has some of its second harmonic. There has also 



^ -'-"LINEAR v. 


Fig. 50-5. The response of a non- 
linear device to the input cos cot. A 
linear response is shown for comparison. 
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appeared at the output a constant term K(e/ 2), which corresponds to the shift of 
the average value, shown in Fig. 50-5. The process of producing a shift of the 
average value is called rectification. 

A nonlinear response will rectify and will produce harmonics of the frequencies 
at its input. Although the nonlinearity we assumed produced only second harmon- 
ics, nonlinearities of higher order— those which have terms like xf n and jc? n , for 
example— will produce harmonics higher than the second. 

Another effect which results from a nonlinear response is modulation . If our 
input function contains two (or more) pure tones, the output will have not only 
their harmonics, but still other frequency components. Let jt in = A cos co^ + 
B cos <*) 2 t, where now o)\ and co 2 are not intended to be in a harmonic relation. 
In addition to the linear term (which is K times the input) we shall have a compo- 
nent in the output given by 

Xout = Ke(A cos 0 >i/ + Bcoso) 2 t ) 2 (50.28) 

~ Ke (A 2 cos 2 a) 1 1 + B 2 cos 2 co 2 / + 2 AB cos 03 X t cos o> 2 t). (50.29) 

The first two terms in the parentheses of Eq. (50.29) are just those which gave 
the constant terms and second harmonic terms we found above. The last term is 
new. 

We can look at this new “cross term” AB cos o) X t cos c o 2 t in two ways. First, 
if the two frequencies are widely different (for example, if <a x is much greater than 
0 ) 2 ) we can consider that the cross term represents a cosine oscillation of varying 
amplitude. That is, we can think of the factors in this way: 

AB cos <v 1 t cos o) 2 t = C(t) coso) it, (50.30) 

with 

C(0 = AB cos w 2 t. (50.31) 

We say that the amplitude of cos coi is modulated with the frequency co 2 . 

Alternatively, we can write the cross term in another way: 

AB cos o)it cos a) 2 t = [cos (co x + c o 2 )t + cos(o) X - W 2 )/J. (50.32) 

We would now say that two new components have been produced, one at the sum 
frequency (o>i + co 2 ), another at the difference frequency (coi — o> 2 ). 

We have two different, but equivalent, ways of looking at the same result. 
In the special case that o>i ^>> co 2 , we can relate these two different views by re- 
marking that since (co x + 03 2 ) and (00 x — a > 2 ) are near to each other we would 
expect to observe beats between them. But these beats have just the effect of 
modulating the amplitude of the average frequency o) X by one-half the difference 
frequency 2 oj 2 - We see, then, why the two descriptions are equivalent. 

In summary, we have found that a nonlinear response produces several effects: 
rectification, generation of harmonics, and modulation, or the generation of 
components with sum and difference frequencies. 

We should notice that all these effects (Eq. 50.29) are proportional not only 
to the nonlinearity coefficient e, but also to the product of two amplitudes— either 
A 2 , B 2 , or AB. We expect these effects to be much more important for strong 
signals than for weak ones. 

The effects we have been describing have many practical applications. First, 
with regard to sound, it is believed that the ear is nonlinear. This is believed to 
account for the fact that with loud sounds we have the sensation that we hear 
harmonics and also sum and difference frequencies even if the sound waves contain 
only pure tones. 

The components which are used in sound-reproducing equipment — amplifiers, 
loudspeakers, etc. — always have some nonlinearity. They produce distortions in 
the sound they generate harmonics, etc. — which were not present in the original 
sound. These new components are heard by the ear and are apparently objection- 
able. It is for this reason that “Hi-Fi” equipment is designed to be as linear as 
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possible. (Why the nonlinearities of the ear are not “objectionable” in the same 
way, or how we even know that the nonlinearity is in the loudspeaker rather than 
in the ear is not clear!) 

Nonlinearities are quite necessary , and are, in fact, intentionally made large 
in certain parts of radio transmitting and receiving equipment. In an am trans- 
mitter the “voice” signal (with frequencies of some kilocycles per second) is 
combined with the “carrier” signal (with a frequency of some megacycles per 
second) in a nonlinear circuit called a modulator , to produce the modulated 
oscillation that is transmitted. In the receiver, the components of the received signal 
are fed to a nonlinear circuit which combines the sum and difference frequencies 
of the modulated carrier to generate again the voice signal. 

When we discussed the transmission of light, we assumed that the induced 
oscillations of charges were proportional to the electric field of the light — that the 
response was linear. That is indeed a very good approximation. It is only within 
the last few years that light sources have been devised (lasers) which produce an 
intensity of light strong enough so that nonlinear effects can be observed. It is 
now possible to generate harmonics of light frequencies. When a strong red light 
passes through a piece of glass, a little bit of blue light — second harmonic — 
comes out! 


50-10 



SI 


Waves 


51-1 Bow waves 


Although we have finished our quantitative analyses of waves, this added 
chapter on the subject is intended to give some appreciation, qualitatively, for 
various phenomena that are associated with waves, which are too complicated to 
analyze in detail here. Since we have been dealing with waves for several chapters, 
more properly the subject might be called “some of the more complex phenomena 
associated with waves.” 

The first topic to be discussed concerns the effects that are produced by a 
source of waves which is moving faster than the wave velocity, or the phase velocity. 
Let us first consider waves that have a definite velocity, like sound and light. If 
we have a source of sound which is moving faster than the speed of sound, then 
something like this happens: Suppose at a given moment a sound wave is generated 
from the source at point x x in Fig. 51-1 ; then, in the next moment, as the source 
moves to x 2 , the wave from x x expands by a radius r x smaller than the distance 
that the source moves; and, of course, another wave starts from x 2 . When the 
sound source has moved still farther, to x 3 , and a wave is starting there, the wave 
from x 2 has now expanded to r 2 , and the one from x x has expanded to r 3 . Of 
course the thing is done continuously, not in steps, and therefore, we have a series 
of wave circles with a common tangent line which goes through the center of the 
source. We see that instead of a source generating spherical waves, as it would if it 
were standing still, it generates a wavefront which forms a cone in three dimensions, 
or a pair of lines in two dimensions. The angle of the cone is very easy to figure 
out. In a given amount of time the source moves a distance, say x 3 — x x , pro- 
portional to v, the velocity of the source. In the meantime the wavefront has moved 
out a distance r 3 , proportional to c w , the speed of the wave. Therefore it is clear 
that the half-angle of opening has a sine equal to the ratio of the speed of the waves, 
divided by the speed of the source, and this sine has a solution only if c w is less 
than v, or the speed of the object is faster than the speed of the wave : 

sine = c f- (51.1) 

Incidentally, although we implied that it is necessary to have a source of 
sound, it turns out, very interestingly, that once the object is moving faster than the 
speed of sound, it will make sound. That is, it is not necessary that it have a certain 
tone vibrational character. Any object moving through a medium faster than the 
speed at which the medium carries waves will generate waves on each side, auto- 
matically, just from the motion itself. This is simple in the case of sound, but it 
also occurs in the case of light. At first one might think nothing can move faster 
than the speed of light. However, light in glass has a phase velocity less than the 
speed of light in a vacuum, and it is possible to shoot a charged particle of very 
high energy through a block of glass such that the particle velocity is close to the 
speed of light in a vacuum, while the speed of light in the glass may be only % 
the speed of light in the vacuum. A particle moving faster than the speed of light 
in the medium will produce a conical wave of light with its apex at the source, 
like the wave wake from a boat (which is from the same effect, as a matter of 
fact). By measuring the cone angle, we can determine the speed of the particle. 
This is used technically to determine the speeds of particles as one of the methods 
of determining their energy in high-energy research. The direction of the light is 
all that needs to be measured. 
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Fig. 51-1. The shock wave front lies 
on a cone with apex at the source and 
half-angle 6 = sin -1 v/c„. 
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Fig. 51-2. A shock wave induced in 
a gas by a projectile moving faster than 
sound. 

This light is sometimes called (Jerehkov radiation, because it was first observed 
by (Jerenkov. How intense this light should be was analyzed theoretically by Frank 
and Tamm. The 1958 Nobel Prize for physics was awarded jointly to all three for 
this work. 

The corresponding circumstances in the case of sound are illustrated in Fig. 
51-2, which is a photograph of an object moving through a gas at a speed greater 
than the speed of sound. The changes in pressure produce a change in refractive 
index, and with a suitable optical system the edges of the waves can be made 
visible. We see that the object moving faster than the speed of sound does, indeed, 
produce a conical wave. But closer inspection reveals that the surface is actually 
curved. It is straight asymptotically, but it is curved near the apex, and we have 
now to discuss how that can be, which brings us to the second topic of this chapter. 
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Fig. 51-3. Wavefront "snapshots" at 
successive instants in time. 


51-2 Shock waves 

Wave speed often depends on the amplitude, and in the case of sound the speed 
depends upon the amplitude in the following way. An object moving through the 
air has to move the air out of the way, so the disturbance produced in this case 
is some kind of a pressure step, with the pressure higher behind the wavefront than 
in the undisturbed region not yet reached by the wave (running along at the normal 
speed, say). But the air that is left behind, after the wavefront passes, has been 
compressed adiabatically, and therefore the temperature is increased. Now the 
speed of sound increases with the temperature, so the speed in the region behind 
the jump is faster than in the air in front. That means that any other disturbance 
that is made behind this step, say by a continuous pushing of the body, or any other 
disturbance, will ride faster than the front, the speed increasing with higher pressure. 
Figure 51-3 illustrates the situation, with some little bumps of pressure added to 
the pressure contour to aid visualization. We see that the higher pressure regions 
at the rear overtake the front as time goes on, until ultimately the compressional 
wave develops a sharp front. If the strength is very high, “ultimately” means 
right away ; if it is rather weak, it takes a long time ; it may be, in fact, that the sound 
is spreading and dying out before it has time to do this. 

The sounds we make in talking ate extremely weak relative te the atme^h^ffc 
pressure — only 1 part in a million or so. But for pressure changes of the order of 1 
atmosphere, the wave velocity increases by about twenty percent, and the wavefront 
sharpens up at a correspondingly high rate. In nature nothing happens infinitely 
rapidly, presumably, and what we call a “sharp” front has, actually, a very slight 
thickness; it is not infinitely steep. The distances over which it is varying are of the 
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order of one mean free path, in which the theory of the wave equation begins to 
fail because we did not consider the structure of the gas. 

Now, referring again to Fig. 51-2, we see that the curvature can be understood 
if we appreciate that the pressures near the apex are higher than they are farther 
back, and so the angle 6 is greater. That is, the curve is the result of the fact that the 
speed depends upon the strength of the wave. Therefore the wave from an atomic 
bomb explosion travels much faster than the speed of sound for a while, until it gets 
so far out that it is weakened to such an extent from spreading that the pressure 
bump is small compared with atmospheric pressure. The speed of the bump then 
approaches the speed of sound in the gas into which it is going. (Incidentally, it 
always turns out that the speed of the shock is higher than the speed of sound in 
the gas ahead, but is lower than the speed of sound in the gas behind. That is, 
impulses from the back will arrive at the front, but the front rides into the medium 
in which it is going faster than the normal speed of signals. So one cahnot tell, 
acoustically, that the shock is coming until it is too late. The light from the bomb 
arrives first, but one cannot tell that the shock is coming until it arrives, because 
there is no sound signal coming ahead of it.) 

This is a very interesting phenomenon, this piling up of waves, and the main 
point on which it depends is that after a wave is present, the speed of the resulting 
wave should be higher. Another example of the same phenomenon is the following. 
Consider water flowing in a long channel with finite width and finite depth. If a 
piston, or a wall across the channel, is moved along the channel fast enough, water 
piles up, like snow before a snow plow. Now suppose the situation is as shown in 
Fig. 51-4, with a sudden step in water height somewhere in the channel. It can be 
demonstrated that long waves in a channel travel faster in deeper water than they 
do in shallow water. Therefore any new bumps or irregularities in energy supplied 
by the piston run off forward and pile up at the front. Again, ultimately what we 
have is just water with a sharp front, theoretically. However, as Fig. 51-4 shows, 
there are complications. Pictured is a wave coming up a channel; the piston is at 
the far right end of the channel. At first it might have appeared like a well-behaved 
wave, as one might expect, but farther along the channel, it has become sharper and 
sharper until the events pictured occurred. There is a terrible churning at the 
surface, as the pieces of water fall down, but it is essentially a very sharp rise with 
no disturbance of the water ahead. 

Actually water is much more complicated than sound. However, just to illus- 
trate a point, we will try to analyze the speed of such a so-called bore , in a channel 
The point here is not that this is of any basic importance for our purposes — it is 
not a great generalization — it is only to illustrate that the laws of mechanics that 
we already know are capable of explaining the phenomenon. 

Imagine, for a moment, that the water does look something like Fig. 51-5(a), 
that water at the higher height h 2 is moving with a velocity v, and that the front is 
moving with velocity u into undisturbed water which is at height h x . We would like 
to determine the speed at which the front moves. In a time At a vertical plane 
initially at Xi moves a distance v At to x 2 , while the front of the wave has moved 
u At. 

Now we apply the equations of conservation of matter and momentum. First, 
the former: Per unit channel width, we see that the amount h 2 v At of matter that 
has moved past x\ (shown shaded) is compensated by the other shaded region, 
which amounts to ( h 2 - hi)u At. So, dividing by At, vh 2 = u[h 2 - hi). That 
does not yet give us enough, because although we have h 2 and h\, we do not know 
either u or v; we are trying to get both of them. 

Now the next step is to use conservation of momentum. We have not discussed 
the problems of water pressure, or anything in hydrodynamics, but it is clear any- 
way that the pressure of water at a given depth is just enough to hold up the column 
of water above it. Therefore the pressure of water is equal to p, the density of water, 
times g, times the depth below the surface. Since the pressure increases linearly 
with depth, the average pressure over the plane at Xi, say, is %pgh 2 , which is also 
the average force per unit width and per unit height pushing the plane toward x 2 . 
So we multiply by another h 2 to get the total force which is acting on the water 



Figure 51-4 



Fig. 51-5. Two cross sections of a 
bore in a channel, with (b) at an interval 
At later than (a). 
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pushing from the left. On the other hand, there is pressure in the water on the 
right also, exerting an opposite force on the region in question, which is, by the 
same kind of analysis, %pgh\. Now we must balance the forces against the rate of 
change of the momentum. Thus we have to figure out how much more momentum 
there is in situation (b) in Fig. 51-5 than there was in (a). We see that the additional 
mass that has acquired the speed v is just ph 2 u At — ph 2 v At (per unit width), and 
multiplying this by v gives the additional momentum to be equated to the impulse 
F At: 

{ph 2 u At — ph 2 v At)v = (%pgh\ — \pgh\) At . 

If we eliminate v from this equation by substituting vh 2 *=^ u(h 2 — h i), already 
found, and simplify, we get finally that u 2 = gh 2 (h x + h 2 )/2h x . 

If the height difference is very small, so that hi and h 2 are nearly equal, this 
says that the velocity = Vgh. As we will see later, that is only true provided the 
wavelength of the wave is longer than the depth of the channel. 

We could also do the analogous thing for sound waves — including the con- 
servation of internal energy, not the conservation of entropy, because the shock is 
irreversible. In fact, if one checks the conservation of energy in the bore problem, 
one finds that energy is not conserved. If the height difference is small, it is almost 
perfectly conserved, but as soon as the height difference becomes very appreciable, 
there is a net loss of energy. This is manifested as the falling water and the churning 
shown in Fig. 51-4. 

In shock waves there is a corresponding apparent loss of energy, from the 
point of view of adiabatic reactions. The energy in the sound wave, behind the 
shock, goes into heating of the gas after shock passes, corresponding to churning of 
the water in the bore. In working it out, three equations for the sound case turn 
out to be necessary for solution, and the temperature behind the shock is not the 
same as the temperature in front, as we have seen. 

If we try to make a bore that is upside down (h 2 < h i), then we find that the 
energy loss per second is negative. Since energy is not available from anywhere, 
that bore cannot then maintain itself; it is unstable. If we were to start a wave of 
that sort, it would flatten out, because the speed dependence on height that resulted 
in sharpening in the case we discussed would now have the opposite effect. 

51-3 Waves in solids 

The next kind of waves to be discussed are the more complicated waves in 
solids. We have already discussed sound waves in gas and in liquid, and there is a 
direct analog to a sound wave in a solid. If a sudden push is applied to a solid, it is 
compressed. It resists the compression, and a wave analogous to sound is started. 
However there is another kind of wave that is possible in a solid, and which is not 
possible in a fluid. If a solid is distorted by pushing it sideways (called shearing ), 
then it tries to pull itself back. That is by definition what distinguishes a solid from 
a liquid: if we distort a liquid (internally), hold it a minute so that it calms down, 
and then let go, it will stay that way, but if we take a solid and push it, like shearing 
a piece of “Jello,” and let it go, it flies back and starts a shear wave, travelling in 
the same way the compressions travel. In all cases, the shear wave speed is less 
than the speed of longitudinal waves. The shear waves are somewhat more anal- 
ogous, so far as their polarizations are concerned, to light waves. Sound has no 
polarization, it is just a pressure wave. Light has a characteristic orientation per- 
pendicular to its direction of travel. 

In a solid, the waves are of both kinds. First, there is a compression wave, 
analogous to sound, that runs at one speed. If the solid is not crystalline, then a 
shear wave polarized in any direction will propagate at a characteristic speed. 
(Of course all solids are crystalline, but if we use a block made up of microcrystals 
of all orientations, the crystal anisotropies average out.) 

Another interesting question concerning sound waves is the following: What 
happens if the wavelength in a solid gets shorter, and shorter, and shorter? How 
short can it get? It is interesting that it cannot get any shorter than the space 
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between the atoms, because if there is supposed to be a wave in which one point 
goes up and the next down, etc., the shortest possible wavelength is clearly the 
atom spacing. In terms of the modes of oscillation, we say that there are longi- 
tudinal modes, and transverse modes, long wave modes, short wave modes. As 
we consider wavelengths comparable to the spacing between the atoms, then the 
speeds are no longer constant ; there is a dispersion effect where the velocity is not 
independent of the wave number. But, ultimately, the highest mode of transverse 
waves would be that in which every atom is doing the opposite of neighboring 
atoms. 

Now from the point of view of atoms, the situation is like the two pendulums 
that we were talking about, for which there are two modes, one in which they both 
go together, and the other in which they go apart. It is possible to analyze the 
solid waves another way, in terms of a system of coupled harmonic oscillators, like 
an enormous number of pendulums, with the highest mode such that they oscillate 
oppositely, and lower modes with different relationships of the timing. 

The shortest wavelengths are so short that they are not usually available 
technically. However they are of great interest because, in the theory of thermo- 
dynamics of a solid, the heat properties of a solid, for example specific heats, can 
be analyzed in terms of the properties of the short sound waves. Going to the 
extreme of sound waves of ever shorter wavelength, one necessarily comes to 
the individual motions of the atoms; the two things are the same ultimately. 

A very interesting example of sound waves in a solid, both longitudinal and 
transverse, are the waves that are in the solid earth. Who makes the noises we do 
not know, but inside the earth, from time to time, there are earthquakes — some 
rock slides past some other rock. That is like a little noise. So waves like sound 
waves start out from such a source very much longer in wavelength than one usu- 
ally considers in sound waves, but still they are sound waves, and they travel around 
in the earth. The earth is not homogeneous, however, and the properties of pressure, 
density, compressibility, and so on, change with depth, and therefore the speed 
varies with depth. Then the waves do not travel in straight lines — there is a kind 
of index of refraction and they go in curves. The longitudinal waves and the 
transverse waves have different speeds, so there are different solutions for the differ- 
ent speeds. Therefore if we place a seismograph at some location and watch the 
way the thing jiggles after there has been an earthquake somewhere else, then we do 
not just get an irregular jiggling. We might get a jiggling, and a quieting down, and 
then another jiggling — what happens depends upon the location. If it were close 
enough, we would first receive longitudinal waves from the disturbance, and then, 
a few moments later, transverse waves, because they travel more slowly. By 
measuring the time difference between the two, we can tell how far away the earth- 
quake is, if we know enough about the speeds and composition of the interior 
regions involved. 

An example of the behavior pattern of waves in the earth is shown in Fig. 51-6. 
The two kinds of waves are represented by different symbols. If there were an earth- 
quake at the place marked “source/’ the transverse waves and longitudinal waves 
would arrive at different times at the station by the most direct routes, and there 
would also be reflections at discontinuities, resulting in other paths and times. It 
turns out that there is a core in the earth which does not carry transverse waves. 
If the station is opposite the source, transverse waves still arrive, but the timing is 
not right. What happens is that the transverse wave comes to the core, and when- 
ever the transverse waves come to a surface which is oblique, between two materials, 
two new waves are generated, one transverse and one longitudinal. But inside the 
core of the earth, a transverse wave is not propagated (or at least, there is no evi- 
dence for it, only for a longitudinal wave); it comes out again in both forms and 
comes to the station. 

It is from the behavior of these earthquake waves that it has been determined 
that transverse waves cannot be propagated within the inner circle. This means 
that the center of the earth is liquid in the sense that it cannot propagate transverse 
waves. The only way we know what is inside the earth is by studying earthquakes. 
So, by using a large number of observations of many earthquakes at different 
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Fig. 51-6. Schematic of the earth, 
showing paths of longitudinal and trans- 
verse sound waves. 



stations, the details have been worked out— the speed, the curves, etc. are all known. 
We know what the speeds of various kinds of waves are at every depth. Knowing 
that, therefore, it is possible to figure out what the normal modes of the earth are, 
because we know the speed of propagation of sound waves — in other words, the 
elastic properties of both kinds of waves at every depth. Suppose the earth were 
distorted into an ellipsoid and let go. It is just a matter of superposing waves 
travelling around in the ellipsoid to determine the period and shapes in a free mode. 
We have figured out that if there is a disturbance, there are a lot of modes, from the 
lowest, which is ellipsoidal, to higher modes with more structure. 



Fig. 51-7. Power versus frequency 
as detected at seismographs in Nana, 
Peru, and Isabella, California. The 
coherence is a measure of the coupling 
between the stations. [From Benioff, Press 
and Smith, J. Geoph. Research 66, 605 
(1961)]. 



Fig. 51-8. High-resolution analysis of 
one of the seismograph records, showing 
spectral doublet. 


The Chilean earthquake of May 1960 made a loud enough “noise” that the 
signals went around the earth many times, and new seismographs of great delicacy 
were made just in time to determine the frequencies of the fundamental modes of 
the earth and to compare them with the values that were calculated from the theory 
of sound with the known velocities, as measured from the independent earthquakes. 
The result of this experiment is illustrated in Fig. 51-7, which is a plot of the strength 
of the signal versus the frequency of its oscillation (a Fourier analysis). Note that 
at certain particular frequencies there is much more being received than at other 
frequencies; there are very definite maxima. These are the natural frequencies of 
the earth, because these are the main frequencies at which the earth can oscillate. 
In other words, if the entire motion of the earth is made up of many different modes, 
we would expect to obtain, for each station, irregular bumpings which indicate a 
superposition of many frequencies. If we analyze this in terms of frequencies, we 
should be able to find the characteristic frequencies of the earth. The vertical dark 
lines in the figure are the calculated frequencies, and we find a remarkable agree- 
ment, an agreement due to the fact that the theory of sound is right for the inside 
of the earth. 

A very curious point is revealed in Fig. 51-8, which shows a very careful 
measurement, with better resolution of the lowest mode, the ellipsoidal mode of 
the earth. Note that it is not a single maximum, but a double one, 54.7 minutes and 
53.1 minutes — slightly different. The reason for the two different frequencies was 
not known at the time that it was measured, although it may have been found in 
the meantime. There are at least two possible explanations: One would be that 
there may be asymmetry in the earth’s distribution, which would result in two simi- 
lar modes. Another possibility, which is even more interesting, is this: Imagine 
the waves going around the earth in two directions from the source. The speeds 
will not be equal because of effects of the rotation of the earth in the equations of 
motion, which have not been taken into account in making the analysis. Motion 
in a rotating system is modified by Coriolis forces, and these may cause the ob- 
served splitting. 

Regarding the method by which these quakes have been analyzed, what is 
obtained on the seismograph is not a curve of amplitude as a function of frequency, 
but displacement as a function of time, always a very irregular tracing. To find 
the amount of all the different sine waves for all different frequencies, we know that 
the trick is to multiply the data by a sine wave of a given frequency and integrate, 
i.e., average it, and in the average all other frequencies disappear. The figures 
were thus plots of the integrals found when the data were multiplied by sine 
waves of different cycles per minute, and integrated. 
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51-4 Surface waves 

Now, the next waves of interest, that are easily seen by everyone and which 
are usually used as an example of waves in elementary courses, are water waves. 
As we shall soon see, they are the worst possible example, because they are in no 
respects like sound and light; they have all the complications that waves can have. 
Let us start with long water waves in deep water. If the ocean is considered in- 
finitely deep and a disturbance is made on the surface, waves are generated. All 
kinds of irregular motions occur, but the sinusoidal type motion, with a very small 
disturbance, might look like the common smooth ocean waves coming in toward 
the shore. Now with such a wave, the water, of course, on the average, is standing 
still, but the wave moves. What is the motion, is it transverse or longitudinal? 
It must be neither ; it is not transverse, nor is it longitudinal. Although the water at 
a given place is alternately trough or hill, it cannot simply be moving up and down, 
by the conservation of water. That is, if it goes down, where is the water going 
to go? The water is essentially incompressible. The speed of compression of 
waves — that is, sound in the water — is much, much higher, and we are not con- 
sidering that now. Since water is incompressible on this scale, as a hill comes 
down the water must move away from the region. What actually happens is that 
particles of water near the surface move approximately in circles. When smooth 
swells are coming, a person floating in a tire can look at a nearby object and see 
it going in a circle. So it is a mixture of longitudinal and transverse, to add to the 
confusion. At greater depths in the water the motions are smaller circles until, 
reasonably far down, there is nothing left of the motion (Fig. 51-9). 


Fig. 51-9. Deep-water waves are 
formed from particles moving in circles. 
Note the systematic phase shifts from 
circle to circle. How would a floating 
object move? 



To find the velocity of such waves is an interesting problem: it must be some 
combination of the density of the water, the acceleration of gravity, which is the 
restoring force that makes the waves, and possibly of the wavelength and of the 
depth. If we take the case where the depth goes to infinity, it will no longer depend 
on the depth. Whatever formula we are going to get for the velocity of the phases 
of the waves must combine the various factors to make the proper dimensions, 
and if we try this in various ways, we find only one way to combine the density, 
g, and X in order to make a velocity, namely, VgX, which does not include the den- 
sity at all. Actually, this formula for the phase velocity is not exactly right, but 
a complete analysis of the dynamics, which we will not go into, shows that the 
factors are as we have them, except for \Z2tt: 

*> P hase = V g\/2ir (for gravity waves). 

It is interesting that the long waves go faster than the short waves. Thus if a boat 
makes waves far out, because there is some sports-car driver in a motorboat 
travelling by, then after a while the waves come to shore with slow sloshings at 
first and then more and more rapid sloshings, because the first waves that come are 
long. The waves get shorter and shorter as the time goes on, because the velocities 
go as the square root of the wavelength. 

One may object, “That is not right, we must look at the group velocity in order 
to figure it out!” Of course that is true. The formula for the phase velocity does 
not tell us what is going to arrive first; what tells us is the group velocity. So we 
have to work out the group velocity, and it is left as a problem to show it to be 
one-half of the phase velocity, assuming that the velocity goes as the square root 
of the wavelength, which is all that is needed. The group velocity also goes as the 
square root of the wavelength. How can the group velocity go half as fast as the 
phase? If one looks at the bunch of waves that are made by a boat travelling 
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Fig. 51-1 1. Phase velocity vs. wave- 
length for water. 


along, following a particular crest, he finds that it moves forward in the group 
and gradually gets weaker and dies out in the front, and mystically and mysteriously 
a weak one in the back works its way forward and gets stronger. In short, the waves 
are moving through the group while the group is only moving at half the speed 
that the waves are moving. 



Fig. 51-10. The wake of a boat. 


Because the group velocities and phase velocities are not equal, then the waves 
that are produced by an object moving through are no longer simply a cone, but 
it is much more interesting. We can see that in Fig. 51-10, which shows the waves 
produced by an object moving through the water. Note that it is quite different 
than what we would have for sound, in which the velocity is independent of wave- 
length, where we would have wavefronts only along the cone, travelling outward. 
Instead of that, we have waves in the back with fronts moving parallel to the motion 
of the boat, and then we have little waves on the sides at other angles. This entire 
pattern of waves can, with ingenuity, be analyzed by knowing only this: that the 
phase velocity is proportional to the square root of the wavelength. The trick is 
that the pattern of waves is stationary relative to the (constant-velocity) boat; any 
other pattern would get lost from the boat. 

The water waves that we have been considering so far were long waves in which 
the force of restoration is due to gravitation. But when waves get very short in 
the water, the main restoring force is capillary attraction, i.e., the energy of the 
surface, the surface tension. For surface tension waves, it turns out that the phase 
velocity is 

*Wse = V2irT/\p (for ripples), 

where T is the surface tension and p the density. It is the exact opposite: the phase 
velocity is higher , the shorter the wavelength, when the wavelength gets very small. 
When we have both gravity and capillary action, as we always do, we get the com- 
bination of these two together: 

Vphase = VTk/p + g/k, 

where k = 2ir/\ is the wave number. So the velocity of the waves of water is 
really quite complicated. The phase velocity as a function of the wavelength is 
shown in Fig. 51-11 ; for very short waves it is fast, for very long waves it is fast, 
and there is a minimum speed at which the waves can go. The group velocity can 
be calculated from the formula: it goes to f the phase velocity for ripples and ^ 
the phase velocity for gravity waves. To the left of the minimum the group velocity 
is higher than the phase velocity; to the right, the group velocity is less than the 
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phase velocity. There are a number of interesting phenomena associated with 
these facts. In the first place, since the group velocity is increasing so rapidly as 
the wavelength goes down, if we make a disturbance there will be a slowest end of 
the disturbance going at the minimum speed with the corresponding wavelength, 
and then in front, going at higher speed, will be a short wave and a very long wave. 
It is very hard to see the long ones, but it is easy to see the short ones in a water tank. 

So we see that the ripples often used to illustrate simple waves are quite inter- 
esting and complicated; they do not have a sharp wavefront at all, as is the case for 
simple waves like sound and light. The main wave has little ripples which run out 
ahead. A sharp disturbance in the water does not produce a sharp wave because 
of the dispersion. First come the very fine waves. Incidentally, if an object moves 
through the water at a certain speed, a rather complicated pattern results, because 
all the different waves are going at different speeds. One can demonstrate this with 
a tray of water and see that the fastest ones are the fine capillary waves. There are 
slowest waves, of a certain kind, which go behind. By inclining the bottom, one 
sees that where the depth is lower, the speed is lower. If a wave comes in at an 
angle to the line of maximum slope, it bends and tends to follow that line. In this 
way one can show various things, and we conclude that waves are more compli- 
cated in water than in air. 

The speed of long waves in water with circulational motions is slower when the 
depth is less, faster in deep water. Thus as water comes toward a beach where the 
depth lessens, the waves go slower. But where the water is deeper, the waves are 
faster, so we get the effects of shock waves. This time, since the wave is not so 
simple, the shocks are much more contorted, and the wave over-curves itself, in 
the familiar way shown in Fig. 51-12. This is what happens when waves come into 
the shore, and the real complexities in nature are well revealed in such a circum- 
sf.mce. No one has yet been able to figure out what shape the wave should take 
as it breaks. It is easy enough when the waves are small, but when one gets large 
and breaks, then it is much more complicated. 



Fig. 51-12. A water wave. 


An interesting feature about capillary waves can be seen in the disturbances 
made by an object moving through the water. From the point of view of the object 
itself, the water is flowing past, and the waves which ultimately sit around it are 
always the waves which have just the right speed to stay still with the object in the 
water. Similarly, around an object in a stream, with the stream flowing by, the 
pattern of waves is stationary, and at just the right wavelengths to go at the same 
speed as the water going by. But if the group velocity is less than the phase velocity, 
then the disturbances propagate out backwards in the stream, because the group 
velocity is not quite enough to keep up with the stream. If the group velocity is 
faster than the velocity of the phase, the pattern of waves will appear in front of 
the object. If one looks closely at objects in a stream, one can see that there are 
little ripples in front and long “slurps” in the back. 
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Another interesting feature of this sort can be observed in pouring liquids. 
If milk is poured fast enough out of a bottle, for instance, a large number of lines 
can be.seen crossing both ways in the outgoing stream. They are waves starting 
from the disturbance at the edges and running out, much like the waves about an 
object in a stream. There are effects from both sides which produce the crossed 
pattern. 

We have investigated some of the interesting properties of waves and the vari- 
ous complications of dependence of phase velocity on wavelength, the speed of 
the waves on depth, and so forth, that produce the really complex, and therefore 
interesting, phenomena of nature. 
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Symmetry in Physical Laws 


52-1 Symmetry operations 

The subject of this chapter is what we may call symmetry in physical laws . 
We have already discussed certain features of symmetry in physical laws in con- 
nection with vector analysis (Chapter 1 1), the theory of relativity (Chapter 16), and 
rotation (Chapter 20). 

Why should we be concerned with symmetry? In the first place, symmetry is 
fascinating to the human mind, and everyone likes objects or patterns that are in 
some way symmetrical. It is an interesting fact that nature often exhibits certain 
kinds of symmetry in the objects we find in the world around us. Perhaps the 
most symmetrical object imaginable is a sphere, and nature is full of spheres — 
stars, planets, water droplets in clouds. The crystals found in rocks exhibit many 
different kinds of symmetry, the study of which tells us some important things about 
the structure of solids. Even the animal and vegetable worlds show some degree of 
symmetry, although the symmetry of a flower or of a bee is not as perfect or as 
fundamental as is that of a crystal. 

But our main concern here is not with the fact that the objects of nature are 
often symmetrical. Rather, we wish to examine some of the even more remarkable 
symmetries of the universe — the symmetries that exist in the basic laws themselves 
which govern the operation of the physical world. 

First, what is symmetry? How can a physical law be “symmetrical”? The 
problem of defining symmetry is an interesting one and we have already noted that 
Weyl gave a good definition, the substance of which is that a thing is symmetrical 
if there is something we can do to it so that after we have done it, it looks the same 
as it did before. For example, a symmetrical vase is of such a kind that if we reflect 
or turn it, it will look the same as it did before. The question we wish to consider 
here is what we can do to physical phenomena, or to a physical situation in an 
experiment, and yet leave the result the same. A list of the known operations under 
which various physical phenomena remain invariant is shown in Table 52-1. 


52-2 Symmetry in space and time 

The first thing we might try to do, for example, is to translate the phenomenon 
in space. If we do an experiment in a certain region, and then build another ap- 
paratus at another place in space (or move the original one over) then, whatever 
went on in one apparatus, in a certain order in time, will occur in the same way if 
we have arranged the same condition, with all due attention to the restrictions that 
we mentioned before: that all of those features of the environment which make it 
not behave the same way have also been moved over — we talked about how to 
define how much we should include in those circumstances, and we shall not go 
into those details again. 

In the same way, we also believe today that displacement in time will have no 
effect on physical laws. (That is, as far as we know today — all of these things are 
as far as we know today!) That means that if we build a certain apparatus and start 
it at a certain time, say on Thursday at 10:00 a.m., and then build the same appara- 
tus and start it, say, three days later in the same condition, the two apparatuses will 
go through the same motions in exactly the same way as a function of time no 
matter what the starting time, provided again, of course, that the relevant features 
of the environment are also modified appropriately in time . That symmetry means, 
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laws 
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52-5 Polar and axial vectors 
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52-7 Parity is not conserved! 

52-8 Antimatter 
52-9 Broken symmetries 


Table 52-1 
Symmetry Operations 

Translation in space 
Translation in time 
Rotation through a fixed angle 

Uniform velocity in a straight 
line (Lorentz transformation) 

Reversal of time 
Reflection of space 

Interchange of identical atoms 
or identical particles 

Quantum-mechanical phase 
Matter-antimatter (charge conjugation) 



of course, that if one bought General Motors stock three months ago, the same 
thing would happen to it if he bought it now! 

We have to watch out for geographical differences too, for there are, of 
course, variations in the characteristics of the earth’s surface. So, for example, 
if we measure the magnetic field in a certain region and move the apparatus to 
some other region, it may not work in precisely the same way because the magnetic 
field is different, but we say that is because the magnetic field is associated with the 
earth. We can imagine that if we move the whole earth and the equipment, it would 
make no difference in the operation of the apparatus. 

Another thing that we discussed in considerable detail was rotation in space: 
if we turn an apparatus at an angle it works just as well, provided we turn every- 
thing else that is relevant along with it. In fact, we discussed the problem of sym- 
metry under rotation in space in some detail in Chapter 11, and we invented a 
mathematical system called vector analysis to handle it as neatly as possible. 

On a more advanced level we had another symmetry — the symmetry under 
uniform velocity in a straight line. That is to say — a rather remarkable effect — that 
if we have a piece of apparatus working a certain way and then take the same ap- 
paratus and put it in a car, and move the whole car, plus all the relevant surround- 
ings, at a uniform velocity in a straight line, then so far as the phenomena inside 
the car are concerned there is no difference: all the laws of physics appear the same. 
We even know how to express this more technically, and that is that the mathe- 
matical equations of the physical laws must be unchanged under a Lorentz trans- 
formation . As a matter of fact, it was a study of the relativity problem that concen- 
trated physicists’ attention most sharply on symmetry in physical laws. 

Now the above-mentioned symmetries have all been of a geometrical nature, 
time and space being more or less the same, but there are other symmetries of a 
different kind. For example, there is a symmetry which describes the fact that we 
can replace one atom by another of the same kind ; to put it differently, there are 
atoms of the same kind. It is possible to find groups of atoms such that if we change 
a pair around, it makes no difference — the atoms are identical. Whatever one 
atom of oxygen of a certain type will do, another atom of oxygen of that type will 
do. One may say, “That is ridiculous, that is the definition of equal types!” That 
may be merely the definition, but then we still do not know whether there are any 
“atoms of the same type”; the fact is that there are many, many atoms of the same 
type. Thus it does mean something to say that it makes no difference if we replace 
one atom by another of the same type. The so-called elementary particles of which 
the atoms are made are also identical particles in the above sense — all electrons 
are the same; all protons are the same; all positive pions are the same; and 
so on. 

After such a long list of things that can be done without changing the phe- 
nomena, one might think we could do practically anything; so let us give some 
examples to the contrary, just to see the difference. Suppose that we ask: “Are 
the physical laws symmetrical under a change of scale?” Suppose we build a 
certain piece of apparatus, and then build another apparatus five times bigger in 
every part, will it work exactly the same way? The answer is, in this case, no\ 
The wavelength of light emitted, for example, by the atoms inside one box of 
sodium atoms and the wavelength of light emitted by a gas of sodium atoms five 
times in volume is not five times longer, but is. in fact exactly the same as the other. 
So the ratio of the wavelength to the size of the emitter will change. 

Another example: we see in the newspaper, every once in a while pictures of a 
great cathedral made with little matchsticks — a tremendous work of art by some 
retired fellow who keeps gluing matchsticks together. It is much more elaborate 
and wonderful than any real cathedral. If we imagine that this wooden cathedral 
were actually built on the scale of a real cathedral, we see where the trouble is; 
it would not last — the whole thing would collapse because of the fact that scaled-up 
matchsticks are just not strong enough. “Yes,” one might say, “but we also know 
that when there is an influence from the outside, it also must be changed in pro- 
portion!” We are talking about the ability of the object to withstand gravitation. 
So what we should do is first to take the model cathedral of real matchsticks and 
52-2 



the real earth, and then we know it is stable. Then we should take the larger cathe- 
dral and take a bigger earth. But then it is even worse, because the gravitation is 
increased still more! 

Today, of course, we understand the fact that phenomena depend on the scale 
on the grounds that matter is atomic in nature, and certainly if we built an appara- 
tus that was so small there were only five atoms in it, it would clearly be something 
we could not scale up and down arbitrarily. The scale of an individual atom is not 
at all arbitrary — it is quite definite. 

The fact that the laws of physics are not unchanged under a change of scale 
was discovered by Galileo. He realized that the strengths of materials were not in 
exactly the right proportion to their sizes, and he illustrated this property that we 
were just discussing, about the cathedral of matchsticks, by drawing two bones, 
the bone of one dog, in the right proportion for holding up his weight, and the 
imaginary bone of a “super dog” that would be, say, ten or a hundred times 
bigger — that bone was a big, solid thing with quite different proportions. We do 
not know whether he ever carried the argument quite to the conclusion that the 
laws of nature must have a definite scale, but he was so impressed with this dis- 
covery that he considered it to be as important as the discovery of the laws of 
motion, because he published them both in the same volume, called “On Two 
New Sciences.” 

Another example in which the laws are not symmetrical, that we know quite 
well, is this : a system in rotation at a uniform angular velocity does not give the 
same apparent laws as one that is not rotating. If we make an experiment and 
then put everything in a space ship and have the space ship spinning in empty space, 
all alone at a constant angular velocity, the apparatus will not work the same way 
because, as we know, things inside the equipment will be thrown to the outside, 
and so on, by the centrifugal or coriolis forces, etc. In fact, we can tell that the earth 
is rotating by using a so-called Foucault pendulum, without looking outside. 

Next we mention a very interesting symmetry which is obviously false, i.e., 
reversibility in time . The physical laws apparently cannot be reversible in time, 
because, as we know, all obvious phenomena are irreversible on a large scale: 
“The moving finger writes, and having writ, moves on.” So far as we can tell, this 
irreversibility is due to the very large number of particles involved, and if we could 
see the individual molecules, we would not be able to discern whether the machinery 
was working forward or backwards. To make it more precise: we build a small 
apparatus in which we know what all the atoms are doing, in which we can watch 
them jiggling. Now we build another apparatus like it, but which starts its motion 
in the final condition of the other one, with all the velocities precisely reversed. 
It will then go through the same motions , but exactly in reverse . Putting it another 
way: if we take a motion picture, with sufficient detail, of all the inner works of a 
piece of material and shine it on a screen and run it backwards, no physicist will 
be able to say, “That is against the laws of physics, that is doing something wrong!” 
If we do not see all the details, of course, the situation will be perfectly clear. If 
we see the egg splattering on the sidewalk and the shell cracking open, and so on, 
then we will surely say, “That is irreversible, because if we run the moving picture 
backwards the egg will all collect together and the shell will go back together, and 
that is obviously ridiculous!” But if we look at the individual atoms themselves, 
the laws look completely reversible. This is, of course, a much harder discovery 
to have made, but apparently it is true that the fundamental physical laws, on a 
microscopic and fundamental level, are completely reversible in time! 

52-3 Symmetry and conservation laws 

The symmetries of the physical laws are very interesting at this level, but they 
turn out, in the end, to be even more interesting and exciting when we come to 
quantum mechanics. For a reason which we cannot make clear at the level of the 
present discussion — a fact that most physicists still find somewhat staggering, a 
most profound and beautiful thing, is that, in quantum mechanics, for each of 
the rules of symmetry there is a corresponding conservation law; there is a definite 
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connection between the laws of conservation and the symmetries of physical laws. 
We can only state this at present, without any attempt at explanation. 

The fact, for example, that the laws are symmetrical for translation in space 
when we add the principles of quantum mechanics, turns out to mean that mo- 
mentum is conserved . 

That the laws are symmetrical under translation in time means, in quantum 
mechanics, that energy is conserved. 

Invariance under rotation through a fixed angle in space corresponds to the 
conservation of angular momentum. These connections are very interesting and 
beautiful things, among the most beautiful and profound things in physics. 

Incidentally, there are a number of symmetries which appear in quantum 
mechanics which have no classical analog, which have no method of description in 
classical physics. One of these is as follows: If is the amplitude for some process 
or other, we know that the absolute square of \p is the probability that the process 
will occur. Now if someone else were to make his calculations, not with this but 
with a \f/ f which differs merely by a change in phase (let A be some constant, and 
multiply e tA times the old £), the absolute square of which is the probability of 
the event, is then equal to the absolute square of \j/\ 

*' = 1 * 1 2 = 1 * 1 2 . ( 52 . 1 ) 

Therefore the physical laws are unchanged if the phase of the wave function is 
shifted by an arbitrary constant. That is another symmetry. Physical laws must 
be of such a nature that a shift in the quantum-mechanical phase makes no differ- 
ence. As we have just mentioned, in quantum mechanics there is a conservation 
law for every symmetry. The conservation law which is connected with the quan- 
tum-mechanical phase seems to be the conservation of electrical charge. This is 
altogether a very interesting business! 

52-4 Mirror reflections 

Now the next question, which is going to concern us for most of the rest of 
this chapter, is the question of symmetry under reflection in space. The problem 
is this: Are the physical laws symmetrical under reflection? We may put it this 
way: Suppose we build a piece of equipment, let us say a clock, with lots of wheels 
and hands and numbers; it ticks, it works, and it has things wound up inside. 
We look at the clock in the mirror. How it looks in the mirror is not the question. 
But let us actually build another clock which is exactly the same as the first clock 
looks in the mirror — every time there is a screw with a right-hand thread in one, 
we use a screw with a left-hand thread in the corresponding place of the other; 
where one is marked “2” on the face, we mark a “£” on the face of the other; 
each coiled spring is twisted one way in one clock and the other way in the mirror- 
image clock; when we are all finished, we have two clocks, both physical, which 
bear to each other the relation of an object and its mirror image, although they 
are both actual, material objects, we emphasize. Now the question is: If the two 
clocks are started in the same condition, the springs wound to corresponding tight- 
nesses, will the two clocks tick and go around, forever after, as exact mirror images? 
(This is a physical question, not a philosophical question.) Our intuition about the 
laws of physics would suggest that they would. 

We would suspect that, at least in the case of these clocks, reflection in space 
is one of the symmetries of physical laws, that if we change everything from “right” 
to “left” and leave it otherwise the same, we cannot tell the difference. Let us, 
then, suppose for a moment that this is true. If it is true, then it would be impossible 
to distinguish “right” and “left” by any physical phenomenon, just as it is, for 
example, impossible to define a particular absolute velocity by a physical phe- 
nomenon. So it should be impossible, by any physical phenomenon, to define 
absolutely what we mean by “right” as opposed to “left,” because the physical 
laws should be symmetrical. 

Of course, the world does not have to be symmetrical. For example, using what 
we may call “geography,” surely “right” can be defined. For instance, we stand 
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in New Orleans and look at Chicago, and Florida is to our right (when our feet 
are on the ground!). So we can define “right” and “left” by geography. Of course, 
the actual situation in any system does not have to have the symmetry that we are 
talking about; it is a question of whether the laws are symmetrical — in other words, 
whether it is against the physical laws to have a sphere like the earth with “left- 
handed dirt” on it and a person like ourselves standing looking at a city like 
Chicago from a place like New Orleans, but with everything the other way around, 
so Florida is on the other side. It clearly seems not impossible, not against the 
physical laws, to have everything changed left for right. 

Another point is that our definition of “right” should not depend on history. 
An easy way to distinguish right from left is to go to a machine shop and pick up a 
screw at random. The odds are it has a right-hand thread — not necessarily, but 
it is much more likely to have a right-hand thread than a left-hand one. This is a 
question of history or convention, or the way things happen to be, and is again 
not a question of fundamental laws. As we can well appreciate, everyone could 
have started out making left-handed screws! 

So we must try to find some phenomenon in which “right hand” is involved 
fundamentally. The next possibility we discuss is the fact that polarized light 
rotates its plane of polarization as it goes through, say, sugar water. As we saw 
in Chapter 33, it rotates, let us say, to the right in a certain sugar solution. That 
is a way of defining “right-hand,” because we may dissolve some sugar in the water 
and then the polarization goes to the right. But sugar has come from living things, 
and if we try to make the sugar artificially, then we discover that it does not rotate 
the plane of polarization! But if we then take that same sugar which is made 
artificially and which does not rotate the plane of polarization, and put bacteria 
in it (they eat some of the sugar) and then filter out the bacteria, we find that we 
still have sugar left (almost half as much as we had before), and this time it does 
rotate the plane of polarization, but the other way\ It seems very confusing, but 
is easily explained. 


Fig. 52-1. (a) L-alanine (left), and (b) 
D-alanine (right). 



Take another example: One of the substances which is common to all living 
creatures and that is fundamental to life is protein. Proteins consist of chains of 
amino acids. Figure 52-1 shows a model of an amino acid that comes out of a 
protein. This amino acid is called alanine, and the molecular arrangement would 
look like that in Fig, 52-1 (a) if it came out of a protein of a real living thing. On 
the other hand, if we try to make alanine from carbon dioxide, ethane, and am- 
monia (and we can make it, it is not a complicated molecule), we discover that we 
are making equal amounts of this molecule and the one shown in Fig. 52— 1(b)! 
The first molecule, the one that comes from the living thing, is called L-alanine. 
The other one, which is the same chemically, in that it has the same kinds of 
atoms and the same connections of the atoms, is a “right-hand” molecule, com- 
pared with the “left-hand” L-alanine, and it is called D-alanine . The interesting 
thing is that when we make alanine at home in a laboratory from simple gases, 
we get an equal mixture of both kinds. However, the only thing that life uses is 
L-alanine. (This is not exactly true. Here and there in living creatures there is a 
special use for D-alanine, but it is very rare. All proteins use L-alanine exclusively.) 
Now if we make both kinds, and we feed the mixture to some animal which likes 
to “eat,” or use up, alanine, it cannot use D-alanine, so it only uses the L-alanine; 
that is what happened to our sugar — after the bacteria eat the sugar that works 
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well for them, only the “wrong” kind is left! (Left-handed sugar tastes sweet, but 
not the same as right-handed sugar.) 

So it looks as though the phenomena of life permit a distinction between 
“right” and “left,” or chemistry permits a distinction, because the two molecules 
are chemically different. But no, it does not! So far as physical measurements can 
be made, such as of energy, the rates of chemical reactions, and so on, the two 
kinds work exactly the same way if we make everything else in a mirror image too. 
One molecule will rotate light to the right, and the other will rotate it to the left 
in precisely the same amount, through the same amount of fluid. Thus, so far as 
physics is concerned, these two amino acids are equally satisfactory. So far as we 
understand things today, the fundamentals of the Schrodinger equation have it 
that the two molecules should behave in exactly corresponding ways, so that one 
is to the right as the other is to the left. Nevertheless, in life it is all one way! 

It is presumed that the reason for this is the following. Let us suppose, for 
example, that life is somehow at one moment in a certain condition in which all 
the proteins in some creatures have left-handed amino acids, and all the enzymes 
are lopsided — every substance in the living creature is lopsided — it is not symmet- 
rical. So when the digestive enzymes try to change the chemicals in the food from 
one kind to another, one kind of chemical “fits” into the enzyme, but the other 
kind does not (like Cinderella and the slipper, except that it is a “left foot” that 
we are testing). So far as we know, in principle, we could build a frog, for example, 
in which every molecule is reversed, everything is like the “left-hand” mirror image 
of a real frog; we have a left-hand frog. This left-hand frog would go on all right 
for a while, but he would find nothing to eat, because if he swallows a fly, his 
enzymes are not built to digest it. The fly has the wrong “kind” of amino acids 
(unless we give him a left-hand fly). So as far as we know, the chemical and life 
processes would continue in the same manner if everything were reversed. 

If life is entirely a physical and chemical phenomenon, then we can understand 
that the proteins are all made in the same corkscrew only from the idea that at the 
very beginning some living molecules, by accident, got started and a few won. 
Somewhere, once, one organic molecule was lopsided in a certain way, and from 
this particular thing the “right” happened to evolve in our particular geography; 
a particular historical accident was one-sided, and ever since then the lopsidedness 
has propagated itself. Once having arrived at the state that it is in now, of course, 
it will always continue — all the enzymes digest the right things, manufacture the 
right things: when the carbon dioxide and the water vapor, and so on, go in the 
plant leaves, the enzymes that make the sugars make them lopsided because the 
enzymes are lopsided. If any new kind of virus or living thing were to originate at 
a later time, it would survive only if it could “eat” the kind of living matter already 
present. Thus it, too, must be of the same kind. 

There is no conservation of the number of right-handed molecules. Once 
started, we could keep increasing the number of right-handed molecules. So 
the presumption is, then, that the phenomena in the case of life do not show a lack 
of symmetry in physical laws, but do show, on the contrary, the universal nature 
and the commonness of ultimate origin of all creatures on earth, in the sense de- 
scribed above. 

52-5 Polar and axial vectors 

Now we go further. We observe that in physics there are a lot of other places 
where we have “right” and “left” hand rules. As a matter of fact, when we learned 
about vector analysis we learned about the right-hand rules we have to use in order 
to get the angular momentum, torque, magnetic field, and so on, to come out right. 
The force on a charge moving in a magnetic field, for example, is F = qs X B. 
In a given situation, in which we know F, v, and B, isn’t that equation enough to 
define right-handedness? As a matter of fact, if we go back and look at where the 
vectors came from, we know that the “right-hand rule” was merely a convention; 
it was a trick. The original quantities, like the angular momenta and the angular 
velocities, and things of this kind, were not really vectors at all! They are all 
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somehow associated with a certain plane, and it is just because there are three 
dimensions in space that we can associate the quantity with a direction perpendicu- 
lar to that plane. Of the two possible directions, we chose the “right-hand” 
direction. 

So if the laws of physics are symmetrical, we should find that if some demon 
were to sneak into all the physics laboratories and replace the word “right” for 
left in every book in which right-hand rules” are given, and instead we were to 
use all “left-hand rules,” uniformly, then it should make no difference whatever 
in the physical laws. 

Let us give an illustration. There are two kinds of vectors. There are “honest” 
vectors, for example a step Ar in space. If in our apparatus there is a piece here and 
something else there, then in a mirror apparatus there will be the image piece and 
the image something else, and if we draw a vector from the “piece” to the “some- 
thing else,” one vector is the mirror image of the other (Fig. 52-2). The vector arrow 
changes its head, just as the whole space turns inside out; such a vector we call a 
polar vector . 

But the other kind of vector, which has to do with rotations, is of a different 
nature. For example, suppose that in three dimensions something is rotating as 
shown in Fig. 52-3. Then if we look at it in a mirror, it will be rotating as indicated, 
namely, as the mirror image of the original rotation. Now we have agreed to repre- 
sent the mirror rotation by the same rule, it is a “vector” which, on reflection, does 
not change about as the polar vector does, but is reversed relative to the polar 
vectors and to the geometry of the space; such a vector is called an axial vector. 

Now if the law of reflection symmetry is right in physics, then it must be true 
that the equations must be so designed that if we change the sign of each axial 
vector and each cross-product of vectors, which would be what corresponds to 
reflection, nothing will happen. For instance, when we write a formula which says 
that the angular momentum is L = r X p, that equation is all right, because if 
we change to a left-hand coordinate system, we change the sign of L, but p and r 
do not change; the cross-product sign is changed, since we must change from a 
right-hand rule to a left-hand rule. As another example, we know that the force 
on a charge moving in a magnetic field is F = qv X B, but if we change from a 
right- to a left-handed system, since F and v are known to be polar vectors the sign 
change required by the cross-product must be cancelled by a sign change in B, 
which means that B must be an axial vector. In other words, if we make such a 
reflection, B must go to — B. So if we change our coordinates from right to left, 
we must also change the poles of magnets from north to south. 

Let us see how that works in an example. Suppose that we have two magnets, 
as in Fig. 52-4. One is a magnet with the coils going around a certain way, and 
with current in a given direction. The other magnet looks like the reflection of 
the first magnet in a mirror the coil will wind the other way, everything that 
happens inside the coil is exactly reversed, and the current goes as shown. Now, 
from the laws for the production of magnetic fields, which we do not know yet 
officially, but which we most likely learned in high school, it turns out that the 
magnetic field is as shown in the figure. In one case the pole is a south magnetic 
pole, while in the other magnet the current is going the other way and the magnetic 
field is reversed it is a north magnetic pole. So we see that when we go from right 
to left we must indeed change from north to south! 

Never mind changing north to south; these too are mere conventions. Let 
us talk about phenomena. Suppose, now, that we have an electron moving through 
one field, going into the page. Then, if we use the formula for the force, v X B 
(remember the charge is minus), we find that the electron will deviate in the indi- 
cated direction according to the physical law. So the phenomenon is that we have 
a coil with a current going in a specified sense and an electron curves in a certain 
way that is the physics — never mind how we label everything. 

Now let us do the same experiment with a mirror : we send an electron through 
in a corresponding direction and now the force is reversed, if we calculate it from 
the same rule, and that is very good because the corresponding motions are then 
mirror images! 



0 


a 



Fig. 52-2. A step in space and its 
mirror image. 



Fig. 52-3. A rotating wheel and its 
mirror image. Note that the angular 
velocity "vector" is not reversed in 
direction. 
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Fig. 52-4. A magnet and its mirror 
image. 
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52-6 Which hand is right? 

So the fact of the matter is that in studying any phenomenon there are always 
two right-hand rules, or an even number of them, and the net result is that the 
phenomena always look symmetrical. In short, therefore, we cannot tell right from 
left if we also are not able to tell north from south. However, it may seem that we 
can tell the north pole of a magnet. The north pole of a compass needle, for ex- 
ample, is one that points to the north. But of course that is again a local property 
that has to do with geography of the earth; that is just like talking about in which 
direction is Chicago, so it does not count. If we have seen compass needles, we 
may have noticed that the north-seeking pole is a sort of bluish color. But that is 
just due to the man who painted the magnet. These are all local, conventional 
criteria. 

However, if a magnet were to have the property that if we looked at it closely 
enough we would see small hairs growing on its north pole but not on its south 
pole, if that were the general rule, or if there were any unique way to distinguish 
the north from the south pole of a magnet, then we could tell which of the two 
cases we actually had, and that would be the end of the law of reflection symmetry. 

To illustrate the whole problem still more clearly, imagine that we were talking 
to a Martian, or someone very far away, by telephone. We are not allowed to send 
him any actual samples to inspect; for instance, if we could send light, we could 
send him right-hand circularly polarized light and say, “That is right-hand light- 
just watch the way it is going.” But we cannot give him anything, we can only talk 
to him. He is far away, or in some strange location, and he cannot see anything 
we can see. For instance, we cannot say, “Look at Ursa major; now see how 
those stars are arranged. What we mean by ‘right’ is . . .” We are only allowed 
to telephone him. 

Now we want to tell him all about us. Of course, first we start defining num- 
bers, and say, “Tick, tick, two , tick, tick, tick, three . . . ,” so that gradually he can 
understand a couple of words, and so on. After a while we may become very famil- 
iar with this fellow, and he says, “What do you guys look like?” We start to de- 
scribe ourselves, and say, “Well, we are six feet tall.” He says, “Wait a minute, 
what is six feet?” Is it possible to tell him what six feet is? Certainly! We say, 
“You know about the diameter of hydrogen atoms— we are 17,000,000,000 
hydrogen atoms high!” That is possible because physical laws are not variant 
under change of scale, and therefore we can define an absolute length. And so we 
define the size of the body, and tell him what the general shape is — it has prongs 
with five bumps sticking out on the ends, and so on, and he follows us along, and 
we finish describing how we look on the outside, presumably without encountering 
any particular difficulties. He is even making a model of us as we go along. He says, 
“My, you are certainly very handsome fellows; now what is on the inside?” So we 
start to describe the various organs on the inside, and we come to the heart, and 
we carefully describe the shape of it, and say, “Now put the heart on the left side.” 
He says, “Duhhh — the left side?” Now our problem is to describe to him which 
side the heart goes on without his ever seeing anything that we see, and without 
our ever sending any sample to him of what we mean by “right” — no standard 
right-handed object. Can we do it? 


52-7 Parity is not conserved! 

It turns out that the laws of gravitation, the laws of electricity and magnetism, 
nuclear forces, all satisfy the principle of reflection symmetry, so these laws, or 
anything derived from them, cannot be used. But associated with the many par- 
ticles that are found in nature there is a phenomenon called beta decay , or weak 
decay. One of the examples of weak decay, in connection with a particle discovered 
in about 1954, posed a strange puzzle. There was a certain charged particle which 
disintegrated into three 7r-mesons, as shown schematically in Fig. 52-5. This 
particle was called, for a while, a r-meson. Now in Fig. 52-5 we also see another 
particle which disintegrates into two mesons; one must be neutral, from the con- 
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servation of charge. This particle was called a 0-meson. So on the one hand we 
have a particle called a t, which disintegrates into three 7r-mesons, and a 0, which 
disintegrates into two 7r-mesons. Now it was soon discovered that the r and the 
0 are almost equal in mass; in fact, within the experimental error, they are equal. 
Next, the length of time it took for them to disintegrate into three 7r’s and two 
7r’s was found to be almost exactly the same; they live the same length of time. 
Next, whenever they were made, they were made in the same proportions, say, 
14 percent r’s to 86 percent 0’s. 

Anyone in his right mind Realizes immediately that they must be the same 
particle, that we merely produce an object which has two different ways of dis- 
integrating — not two different particles. This object that can disintegrate in two 
different ways has, therefore, the same lifetime and the same production ratio 
(because this is simply the ratio of the odds with which it disintegrates into these 
two kinds). 

However, it was possible to prove (and we cannot here explain at all how), from 
the principle of reflection symmetry in quantum mechanics, that it was impossible 
to have these both come from the same particle — the same particle could not 
disintegrate in both of these ways. The conservation law corresponding to the 
principle of reflection symmetry is something which has no classical analog, and 
so this kind of quantum-mechanical conservation was called the conservation of 
parity . So, it was a result of the conservation of parity or, more precisely, from the 
symmetry of the quantum-mechanical equations of the weak decays under reflec- 
tion, that the same particle could not go into both, so it must be some kind of 
coincidence of masses, lifetimes, and so on. But the more it was studied, the more 
remarkable the coincidence, and the suspicion gradually grew that possibly the 
deep law of the reflection symmetry of nature may be false. 

As a result of this apparent failure, the physicists Lee and Yang suggested that 
other experiments be done in related decays to try to test whether the law was 
correct in other cases. The first such experiment was carried out by Miss Wu from 
Columbia, and was dorje as follows. Using a very strong magnet at a very low 
temperature, it turns out that a certain isotope of cobalt, which disintegrates by 
emitting an electron, is magnetic, and if the temperature is low enough that the 
thermal oscillations do not jiggle the atomic magnets about too much, they line 
up in the magnetic field. So the cobalt atoms will all line up in this strong field. 
They then disintegrate, emitting an electron, and it was discovered that when the 
atoms were lined up in a field whose B vector points upward, most of the electrons 
were emitted in a downward direction. 

If one is not really “hep” to the world, such a remark does not sound like 
anything of significance, but if one appreciates the problems and interesting things 
in the world, then he sees that it is a most dramatic discovery: When we put cobalt 
atoms in an extremely strong magnetic field, more disintegration electrons go down 
than up. Therefore if we were to put it in a corresponding experiment in a “mirror,” 
in which the cobalt atoms would be lined up in the opposite direction, they would 
spit their electrons up, not down; the action is unsymmetrical. The magnet has grown 
hairs! The south pole of a magnet is of such a kind that the electrons in a ^-dis- 
integration tend to go away from it; that distinguishes, in a physical way, the north 
pole from the south pole. 

After this, a lot of other experiments were done: the disintegration of the it 
into p and v\ p into an electron and two neutrinos; nowadays, the A into proton 
and 7 r; disintegration of S’s; and many other disintegrations. In fact, in almost 
all cases where it could be expected, all have been found not to obey reflection 
symmetry! Fundamentally, the law of reflection symmetry, at this level in physics, 
is incorrect. 

In short, we can tell a Martian where to put the heart: we say, “Listen, build 
yourself a magnet, and put the coils in, and put the current on, and then take some 
cobalt and lower the temperature. Arrange the experiment so the electrons go from 
the foot to the head, then the direction in which the current goes through the coils 
is the direction that goes in on what we call the right and comes out on the left.” 
So it is possible to define right and left, now, by doing an experiment of this kind. 
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Fig. 52-5. A schematic diagram of 
the disintegration of a and a 0+ 
particle. 



There are a lot of other features that were predicted. For example, it turns out 
that the spin, the angular momentum, of the cobalt nucleus before disintegration 
is 5 units of h , and after disintegration it is 4 units. The electron carries spin angular 
momentum, and there is also a neutrino involved. It is easy to see from this that 
the electron must carry its spin angular momentum aligned along its direction of 
motion, the neutrino likewise. So it looks as though the electron is spinning to the 
left, and that was also checked. In fact, it was checked right here at Caltech by 
Boehm and Wapstra, that the electrons spin mostly to the left. (There were some 
other experiments that gave the opposite answer, but they were wrong!) 

The next problem, of course, was to find the law of the failure of parity con- 
servation. What is the rule that tells us how strong the failure is going to be? 
The rule is this: it occurs only in these very slow reactions, called weak decays, and 
when it occurs, the rule is that the particles which carry spin, like the electron, 
neutrino, and so on, come out with a spin tending to the left. That is a lopsided 
rule; it connects a polar vector velocity and an axial vector angular momentum, 
and says that the angular momentum is more likely to be opposite to the velocity 
than along it. 

Now that is the rule, but today we do not really understand the whys and 
wherefores of it. Why is this the right rule, what is the fundamental reason for it, 
and how is it connected to anything else? At the moment we have been so shocked 
by the fact that this thing is unsymmetrical that we have not been able to recover 
enough to understand what it means with regard to all the other rules. However, 
the subject is interesting, modern, and still unsolved, so it seems appropriate that 
we discuss some of the questions associated with it. 


52-8 Antimatter 

The first thing to do when one of the symmetries is lost is to immediately go 
back over the list of known or assumed symmetries and ask whether any of the 
others are lost. Now we did not mention one operation on our list, which must 
necessarily be questioned, and that is the relation between matter and antimatter. 
Dirac predicted that in addition to electrons there must be another particle, called 
the positron (discovered at Caltech by Anderson), that is necessarily related to 
the electron. All the properties of these two particles obey certain rules of corre- 
spondence: the energies are equal; the masses are equal; the charges are reversed; 
but, more important than anything, the two of them, when they come together, 
can annihilate each other and liberate their entire mass in the form of energy, say 
Y-rays. The positron is called an antiparticle to the electron, and these are the 
characteristics of a particle and its antiparticle. It was clear from Dirac’s argument 
that all the rest of the particles in the world should also have corresponding anti- 
particles. For instance, for the proton there should be an antiproton, which is now 
symbolized by a p. The p would have a negative electrical charge and the same 
mass as a proton, and so on. The most important feature, however, is that a proton 
and an antiproton coming together can annihilate each other. The reason we 
emphasize this is that people do not understand it when we say there is a neutron 
and also an antineutron, because they say, “A neutron is neutral, so how can it 
have the opposite charge?” The rule of the “anti” is not just that it has the opposite 
charge, it has a certain set of properties, the whole lot of which are opposite. The 
antineutron is distinguished from the neutron in this way: if we bring two neutrons 
together, they just stay as two neutrons, but if we bring a neutron and an anti- 
neutron together, they annihilate each other with a great explosion of energy being 
liberated, with various 7r-mesons, Y-rays, and whatnot. 

Now if we have antineutrons, antiprotons, and antielectrons, we can make 
antiatoms, in principle. They have not been made yet, but it is possible in principle. 
For instance, a hydrogen atom has a proton in the center with an electron going 
around outside. Now imagine that somewhere we can make an antiproton with a 
positron going around, would it go around? Well, first of all, the antiproton is 
electrically negative and the antielectron is electrically positive, so they attract each 
other in a corresponding manner — the masses are all the same; everything is the 
52-10 



same. It is one of the principles of the symmetry of physics, the equations seem to 
show, that if a clock, say, were made of matter on one hand, and then we made the 
same clock of antimatter, it would run in this way. (Of course, if we put the clocks 
together, they would annihilate each other, but that is different.) 

An immediate question then arises. We can build, out of matter, two clocks, 
one which is “left-hand” and one which is “right-hand.” For example, we could 
build a clock which is not built in a simple way, but has cobalt and magnets and 
electron detectors which detect the presence of /3-decay electrons and count them. 
Each time one is counted, the second hand moves over. Then the mirror clock, 
receiving fewer electrons, will not run at the same rate. So evidently we can make 
two clocks such that the left-hand clock does not agree with the right-hand one. 
Let us make, out of matter, a clock which we call the standard or right-hand 
clock. Now let us make, also out of matter, a clock which we call the left-hand clock. 
We have just discovered that, in general, these two will not run the same way; 
prior to that famous physical discovery, it was thought that they would. Now it 
was also supposed that matter and antimatter were equivalent. That is, if we made 
an antimatter clock, right-hand, the same shape, then it would run the same as the 
right-hand matter clock, and if we made the same clock to the left it would run 
the same. In other words, in the beginning it was believed that all four of these 
clocks were the same; now of course we know that the right-hand and left-hand 
matter are not the same. Presumably, therefore, the right-handed antimatter and 
the left-handed antimatter are not the same. 

So the obvious question is, which goes with which, if either? In other words, 
does the right-handed matter behave the same way as the right-handed antimatter? 
Or does the right-handed matter behave the same as the left-handed antimatter? 
/3-decay experiments, using positron decay instead of electron decay, indicate that 
this is the interconnection: matter to the “right” works the same way as antimatter 
to the “left.” 

Therefore, at long last, it is really true that right and left symmetry is still 
maintained! If we made a left-hand clock, but made it out of the other kind of 
matter, antimatter instead of matter, it would run in the same way. So what has 
happened is that instead of having two independent rules in our list of symmetries, 
two of these rules go together to make a new rule, which says that matter to the 
right is symmetrical with antimatter to the left. 

So if our Martian is made of antimatter and we give him instructions to make 
this “right” handed model like us, it will, of course, come out the other way around. 
What would happen when, after much conversation back and forth, we each have 
taught the other to make space ships and we meet halfway in empty space? We 
have instructed each other on our traditions, and so forth, and the two of us come 
rushing out to shake hands. Well, if he puts out his left hand, watch out! 


52-9 Broken symmetries 

The next question is, what can we make out of laws which are nearly sym- 
metrical? The marvelous thing about it all is that for such a wide range of impor- 
tant, strong phenomena — nuclear forces, electrical phenomena, and even weak ones 
like gravitation — over a tremendous range of physics, all the laws for these seem 
to be symmetrical. On the other hand, this little extra piece says, “No, the laws 
are not symmetrical!” How is it that nature can be almost symmetrical, but not 
perfectly symmetrical? What shall we make of this? First, do we have any other 
examples? The answer is, we do, in fact, have a few other examples. For instance, 
the nuclear part of the force between proton and proton, between neutron and 
neutron, and between neutron and proton, is all exactly the same — there is a 
symmetry for nuclear forces, a new one, that we can interchange neutron and 
proton — but it evidently is not a general symmetry, for the electrical repulsion 
between two protons at a distance does not exist for neutrons. So it is not generally 
true that we can always replace a proton with a neutron, but only to a good ap- 
proximation. Why good? Because the nuclear forces are much stronger than the 
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electrical forces. So this is an “almost” symmetry also. So we do have examples in 
other things. 

We have, in our minds, a tendency to accept symmetry as some kind of per- 
fection. In fact it is like the old idea of the Greeks that circles were perfect, and it 
was rather horrible to believe that the planetary orbits were not circles, but only 
nearly circles. The difference between being a circle and being nearly a circle is 
not a small difference, it is a fundamental change so far as the mind is concerned. 
There is a sign of perfection and symmetry in a circle that is not there the moment 
the circle is slightly off— that is the end of it — it is no longer symmetrical. Then 
the question is why it is only nearly a circle — that is a much more difficult question. 
The actual motion of the planets, in general, should be ellipses, but during the ages, 
because of tidal forces, and so on, they have been made almost symmetrical. Now 
the question is whether we have a similar problem here. The problem from the 
point of view of the circles is if they were perfect circles there would be nothing to 
explain, that is clearly simple. But since they are only nearly circles, there is a lot 
to explain, and the result turned out to be a big dynamical problem, and now our 
problem is to explain why they are nearly symmetrical by looking at tidal forces 
and so on. 

So our problem is to explain where symmetry comes from. Why is nature so 
nearly symmetrical? No one has any idea why. The only thing we might suggest 
is something like this: There is a gate in Japan, a gate in Neiko, which is sometimes 
called by the Japanese the most beautiful gate in all Japan; it was built in a time 
when there was great influence from Chinese art. This gate is very elaborate, with 
lots of gables and beautiful carving and lots of columns and dragon heads and 
princes carved into the pillars, and so on. But when one looks closely he sees that 
in the elaborate and complex design along one of the pillars, one of the small 
design elements is carved upside down; otherwise the thing is completely symmet- 
rical. If one asks why this is, the story is that it was carved upside down so that 
the gods will not be jealous of the perfection of man. So they purposely put 
an error in there, so that the gods would not be jealous and get angry with human 
beings. 

We might like to turn the idea around and think that the true explanation of 
the near symmetry of nature is this: that God made the laws only nearly symmet- 
rical so that we should not be jealous of His perfection! 
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Feynman's Preface 


These are the lectures in physics that I gave last year and the year before to the 
freshman and sophomore classes at Caltech. The lectures are, of course, not 
verbatim — they have been edited, sometimes extensively and sometimes less so. 
The lectures form only part of the complete course. The whole group of 180 
students gathered in a big lecture room twice a week to hear these lectures and 
then they broke up into small groups of 15 to 20 students in recitation sections 
under the guidance of a teaching assistant. In addition, there was a laboratory 
session once a week. 

The special problem we tried to get at with these lectures was to maintain the 
interest of the very enthusiastic and rather smart students coming out of the high 
schools and into Caltech. They have heard a lot about how interesting and excit- 
ing physics is — the theory of relativity, quantum mechanics, and other modern 
ideas. By the end of two years of our previous course, many would be very dis- 
couraged because there were really very few grand, new, modern ideas presented 
to them. They were made to study inclined planes, electrostatics, and so forth, 
and after two years it was quite stultifying. The problem was whether or not we 
could make a course which would save the more advanced and excited student by 
maintaining his enthusiasm. 

The lectures here are not in any way meant to be a survey course, but are very 
serious. 1 thought to address them to the most intelligent in the class and to make 
sure, if possible, that even the most intelligent student was unable to completely 
encompass everything that was in the lectures — by putting m suggestions of appli- 
cations of the ideas and concepts in various directions outside the main line of 
attack. For this reason, though, I tried very hard to make all the statements as 
accurate as possible, to point out in every case where the equations and ideas fitted 
into the body of physics, and how — when they learned more — things would be 
modified. I also felt that for such students it is important to indicate what it is 
that they should — if they are sufficiently clever — be able to understand by deduc- 
tion from what has been said before, and what is being put in as something new. 
When new ideas came in, I would try either to deduce them if they were deducible, 
or to explain that it was a new idea which hadn’t any basis in terms of things they 
had already learned and which was not supposed to be provable — but was just 
added in 

At the start of these lectures, I assumed that the students knew something when 
they came out of high school — such things as geometrical optics, simple chemistry 
ideas, and so on. I also didn’t see that there was any reason to make the lectures 


3 


in a definite order, in the sense that I would not be allowed to mention something 
until I was ready to discuss it in detail. There was a great deal of mention of things 
to come, without complete discussions. These more complete discussions would 
come later when the preparation became more advanced. Examples are the dis- 
cussions of inductance, and of energy levels, which are at first brought in in a 
very qualitative way and are later developed more completely. 

At the same time that I was aiming at the more active student, I also wanted 
to take care of the fellow for whom the extra fireworks and side applications are 
merely disquieting and who cannot be expected to learn most of the material in 
the lecture at all. For such students I wanted there to be at least a central core or 
backbone of material which he could get. Even if he didn’t understand everything 
in a lecture, I hoped he wouldn’t get nervous. I didn’t expect him to understand 
everything, but only the central and most direct features. It takes, of course, a 
certain intelligence on his part to see which are the central theorems and central 
ideas, and which are the more advanced side issues and applications which he may 
understand only in later years. 

In giving these lectures there was one serious difficulty: in the way the course 
was given, there wasn’t any feedback from the students to the lecturer to indicate 
how well the lectures were going over. This is indeed a very serious difficulty, 
and I don’t know how good the lectures really are. The whole thing was essentially 
an experiment. And if I did it again I wouldn’t do it the same way — I hope I 
don't have to do it again! I think, though, that things worked out — so far as the 
physics is concerned — quite satisfactorily in the first year. 

In the second year I was not so satisfied. In the first part of the course, dealing 
with electricity and magnetism, I couldn’t think of any really unique or different 
way of doing it — of any way that would be particularly more exciting than the 
usual way of presenting it. So I don’t think I did very much in the lectures on 
electricity and magnetism. At the end of the second year I had originally intended 
to go on, after the electricity and magnetism, by giving some more lectures on the 
properties of materials, but mainly to take up things like fundamental modes, 
solutions of the diffusion equation, vibrating systems, orthogonal functions, . . . 
developing the first stages of what are usually called “the mathematical methods of 
physics.” In retrospect, I think that if I were doing it again I would go back to 
that original idea. But since it was not planned that I would be giving these lec- 
tures again, it was suggested that it might be a good idea to try to give an introduc- 
tion to the quantum mechanics — what you will find in Volume 111. 

It is perfectly clear that students who will major m physics can wait until their 
third year for quantum mechanics. On the other hand, the argument was made 
that many of the students in our course study physics as a background for their 
primary interest in other fields. And the usual way of dealing with quantum 
mechanics makes that subject almost unavailable for the great majority of students 
because they have to take so long to learn it. Yet, in its real applications — espe- 
cially in its more complex applications, such as in electrical engineering and chem- 
istry — the full machinery of the differential equation approach is not actually 
used. So I tried to describe the principles of quantum mechanics in a way which 
wouldn’t require that one first know the mathematics of partial differential equa- 
tions. Even for a physicist I think that is an interesting thing to try to do — to 
present quantum mechanics in this reverse fashion — for several reasons which 
may be apparent in the lectures themselves. However, I think that the experiment 
in the quantum mechanics part was not completely successful — in large part 
because I really did not have enough time at the end (I should, for instance, have 
had three or four more lectures in order to deal more completely with such matters 
as energy bands and the spatial dependence of amplitudes). Also, I had never 
presented the subject this way before, so the lack of feedback was particularly 
serious. I now believe the quantum mechanics should be given at a later time. 
Maybe I’ll have a chance to do it again someday. Then I’ll do it right. 

The reason there are no lectures on how to solve problems is because there were 
recitation sections. Although I did put in three lectures in the first year on how to 
solve problems, they are not included here. Also there was a lecture on inertial 
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guidance which certainly belongs after the lecture on rotating systems, but which 
was, unfortunately, omitted. The fifth and sixth lectures are actually due to 
Matthew Sands, as I was out of town. 

The question, of course, is how well this experiment has succeeded. My own 
point of view — which, however, does not seem to be shared by most of the people 
who worked with the students — is pessimistic. I don't think 1 did very well by the 
students. When 1 look at the way the majority of the students handled the problems 
on the examinations, I think that the system is a failure Of course, my friends 
point out to me that there were one or two dozen students who — very surprisingly 
— understood almost everything in all of the lectures, and who were quite active 
in working with the material and worrying about the many points in an excited 
and interested way. These people have now, I believe, a first-rate background in 
physics — and they are, after all, the ones I was trying to get at. But then, “The 
power of instruction is seldom of much efficacy except in those happy dispositions 
where it is almost superfluous ” (Gibbons) 

Still, I didn’t want to leave any student completely behind, as perhaps I did. 
I think one way we could help the students more would be by putting more hard 
work into developing a set of problems which would elucidate some of the ideas 
m the lectures. Problems give a good opportunity to fill out the material of the 
lectures and make more realistic, more complete, and more settled in the mind 
the ideas that have been exposed. 

1 think, however, that there isn’t any solution to this problem of education 
other than to realize that the best teaching can be done only when there is a direct 
individual relationship between a student and a good teacher — a situation m which 
the student discusses the ideas, thinks about the things, and talks about the things. 
It’s impossible to learn very much by simply sitting in a lecture, or even by simply 
doing problems that are assigned. But in our modern times we have so many 
students to teach that we have to try to find some substitute for the ideal. Perhaps 
my lectures can make some contribution. Perhaps in some small place where 
there are individual teachers and students, they may get some inspiration or some 
ideas from the lectures. Perhaps they will have fun thinking them through — or 
going on to develop some of the ideas further, 

Richard P. Feynman 

June , 1963 
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Foreword 


For some forty years Richard P. Feynman focussed his curiosity on the mys- 
terious workings of the physical world, and bent his intellect to searching out the 
order in its chaos. Now, he has given two years of his ability and his energy to 
his Lectures on Physics for beginning students. For them he has distilled the 
essence of his knowledge, and has created in terms they can hope to grasp a 
picture of the physicist’s universe. To his lectures he has brought the brilliance 
and clarity of his thought, the originality and vitality of his approach, and the 
contagious enthusiasm of his delivery. It was a joy to behold. 

The first year’s lectures formed the basis for the first volume of this set of 
books. We have tried in this the second volume to make some kind of a record 
of a part of the second year’s lectures — which were given to the sophomore 
class during the 1962-1963 academic year. The rest of the second year’s lec- 
tures will make up Volume III. 

Of the second year of lectures, the first two-thirds were devoted to a fairly 
complete treatment of the physics of electricity and magnetism. Its presentation 
was intended to serve a dual purpose. We hoped, first, to give the students a 
complete view of one of the great chapters of physics — from the early gropings 
of Franklin, through the great synthesis of Maxwell, on to the Lorentz electron 
theory of material properties, and ending with the still unsolved dilemmas of 
the electromagnetic self-energy. And we hoped, second, by introducing at the 
outset ihe calculus of vector fields, to give a solid introduction to the mathe- 
matics of field theories To emphasize the general utility of the mathematical 
methods, related subjects from other parts of physics were sometimes analyzed 
together with their electric counterparts. We continually tried to drive home 
the generality of the mathematics. (“The same equations have the same solu- 
tions.”) And we emphasized this point by the kinds of exercises and examina- 
tions we gave with the course. 

Following the electromagnetism there are two chapters each on elasticity and 
fluid flow. In the first chapter of each pair, the elementary and practical aspects 
are treated. The second chapter on each subject attempts to give an overview of 
the whole complex range of phenomena which the subject can lead to. These 
four chapters can well be omitted without serious loss, since they are not at all a 
necessary preparation for Volume III. 

The last quarter, approximately, of the second year was dedicated to an intro- 
duction to quantum mechanics. This material has been put into the third volume. 

In this record of the Feynman Lectures we wished to do more than provide a 
transcription of what was said. We hoped to make the written version as clear 
an exposition as possible of the ideas on which the original lectures were based 
For some of the lectures this could be done by making only minor adjustments 
of the wording in the original transcript. For others of the lectures a major re- 
working and rearrangement of the material was required. Sometimes we felt 
we should add some new material to improve the clarity or balance of the pres- 
entation. Throughout the process we benefitted from the continual help and 
advice of Professor Feynman 

The translation of over 1,000,000 spoken words into a coherent text on a 
tight schedule is a formidable task, particularly when it is accompanied by the 
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other onerous burdens which come with the introduction of a new course — 
preparing for recitation sections, and meeting students, designing exercises and 
examinations, and grading them, and so on. Many hands — and heads — were 
involved. In some instances we have, I believe, been able to render a faithful 
image — or a tenderly retouched portrait — of the original Feynman. In other 
instances we have fallen far short of this ideal. Our successes are owed to all 
those who helped. The failures, we regret. 

As explained in detail in the Foreword to Volume I, these lectures were but 
one aspect of a program initiated and supervised by the Physics Course Revision 
Committee (R. B. Leighton, Chairman, H. V. Neher, and M. Sands) at the 
California Institute of Technology, and supported financially by the Ford Foun- 
dation. In addition, the following people helped with one aspect or another of 
the preparation of textual material for this second volume: T. K. Caughey, 
M, L. Clayton, J. B. Curcio, J. B. Hartle, T. W. H. Harvey, M. H. Israel, 
W. J. Karzas, R. W. Kavanagh, R. B. Leighton, J. Mathews, M. S. Plesset, 
F. L. Warren, W. Whaling, C. H. Wilts, and B. Zimmerman. Others con- 
tributed indirectly through their work on the course: J. Blue, G. F. Chapline, 
M. J. Clauser, R. Dolen, H. H. Hill, and A. M. Title. Professor Gerry Neuge- 
bauer contributed in all aspects of our task with a diligence and devotion far 
beyond the dictates of duty. 

The story of physics you find here would, however, not have been, except for 
the extraordinary ability and industry of Richard P. Feynman. 

Matthew Sands 

March , 1964 
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Electromagnetism 


1-1 Electrical forces 

Consider a force like gravitation which varies predominantly inversely as the 
square of the distance, but which is about a billion-billion-billion-billion times 
stronger. And with another difference. There are two kinds of “matter,” which 
we can call positive and negative. Like kinds repel and unlike kinds attract — 
unlike gravity where there is only attraction. What would happen? 

A bunch of positives would repel with an enormous force and spread out in 
all directions. A bunch of negatives would do the same. But an evenly mixed 
bunch of positives and negatives would do something completely different. The 
opposite pieces would be pulled together by the enormous attractions. The net 
result would be that the terrific forces would balance themselves out almost per- 
fectly, by forming tight, fine mixtures of the positive and the negative, and between 
two separate bunches of such mixtures there would be practically no attraction or 
repulsion at all. 

There is such a force: the electrical force. And all matter is a mixture of posi- 
tive protons and negative electrons which are attracting and repelling with this 
great force. So perfect is the balance, however, that when you stand near someone 
else you don’t feel any force at all. If there were even a little bit of unbalance you 
would know it. If you were standing at arm’s length from someone and each of 
you had one percent more electrons than protons, the repelling force would be in- 
credible. How great? Enough to lift the Empire State Building? No! To lift 
Mount Everest? No! The repulsion would be enough to lift a “weight” equal to 
that of the entire earth! 

With such enormous forces so perfectly balanced in this intimate mixture, it 
not hard to understand that matter, trying to keep its positive and negative 
charges in the finest balance, can have a great stiffness and strength. The Empire 
State Building, for example, swings only eight feet in the wind because the electrical 
forces hold every electron and proton more or less in its proper place. On the other 
hand, if we look at matter on a scale small enough that we see only a few atoms, 
any small piece will not, usually, have an equal number of positive and negative 
charges, and so there will be strong residual electrical forces. Even when there are 
equal numbers of both charges in two neighboring small pieces, there may still be 
large net electrical forces because the forces between individual charges vary 
diversely as the square of the distance. A net force can arise if a negative charge of 
gpe piece is closer to the positive than to the negative charges of the other piece. 
The attractive forces can then be larger than the repulsive ones and there can be a 
net attraction between two small pieces with no excess charges. The force that holds 
the atoms together, and the chemical forces that hold molecules together, are 
really electrical forces acting in regions where the balance of charge is not perfect, 
or where the distances are very small. 

You know, of course, that atoms are made with positive protons in the 
nucleus and with electrons outside. You may ask: “If this electrical force is so 
terrific, why don’t the protons and electrons just get on top of each other? If they 
want to be in an intimate mixture, why isn’t it still more intimate?” The answer 
has to do with the quantum effects. If we try to confine our electrons in a region 
)faat is very close to the protons, then according to the uncertainty principle they 
must have some mean square momentum which is larger the more we try to con- 
fine them. It is this motion, required by the laws of quantum mechanics, that keeps 
the electrical attraction from bringing the charges any closer together. 
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1-2 Electric and magnetic fields 

1-3 Characteristics of vector fields 

1-4 The laws of electromagnetism 
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and technology 
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There is another question: “What holds the nucleus together”? In a nucleus 
there are several protons, all of which are positive. Why don’t they push them- 
selves apart? It turns out that in nuclei there are, in addition to electrical forces, 
nonelectrical forces, called nuclear forces, which are greater than the electrical 
forces and which are able to hold the protons together in spite of the electrical 
repulsion. The nuclear forces, however, have a short range — their force falls off 
much more rapidly than l/r 2 . And this has an important consequence. If a 
nucleus has too many protons in it, it gets too big, and it will not stay together. An 
example is uranium, with 92 protons. The nuclear forces act mainly between each 
proton (or neutron) and its nearest neighbor, while the electrical forces act over 
larger distances, giving a repulsion between each proton and all of the others in 
the nucleus. The more protons in a nucleus, the stronger is the electrical repulsion, 
until, as in the case of uranium, the balance is so delicate that the nucleus is almost 
ready to fly apart from the repulsive electrical force. If such a nucleus is just 
“tapped” lightly (as can be done by sending in a slow neutron), it breaks into two 
pieces, each with positive charge, and these pieces fly apart by electrical repulsion. 
The energy which is liberated is the energy of the atomic bomb. This energy is 
usually called “nuclear” energy, but it is really “electrical” energy released when 
electrical forces have overcome the attractive nuclear forces. 

We may ask, finally, what holds a negatively charged electron together (since 
it has no nuclear forces). If an electron is all made of one kind of substance, each 
part should repel the other parts. Why, then, doesn’t it fly apart? But does the 
electron have “parts”? Perhaps we should say that the electron is just a point and 
that electrical forces only act between different point charges, so that the electron 
does not act upon itself. Perhaps. All we can say is that the question of what 
holds the electron together has produced many difficulties in the attempts to form 
a complete theory of electromagnetism. The question has never been answered. 
We will entertain overselves by discussing this subject some more in later chapters. 

As we have seen, we should expect that it is a combination of electrical forces 
and quantum-mechanical effects that will determine the detailed structure of 
materials in bulk, and, therefore, their properties. Some materials are hard, some 
are soft. Some are electrical “conductors” — because their electrons are free to 
move about; others are “insulators” — because their electrons are held tightly to 
individual atoms. We shall consider later how some of these properties come about, 
but that is a very complicated subject, so we will begin by looking at the electrical 
forces only in simple situations. We begin by treating only the laws of electricity — 
including magnetism, which is really a part of the same subject. 

We have said that the electrical force, like a gravitational force, decreases 
inversely as the square of the distance between charges. This relationship is called 
Coulomb’s law. But it is not precisely true when charges are moving — the elec- 
trical forces depend also on the motions of the charges in a complicated way. One 
part of the force between moving charges we call the magnetic force. It is really 
one aspect of an electrical effect. That is why we call the subject “electromag- 
netism.” 

There is an important general principle that makes it possible to treat elec- 
tromagnetic forces in a relatively simple way. We find, from experiment, that the 
force that acts on a particular charge — no matter how many other charges there 
are or how they are moving — depends only on the position of that particular 
charge, on the velocity of the charge, and on the amount of charge. We can write 
the force Fona charge q moving with a velocity v as 

F = q(E+ v X B). (1.1) 

We call E the electric field and B the magnetic field at the location of the charge. 
The important thing is that the electrical forces from all the other charges in the 
universe can be summarized by giving just these two vectors. Their values will 
depend on where the charge is, and may change with time . Furthermore, if we 
replace that charge with another charge, the force on the new charge will be just 
in proportion to the amount of charge so long as all the rest of the charges in the 
1-2 



world do not change their positions or motions. (In real situations, of course, each 
charge produces forces on all other charges in the neighborhood and may cause 
these other charges to move, and so in some cases the fields can change if we replace 
our particular charge by another.) 

We know from Vol. I how to find the motion of a particle if we know the force 
on it. Equation (1.1) can be combined with the equation of motion to give 


d_ mu 

dt [(1 - v 2 /c 2 ) 112 


= F = q(E+ v X B). 


( 1 . 2 ) 


So if E and B are given, we can find the motions. Now we need to know how the 
E’s and E’s are produced. 

One of the most important simplifying principles about the way the fields are 
produced is this: Suppose a number of charges moving in some manner would 
produce a field Eu and another set of charges would produce E 2 . If both sets of 
charges are in place at the same time (keeping the same locations and motions 
they had when considered separately), then the field produced is just the sum 


E - E x + E 2 . (1.3) 

This fact is called the principle of superposition of fields. It holds also for magnetic 
fields. 

This principle means that if we know the law for the electric and magnetic 
fields produced by a single charge moving in an arbitrary way, then all the laws of 
electrodynamics are complete. If we want to know the force on charge A we need 
only calculate the E and B produced by each of the charges B , C, D, etc., and then 
add the E’s and E’s from all the charges to find the fields, and from them the 
forces acting on charge A. If it had only turned out that the field produced by a 
single charge was simple, this would be the neatest way to describe the laws of 
electrodynamics. We have already given a description of this law (Chapter 28, 
Vol. I) and it is, unfortunately, rather complicated. 

It turns out that the form in which the laws of electrodynamics are simplest 
are not what you might expect. It is not simplest to give a formula for the force that 
one charge produces on another. It is true that when charges are standing still the 
Coulomb force law is simple, but when charges are moving about the relations are 
complicated by delays in time and by the effects of acceleration, among others. 
As a result, we do not wish to present electrodynamics only through the force 
laws between charges; we find it more convenient to consider another point of 
view— a point of view in which the laws of electrodynamics appear to be the most 
easily manageable. 


1-2 Electric and magnetic fields 

First, we must extend, somewhat, our ideas of the electric and magnetic 
vectors, E and E. We have defined them in terms of the forces that are felt by a 
charge. We wish now to speak of electric and magnetic fields at a point even when 
there is no charge present. We are saying, in effect, that since there are forces 
“acting on” the charge, there is still “something” there when the charge is removed. 
If a charge located at the point (. x , y, z ) at the time t feels the force F given by 
Eq. (1.1) we associate the vectors E and E with the point in space ( x , y, z). We may 
think of E(x, y, z, t) and B(x t y, z, t) as giving the forces that would be experienced 
at the time t by a charge located at (x, y, z), with the condition that placing the charge 
there did not disturb the positions or motions of all the other charges responsible 
for the fields. 

Following this idea, we associate with every point (jc, y, z) in space two vectors 
E and E, which may be changing with time. The electric and magnetic fields are, 
then, viewed as vector functions of x, y, z, and t. Since a vector is specified by its 
components, each of the fields E{x, y, z, /) and B(x, y, z, t) represent three mathe- 
matical functions of x, y, z, and /. 
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Fig. 1-1. A vector field may be 
represented by drawing a set of arrows 
whose magnitudes and directions indicate 
the values of the vector field at the points 
from which the arrows are drawn. 




Fig. 1-2. A vector field can be 
represented by drawing lines which are 
tangent to the direction of the field vector 
at each point, and by drawing the density 
of lines proportional to the magnitude of 
the field vector. 



Fig. 1-3. The flux of a vector field 
through a surface is defined as the 
average value of the normal component 
of the vector times the area of the surface. 


It is precisely because E (or B) can be specified at every point in space that it is 
called a “field.” A “field” is any physical quantity which takes on different values 
at different points in space. Temperature, for example, is a field — in this case a 
scalar field, which we write as T(x, y, z). The temperature could also vary in time, 
and we would say the temperature field is time-dependent, and write T(x, y, z , t). 
Another example is the “velocity field” of a flowing liquid. We write v(x, y, z, t) 
for the velocity of the liquid at each point in space at the time t. It is a vector field. 

Returning to the electromagnetic fields — although they are produced by 
charges according to complicated formulas, they have the following important 
characteristic: the relationships between the values of the fields at one point and 
the values at a nearby point are very simple. With only a few such relationships in 
the form of differential equations we can describe the fields completely. It is in 
terms of such equations that the laws of electrodynamics are most simply written. 

There have been various inventions to help the mind visualize the behavior of 
fields. The most correct is also the most abstract : we simply consider the fields as 
mathematical functions of position and time. We can also attempt to get a mental 
picture of the field by drawing vectors at many points in space, each of which gives 
the field strength and direction at that point. Such a representation is shown in 
Fig. 1-1. We can go further, however, and draw lines which are everywhere 
tangent to the vectors — which, so to speak, follow the arrows and keep track of 
the direction of the field. When we do this we lose track of the lengths of the 
vectors, but we can keep track of the strength of the field by drawing the lines far 
apart when the field is weak and close together when it is strong. We adopt the 
convention that the number of lines per unit area at right angles to the lines is pro- 
portional to the field strength . This is, of course, only an approximation, and it 
will require, in general, that new lines sometimes start up in order to keep the 
number up to the strength of the field. The field of Fig. 1-1 is represented by 
field lines in Fig. 1-2. 


1-3 Characteristics of vector fields 

There are two mathematically important properties of a vector field which 
we will use in our description of the laws of electricity from the field point of view. 
Suppose we imagine a closed surface of some kind and ask whether we are losing 
“something” from the inside; that is, does the field have a quality of “outflow”? 
For instance, for a velocity field we might ask whether the velocity is always out- 
ward on the surface or, more generally, whether more fluid flows out (per unit 
time) than comes in. We call the net amount of fluid going out through the surface 
per unit time the “flux of velocity” through the surface. The flow through an 
element of a surface is just equal to the component of the velocity perpendicular 
to the surface times the area of the surface. For an arbitrary closed surface, the 
net outward flow — or flux — is the average outward normal component of the 
velocity, times the area of the surface: 

Flux = (average normal component) -(surface area). (1.4) 

In the case of an electric field, we can mathematically define something 
analogous to an outflow, and we again call it the flux, but of course it is not the 
flow of any substance, because the electric field is not the velocity of anything. It 
turns out, however, that the mathematical quantity which is the average normal 
component of the field still has a useful significance. We speak, then, of the 
electric flux — also defined by Eq. (1.4). Finally, it is also useful to speak of the 
flux not only through a completely closed surface, but through any bounded sur- 
face. As before, the flux through such a surface is defined as the average normal 
component of a vector times the area of the surface. These ideas are illustrated in 
Fig. 1-3. 

There is a second property of a vector field that has to do with a line, rather 
than a surface. Suppose again that we think of a velocity field that describes the 
flow of a liquid. We might ask this interesting question: Is the liquid circulating? 
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By that we mean: Is there a net rotational motion around some loop? Suppose 
that we instantaneously freeze the liquid everywhere except inside of a tube which 
is of uniform bore, and which goes in a loop that closes back on itself as in 
Fig. 1-4. Outside of the tube the liquid stops moving, but inside the tube it may 
keep on moving because of the momentum in the trapped liquid — that is, if there is 
more momentum heading one way around the tube than the other. We define a 
quantity called the circulation as the resulting speed of the liquid in the tube times its 
circumference. We can again extend our ideas and define the “circulation” for any 
vector field (even when there isn’t anything moving). For any vector field the 
circulation around any imagined closed curve is defined as the average tangential 
component of the vector (in a consistent sense) multiplied by the circumference 
of the loop (Fig. 1-5). 

Circulation = (average tangential component)* (distance around). (1.5) 

You will see that this definition does indeed give a number which is proportional 
to the circulation velocity in the quickly frozen tube described above. 

With just these two ideas — flux and circulation — we can describe all the laws 
of electricity and magnetism at once. You may not understand the significance of 
the laws right away, but they will give you some idea of the way the physics of 
electromagnetism will be ultimately described. 

1-4 The laws of electromagnetism 

The first law of electromagnetism describes the flux of the electric field: 

The flux of E through any closed surface = the net charge inside # q £) 

€0 

where € 0 is a convenient constant. (The constant e 0 is usually read as “epsilon- 
zero” or “epsifon-naught” ) If there are no charges inside the surface, even though 
there are charges nearby outside the surface, the average normal component of E 
is zero, so there is no net flux through the surface. To show the power of this 
type of statement, we can show that Eq. (1.6) is the same as Coulomb’s law, pro- 
vided only that we also add the idea that the field from a single charge is spherically 
symmetric. For a point charge, we draw a sphere around the charge. Then the 
average normal component is just the value of the magnitude of E at any point, 
since the field must be directed radially and have the same strength for all points on 
the sphere. Our rule now says that the field at the surface of the sphere, times the 
area of the sphere — that is, the outgoing flux — is proportional to the charge inside. 
If we were to make the radius of the sphere bigger, the area would increase as 
the square of the radius. The average normal component of the electric field times 
that area must still be equal to the same charge inside, and so the field must decrease 
as the square of the distance — we get an “inverse square” field. 

If we have an arbitrary curve in space and measure the circulation of the 
electric field around the curve, we will find that it is not, in general, zero (although 
it is for the Coulomb field). Rather, for electricity there is a second law that states: 
for any surface S (not closed) whose edge is the curve C, 

Circulation of E around C = ~ (flux of B through 5). (1.7) 

We can complete the laws of the electromagnetic field by writing two corre- 
sponding equations for the magnetic field B. 

Flux of B through any closed surface = 0. (1.8) 

For a surface S bounded by the curve C, 

^(circulation of B around C) = — (flux of E through S) 

flux of electric current through S ^ ^ 
€0 





Fig. 1-4. (a) The velocity field in a 

liquid. Imagine a tube of uniform cross 
section that follows an arbitrary closed 
curve as in (b). If the liquid were suddenly 
frozen everywhere except inside the 
tube, the liquid in the tube would circulate 
as shown in (c). 



Fig. 1-5. The circulation of a vector 
field is the average tangential compo- 
nent of the vector (in a consistent sense) 
times the circumference of the loop. 
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The constant c 2 that appears in Eq. (1.9) is the square of the velocity of light. 
It appears because magnetism is in reality a relativistic effect of electricity. The 
constant c 0 has been stuck in to make the units of electric current come out in a 
convenient way. 

Equations (1.6) through (1.9), together with Eq. (1.1), are all the laws of 
electrodynamics*. As you remember, the laws of Newton were very simple to 
write down, but they had a lot of complicated consequences and it took us a long 
time to learn about them all. These laws are not nearly as simple to write down, 
which means that the consequences are going to be more elaborate and it will take 
us quite a lot of time to figure them all out. 

We can illustrate some of the laws of electrodynamics by a series of small 
experiments which show qualitatively the interrelationships of electric and 
magnetic fields. You have experienced the first term of Eq. (1.1) when combing 
your hair, so we won’t show that one. The second part of Eq. (1.1) can be demon- 
strated by passing a current through a wire which hangs above a bar magnet, as 
shown in Fig. 1-6. The wire will move when a current is turned on because of the 
force F — qv X B. When a current exists, the charges inside the wire are moving, 
so they have a velocity v, and the magnetic field from the magnet exerts a force on 
them, which results in pushing the wire sideways. 

When the wire is pushed to the left, we would expect that the magnet must 
feel a push to the right. (Otherwise we could put the whole thing on a wagon and 
have a propulsion system that didn’t conserve momentum!) Although the force is 
too small to make movement of the bar magnet visible, a more sensitively sup- 
ported magnet, like a compass needle, will show the movement. 

How does the wire push on the magnet? The current in the wire produces a 
magnetic field of its own that exerts forces on the magnet. According to the last 



* We need only to add a remark about some conventions for the sign of the circulation. 


1-6 








carrying 

other. 


cur- 


term in Eq. (1.9), a current must have a circulation of B — in this case, the lines of 
B are loops around the wire, as shown in Fig. 1-7. This 5-field is responsible for 
the force on the magnet. 

Equation (1 .9) tells us that for a fixed current through the wire the circulation 
of B is the same for any curve that surrounds the wire. For curves — say circles — 
that are farther away from the wire, the circumference is larger, so the tangential 
component of B must decrease. You can see that we would, in fact, expect B to 
decrease linearly with the distance from a long straight wire. 

Now, we have said that a current through a wire produces a magnetic field, 
and that when there is a magnetic field present there is a force on a wire carrying a 
current. Then we should also expect that if we make a magnetic field with a current 
in one wire, it should exert a force on another wire which also carries a current. 
This can be shown by using two hanging wires as shown in Fig. 1-8. When the 
currents are in the same direction, the two wires attract, but when the currents are 
opposite, they repel. 

In short, electrical currents, as well as magnets, make magnetic fields. But wait, 
what is a magnet, anyway? If magnetic fields are produced by moving charges, is 
it not possible that the magnetic field from a piece of iron is really the result of 
currents? It appears to be so. We can replace the bar magnet of our experiment 
with a coil of wire, as shown in Fig. 1-9. When a current is passed through the 
coil — as well as through the straight wire above it — we observe a motion of the 
wire exactly as before, when we had a magnet instead of a coil. In other words, 
the current in the coil imitates a magnet. It appears, then, that a piece of iron acts 
as though it contains a perpetual circulating current. We can, in fact, understand 
magnets in terms of permanent currents in the atoms of the iron. The force on the 
magnet in Fig. 1-7 is due to the second term in Eq. (1.1). 
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Where do the currents come from? One possibility would be from the motion 
of the electrons in atomic orbits. Actually, that is not the case for iron, although 
it is for some materials. In addition to moving around in an atom, an electron 
also spins about on its own axis — something like the spin of the earth — and it is 
the current from this spin that gives the magnetic field in iron. (We say “some- 
thing like the spin of the earth” because the question is so deep in quantum me- 
chanics that the classical ideas do not really describe things too well.) In most 
substances, some electrons spin one way and some spin the other, so the mag- 
netism cancels out, but in iron — for a mysterious reason which we will discuss 
later — many of the electrons are spinning with their axes lined up, and that is the 
source of the magnetism. 

Since the fields of magnets are from currents, we do not have to add any extra 
term to Eqs. (1.8) or (1.9) to take care of magnets. We just take all currents, 
including the circulating currents of the spinning electrons, and then the law is 
right. You should also notice that Eq. (1.8) says that there are no magnetic 
“charges” analogous to the electrical charges appearing on the right side of 
Eq. (1.6). None has been found. 


Fig. 1-10. The circulation of B 
around the curve C is given either by the 
current passing through the surface Si, 
or by the rate of change of the flux of £ 
through the surface $ 2 . 

The first term on the right-hand side of Eq. (1.9) was discovered theoretically 
by Maxwell and is of great importance. It says that changing electric fields produce 
magnetic effects. In fact, without this term the equation would not make sense, 
because without it there could be no currents in circuits that are not complete 
loops. But such currents do exist, as we can see in the following example. Imagine 
a capacitor made of two flat plates. It is being charged by a current that flows 
toward one plate and away from the other, as shown in Fig. 1-10. We draw a 
curve C around one of the wires and fill it in with a surface which crosses the wire, 
as shown by the surface S 1 in the figure. According to Eq. (1.9), the circulation of 
B around C is given by the current in the wire (times c 2 ). But what if we fill in the 
curve with a different surface S 2 , which is shaped like a bowl and passes between 
the plates of the capacitor, staying always away from the wire? There is certainly 
no current through this surface. But, surely, just changing the location of an 
imaginary surface is not going to change a real magnetic field! The circulation of 
B must be what it was before. The first term on the right-hand side of Eq. (1.9) 
does, indeed, combine with the second term to give the same result for the two 
surfaces Si and S 2 . For S 2 the circulation of B is given in terms of the rate of 
change of the flux of E between the plates of the capacitor. And it works out that 
the changing E is related to the current in just the way required for Eq. (1.9) to be 
correct. Maxwell saw that it was needed, and he was the first to write the complete 
equation. * 

With the setup shown in Fig. 1-6 we can demonstrate another of the laws of 
electromagnetism. We disconnect the ends of the hanging wire from the battery 
and connect them to a galvanometer which tells us when there is a current through 
the wire. When we push the wire sideways through the magnetic field of the 
magnet, we observe a current. Such an effect is again just another consequence of 
Eq. (1.1) — the electrons in the wire feel the force F — qv X B. The electrons 
have a sidewise velocity because they move with the wire. This v with a vertical B 
from the magnet results in a force on the electrons directed along the wire, which 
starts the electrons moving toward the galvanometer. 
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Suppose, however, that we leave the wire alone and move the magnet. We 
guess from relativity that it should make no difference, and indeed, we observe a 
similar current in the galvanometer. How does the magnetic field produce forces on 
charges at rest? According to Eq. (1.1) there must be an electric field. A moving 
magnet must make an electric field. How that happens is said quantitatively by 
Eq. (1.7). This equation describes many phenomena of great practical interest, 
such as those that occur in electric generators and transformers. 

The most remarkable consequence of our equations is that the combination of 
Eq. (1.7) and Eq. (1.9) contains the explanation of the radiation of electromag- 
netic effects over large distances. The reason is roughly something like this: 
suppose that somewhere we have a magnetic field which is increasing because, 
say, a current is turned on suddenly in a wire. Then by Eq. (1.7) there must be a 
circulation of an electric field. As the electric field builds up to produce its circula- 
tion, then according to Eq. (1.9) a magnetic circulation will be generated. But the 
building up of this magnetic field will produce a new circulation of the electric 
field, and so on. In this way fields work their way through space without the need 
of charges or currents except at their source. That is the way we see each other! 
It is all in the equations of the electromagnetic fields. 

1-5 What are the fields? 

We now make a few remarks on our way of looking at this subject. You may 
be saying: “All this business of fluxes and circulations is pretty abstract. There are 
electric fields at every point in space; then there are these ‘laws.* But what is 
actually happening? Why can’t you explain it, for instance, by whatever it is that 
goes between the charges.” Well, it depends on your prejudices. Many physicists 
used to say that direct action with nothing in between was inconceivable. (How 
could they find an idea inconceivable when it had already been conceived?) They 
would say: “Look, the only forces we know are the direct action of one piece of 
matter on another. It is impossible that there can be a force with nothing to trans- 
mit it.” But what really happens when we study the “direct action” of one piece of 
matter right against another? We discover that it is not one piece right against 
the other; they are slightly separated, and there are electrical forces acting on a 
tiny scale. Thus we find that we are going to explain so-called direct-contact action 
in terms of the picture for electrical forces. It is certainly not sensible to try to 
insist that an electrical force has to look like the old, familiar, muscular push or 
pull, when it will turn out that the muscular pushes and pulls are going to be inter- 
preted as electrical forces! The only sensible question is what is the most con- 
venient way to look at electrical effects. Some people prefer to represent them as 
the interaction at a distance of charges, and to use a complicated law. Others love 
the field lines. They draw field lines all the time, and feel that writing £’ s and 2Ts 
is too abstract. The field lines, however, are only a crude way of describing a field, 
and it is very difficult to give the correct, quantitative laws directly in terms of field 
lines. Also, the ideas of the field lines do not contain the deepest principle of 
electrodynamics, which is the superposition principle. Even though we know how 
the field lines look for one set of charges and what the field lines look like for an- 
other set of charges, we don’t get any idea about what the field line patterns will 
look like when both sets are present together. From the mathematical stand- 
point, on the other hand, superposition is easy — we simply add the two vectors. 
The field lines have some advantage in giving a vivid picture, but they also have 
some disadvantages. The direct interaction way of thinking has great advantages 
when thinking of electrical charges at rest, but has great disadvantages when dealing 
with charges in rapid motion. 

The best way is to use the abstract field idea. That it is abstract is unfortunate, 
but necessary. The attempts to try to represent the electric field as the motion of 
some kind of gear wheels, or in terms of lines, or of stresses in some kind of mate- 
rial have used up more effort of physicists than it would have taken simply to get 
the right answers about electrodynamics. It is interesting that the correct equations 
for the behavior of light in crystals were worked out by McCullough in 1843. But 

1-9 



people said to him: “Yes, but there is no real material whose mechanical properties 
could possibly satisfy those equations, and since light is an oscillation that must 
vibrate in something , we cannot believe this abstract equation business.” If people 
had been more open-minded, they might have believed in the right equations for 
the behavior of light a lot earlier than they did. 

In the case of the magnetic field we can make the following point: Suppose 
that you finally succeeded in making up a picture of the magnetic field in terms of 
some kind of lines or of gear wheels running through space. Then you try to 
explain what happens to two charges moving in space, both at the same speed and 
parallel to each other. Because they are moving, they will behave like two currents 
and will have a magnetic field associated with them (like the currents in the wires 
of Fig. 1-8). An observer who was riding along with the two charges, however, 
would see both charges as stationary, and would say that there is no magnetic field. 
The “gear wheels” or “lines” disappear when you ride along with the object! All 
we have done is to invent a new problem. How can the gear wheels disappear?! 
The people who draw field lines are in a similar difficulty. Not only is it not pos- 
sible to say whether the field lines move or do not move with charges — they may 
disappear completely in certain coordinate frames 

What we are saying, then, is that magnetism is really a relativistic effect. In 
the case of the two charges we just considered, travelling parallel to each other, we 
would expect to have to make relativistic corrections to their motion, with terms of 
order v 2 /c 2 . These corrections must correspond to the magnetic force. But what 
about the force between the two wires in our experiment (Fig. 1-8). There the 
magnetic force is the whole force. It didn’t look like a “relativistic correction.” 
Also, if we estimate the velocities of the electrons in the wire (you can do this 
yourself), we find that their average speed along the wire is about 0.01 centimeter 
per second. So v 2 /c 2 is about 10” 25 . Surely a negligible “correction.” But no! 
Although the magnetic force is, in this case, 10~ 25 of the “normal” electrical force 
between the moving electrons, remember that the “normal” electrical forces have 
disappeared because of the almost perfect balancing out — because the wires have 
the same number of protons as electrons. The balance is much more precise than 
one part in 10 25 , and the small relativistic term which we call the magnetic force is 
the only term left. It becomes the dominant term. 

It is the near-perfect cancellation of electrical effects which allowed relativity 
effects (that is, magnetism) to be studied and the correct equations — to order 
v 2 /c 2 — to be discovered, even though physicists didn’t know that’s what was 
happening. And that is why, when relativity was discovered, the electromagnetic 
laws didn’t need to be changed. They — unlike mechanics — were already correct 
to a precision of v 2 /c 2 . 

1-6 Electromagnetism in science and technology 

Let us end this chapter by pointing out that among the many phenomena 
studied by the Greeks there were two very strange ones: that if you rubbed a piece 
of amber you could lift up little pieces of papyrus, and that there was a strange 
rock from the island of Magnesia which attracted iron. It is amazing to think that 
these were the only phenomena known to the Greeks in which the effects of elec- 
tricity or magnetism were apparent. The reason that these were the only phe- 
nomena that appeared is due primarily to the fantastic precision of the balancing 
of charges that we mentioned earlier. Study by scientists who came after the Greeks 
uncovered one new phenomena after another that were really some aspect of these 
amber and/or lodestone effects. Now we realize that the phenomena of chemical 
interaction and, ultimately, of life itself are to be understood in terms of electro- 
magnetism. 

At the same time that an understanding of the subject of electromagnetism 
was being developed, technical possibilities that defied the imagination of the people 
that came before were appearing: it became possible to signal by telegraph over 
long distances, and to talk to another person miles away without any connections 
between, and to run huge power systems — a great water wheel, connected by 
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filaments over hundreds of miles to another engine that turns in response to the 
master wheel — many thousands of branching filaments — ten thousand engines in 
ten thousand places running the machines of industries and homes — all turning 
because of the knowledge of the laws of electromagnetism. 

Today we are applying even more subtle effects. The electrical forces, enor- 
mous as they are, can also be very tiny, and we can control them and use them in 
very many ways. So delicate are our instruments that we can tell what a man is 
doing by the way he affects the electrons in a thin metal rod hundreds of miles 
away. All we need to do is to use the rod as an antenna for a television receiver! 

From a long view of the history of mankind — seen from, say, ten thousand 
years from now — there can be little doubt that the most significant event of the 
19th century will be judged as Maxwell’s discovery of the laws of electrodynamics. 
The American Civil War will pale into provincial insignificance in comparison with 
this important scientific event of the same decade. 
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Differential Calculus of Vector Fields 


2-1 Understanding physics 

The physicist needs a facility in looking at problems from several points of 
view. The exact analysis of real physical problems is usually quite complicated, 
and any particular physical situation may be too complicated to analyze directly 
by solving the differential equation. But one can still get a very good idea of the 
behavior of a system if one has some feel for the character of the solution in differ- 
ent circumstances. Ideas such as the field lines, capacitance, resistance, and in- 
ductance are, for such purposes, very useful. So we will spend much of our time 
analyzing them. In this way we will get a feel as to what should happen in different 
electromagnetic situations. On the other hand, none of the heuristic models, such 
as field lines, is really adequate and accurate for all situations. There is only one 
precise way of presenting the laws, and that is by means of differential equations. 
They have the advantage of being fundamental and, so far as we know, precise. 
If you have learned the differential equations you can always go back to them. 
There is nothing to unlearn. 

It will take you some time to understand what should happen in different 
circumstances. You will have to solve the equations. Each time you solve the 
equations, you will learn something about the character of the solutions. To keep 
these solutions in mind, it will be useful also to study their meaning in terms of field 
lines and of other concepts. This is the way you will really “understand” the equa- 
tions. That is the difference between mathematics and physics. Mathematicians, 
or people who have very mathematical minds, are often led astray when “studying” 
physics because they lose sight of the physics. They say: “Look, these differential 
equations — the Maxwell equations — are all there is to electrodynamics; it is 
admitted by the physicists that there is nothing which is not contained in the equa- 
tions. The equations are complicated, but after all they are only mathematical 
equations and if I understand them mathematically inside out, I will understand 
the physics inside out.” Only it doesn’t work that way. Mathematicians who study 
physics with that point of view — and there have been many of them — usually 
make little contribution to physics and, in fact, little to mathematics. They fail 
because the actual physical situations in the real world are so complicated that it is 
necessary to have a much broader understanding of the equations. 

What it means really to understand an equation— that is, in more than a 
strictly mathematical sense — was described by Dirac. He said: “I understand what 
an equation means if I have a way of figuring out the characteristics of its solution 
without actually solving it.” So if we have a way of knowing what should happen 
in given circumstances without actually solving the equations, then we “under- 
stand” the equations, as applied to these circumstances. A physical understanding 
is a completely unmathematical, imprecise, and inexact thing, but absolutely neces- 
sary for a physicist. 

* Ordinarily, a course like this is given by developing gradually the physical 
ideas — by starting with simple situations and going on to more and more compli- 
cated situations. This requires that you continuously forget things you previously 
learned — things that are true in certain situations, but which are not true in general. 
For example, the “law” that the electrical force depends on the square of the 
distance is not always true. We prefer the opposite approach. We prefer to take 
first the complete laws, and then to step back and apply them to simple situa- 
tions, developing the physical ideas as we go along. And that is what we are going 
to do. 


2-1 Understanding physics 

2-2 Scalar and vector fields — T 
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2-3 Derivatives of fields — the 
gradient 

2-4 The operator V 

2-5 Operations with V 

2-6 The differential equation of 
heat flow 

2-7 Second derivatives of vector 
fields 

2-8 Pitfalls 


Review: Chapter 11, Vol. I, Vectors 
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Our approach is completely opposite to the historical approach in which one 
develops the subject in terms of the experiments by which the information was 
obtained. But the subject of physics has been developed over the past 200 years 
by some very ingenious people, and as we have only a limited time to acquire our 
knowledge, we cannot possibly cover everything they did. Unfortunately one of 
the things that we shall have a tendency to lose in these lectures is the historical, 
experimental development. It is hoped that in the laboratory some of this lack can 
be corrected. You can also fill in what we must leave out by reading the Ency- 
clopedia Brittanica, which has excellent historical articles on electricity and on 
other parts of physics. You will also find historical information in many textbooks 
on electricity and magnetism. 


2-2 Scalar and vector fields — T and h 
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We begin now with the abstract, mathematical view of the theory of electricity 
and magnetism. The ultimate idea is to explain the meaning of the laws given in 
Chapter 1 . But to do this we must first explain a new and peculiar notation that 
we want to use. So let us forget electromagnetism for the moment and discuss the 
mathematics of vector fields. It is of very great importance, not only for electro- 
magnetism, but for all kinds of physical circumstances. Just as ordinary differential 
and integral calculus is so important to all branches of physics, so also is the 
differential calculus of vectors. We turn to that subject. 

Listed below are a few facts from the algebra of vectors. It is assumed that 
you already know them. 


A • B = scalar = A X B X + A y B y + A Z B Z 

A X B = vector 

(A X B\ = A x B y - A y B x 
(A X B) x = A y B z - A z B y 
(A X B)y = A Z B X - A X B Z 


( 2 . 1 ) 

( 2 . 2 ) 
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A • (B^X C) = (A X B) C 
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A X (BX C) = B(A'C) - C(A ■ B) 

(2.6) 
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Also we will want to use the two following equalities from the calculus: 
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The first equation (2.7) is, of course, true only in the limit that Ax , Ay , and Az 
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The simplest possible physical field is a scalar field. By a field, you remember, 
we mean a quantity which depends upon position in space. By a scalar field we 
merely mean a field which is characterized at each point by a single number — a 
scalar. Of course the number may change in time, but we need not worry about 
that for the moment. We will talk about what the field looks like at a given instant. 
As an example of a scalar field, consider a solid block of material which has been 
heated at some places and cooled at others, so that the temperature of the body 
varies from point to point in a complicated way. Then the temperature will be a 
function of x> y, and z, the position in space measured in a rectangular coordinate 
system. Temperature is a scalar field. 
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Fig. 2-1. Temperature T is an example of a 
scalar field. With each point (x, y, z) in space 
there is associated a number T(x, y, z). All points on 
the surface marked T = 20° (shown as a curve at 
z = 0) are at the same temperature. The arrows 
are samples of the heat flow vector h . 


One way of thinking about scalar fields is to imagine “contours” which are 
imaginary surfaces drawn through ail points for which the field has the same value, 
just as contour lines on a map connect points with the same height. For a tempera- 
ture field the contours are called “isothermal surfaces” or isotherms. Figure 2-1 
illustrates a temperature field and shows the dependence of T on x and y when 
z = 0. Several isotherms are drawn. 

There are also vector fields. The idea is very simple. A vector is given for each 
point in space. The vector varies from point to point. As an example, consider a 
rotating body. The velocity of the material of the body at any point is a vector 
which is a function of position (Fig. 2-2). As a second example, consider the flow 
of heat in a block of material. If the temperature in the block is high at one place 
and low at another, there will be a flow of heat from the hotter places to the colder. 
The heat will be flowing in different directions in different parts of the block. The 
heat flow is a directional quantity which we call h. Its magnitude is a measure of 
how much heat is flowing. Examples of the heat flow vector are also shown 
in Fig. 2-1. 



Fig. 2-2. The velocity of the atoms 
in a rotating object is an example of a 
vector field. 



Fig. 2-3. Heat flow is a vector field. The vector 
h points along the direction of the flow. Its magni- 
tude is the energy transported per unit time across a 
surface element oriented perpendicular to the flow, 
divided by the area of the surface element. 


Let’s maxe a more precise definition of h : The magnitude of the vector heat 
flow at a point is the amount of thermal energy that passes, per unit time and per 
unit area, through an infinitesimal surface element, at right angles to the direction 
of flow. The vector points in the direction of flow (see Fig. 2-3). In symbols: If A J 
is the thermal energy that passes per unit time through the surface element An, then 

A J 

* = (2.9) 

A a 

where ej is a unit vector in the direction of flow. 

The vector h can be defined in another way — in terms of its components. We 
ask how much heat flows through a small surface at any angle with respect to the 
flow. In Fig. 2-4 we show a small surface Aa 2 inclined with respect to A a it which 
is perpendicular to the flow. The unit vector n is normal to the surface Aa 2 . The 
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Fig. 2-4. The heat flow through Aa 2 
is the same as through Aai. 



angle 0 between it and h is the same as the angle between the surfaces (since h is nor- 
mal to Aa x ). Now what is the heat flow per unit area through A a 2 l The flow 
through A #2 is the same as through A a x ; only the areas are different. In fact, 
Aai = A a 2 cos 0. The heat flow through A a 2 is 


A J 
A a 2 


AJ 

Aa x 


cos 0 — h • n. 


( 2 . 10 ) 


We interpret this equation: the heat flow (per unit time and per unit area) through 
any surface element whose unit normal, is n , is given by h ‘ n. Equally, we could 
say: the component of the heat flow perpendicular to the surface element A a 2 is 
h • n. We can, if we wish, consider that these statements define h. We will be apply- 
ing the same ideas to other vector fields. 


2-3 Derivatives of fields — the gradient 


When fields vary in time, we can describe the variation by giving their deriva- 
tives with respect to t. We want to describe the variations with position in a similar 
way, because we are interested in the relationship between, say, the temperature in 
one place and the temperature at a nearby place. How shall we take the derivative 
of the temperature with respect to position? Do we differentiate the temperature 
with respect to xl Or with respect to y, or z? 

Useful physical laws do not depend upon the orientation of the coordinate 
system. They should, therefore, be written in a form in which either both sides are 
scalars or both sides are vectors. What is the derivative of a scalar field, say 
dT/dxl Is it a scalar, or a vector, or what? It is neither a scalar nor a vector, as 
you can easily appreciate, because if we took a different x-axis, dT/dx would cer- 
tainly be different. But notice: We have three possible derivatives: dT/dx , dT/dy , 
and dT/dz . Since there are three kinds of derivatives and we know that it takes 
three numbers to form a vector, perhaps these three derivatives are the components 
of a vector: 


ST dT 
fix ’ dy * dz ) 


a vector. 


( 2 . 11 ) 


Of course it is not generally true that any three numbers form a vector. It is 
true only if, when we rotate the coordinate system, the components of the vector 
transform among themselves in the correct way. So it is necessary to analyze how 
these derivatives are changed by a rotation of the coordinate system. We shall 
show that (2.11) is indeed a vector. The derivatives do transform in the correct 
way when the coordinate system is rotated. 

We can see this in several ways. One way is to ask a question whose answer is 
independent of the coordinate system, and try to express the answer in an “in- 
variant” form. For instance, if S = A * B, and if A and B are vectors, we know — 
because we proved it in Chapter 1 1 of Vol. I — that 5 is a scalar. We know that S 
is a scalar without investigating whether it changes with changes in coordinate 
systems. It can't, because it’s a dot product of two vectors. Similarly, if we know 
that A is a vector, and we have three numbers B x , B 2 , and B 3 , and we find out that 


A X B X + A y B 2 + A z B 3 = S, (2.12) 

where S is the same for any coordinate system, then it must be that the three 
numbers B x , B 2 , B 3 are the components B x , B y , B z of some vector B . 

Now let’s think of the temperature field. Suppose we take two points P x and 
P 2 , separated by the small interval AR. The temperature at P x is T x and at P 2 is 
T 2 , and the difference AT = T 2 — T x . The temperatures at these real, physical 
points certainly do not depend on what axis we choose for measuring the coordi- 
nates. In particular, AT is a number independent of the coordinate system. It is a 
scalar. 
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If we choose some convenient set of axes, we could write T i = T(x, y, z) and 
T 2 — T(x + Ax,y + Ay, z + A z), where Ax, Ay, and A z are the components of 
the vector AR (Fig. 2-5). Remembering Eq. (2.7), we can write 

The left side of Eq. (2.13) is a scalar. The right side is the sum of three products 
with Ax, Ay, and Az, which are the components of a vector. It follows that the 
three numbers 

dT dT dT 
dx 9 dy 9 dz 

are also the x-, y-, and z-components of a vector. We write this new vector with 
the symbol VT. The symbol V (called “del”) is an upside-down A, and is supposed 
to remind us of differentiation. People read VT in various ways: “del-7",” or 
“gradient of T” or “grad T\” 

gra dr.Tr.(g.f.f).‘ (2.14) 

Using this notation, we can rewrite Eq. (2.13) in the more compact form 

AT = VT ■ AR. (2.15) 

In words, this equation says that the difference in temperature between two nearby 
points is the dot product of the gradient of T and the vector displacement between 
the points. The form of Eq. (2.15) also illustrates clearly our proof above that 
VT is indeed a vector. 

Perhaps you are still not convinced? Let’s prove it in a different way. (Al- 
though if you look carefully, you may be able to see that it’s really the same proof 
in a longer-winded form!) We shall show that the components of VT transform in 
just the same way that components of R do. If they do, VT is a vector according to 
our original definition of a vector in Chapter 11 of Vol. I. We take a new coordi- 
nate system x f , y', z', and in this new system we calculate dT/dx f , dT/dy', and 
dT/dz f . To make things a little simpler, we let z = z', so that we can forget about 
the z-coordinate. (Y ou can check out the more general case for yourself.) 

We take an x'y'-system rotated an angle 9 with respect to the xy -system, as 
in Fig, 2-6(a). For a point (x, y) the coordinates in the prime system are 

x ' = xcos 9 + y sin 9, (2.16) 

y' = —x sin 9 + y cos 9. (2.17) 

Or, solving for x and y, 

x — x' cos 9 — y sin 9 , (2.18) 

y = x f sin 9 + / cos 9. (2.19) 

If any pair of numbers transforms with these equations in the same way that x 
and y do, they are the components of a vector. 

Now let’s look at the difference in temperature between the two nearby 
points Pi and P 2 , chosen as in Fig, 2-6(b). If we calculate with the x- and y- 
coordinates, we would write 

AT = — Ax (2.20) 

dx 

—since Ay is zero. 


* In our notation, the expression (a, b, c ) represents a vector with components a, b , 
and c. If you like to use the unit vectors i,j, and k, you may write 

„ .dT . .dT dT 
XT= 'te +J T y +k Tz- 



Fig. 2-5. The vector AR, whose com- 
ponents are Ax, Ay, and Az. 
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Fig. 2-6. (a) Transformation to a 

rotated coordinate system, (b) Special 
case of an interval AR parallel to the 
x-axis. 
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If we choose some convenient set of axes, we could write T i = T(x, y, z) and 
T 2 = T(x + Ax,y + Ay, z + A z), where Ax, Ay, and A z are the components of 
the vector A R (Fig. 2-5). Remembering Eq, (2.7), we can write 

£J,= Tx Ax + % + < 213 > 

The left side of Eq. (2.13) is a scalar. The right side is the sum of three products 
with Ax, Ay, and Az, which are the components of a vector. It follows that the 
three numbers 

dT dT dT 
dx 9 dy 9 dz 

are also the x-, y-, and z-components of a vector. We write this new vector with 
the symbol VT. The symbol V (called “del”) is an upside-down A, and is supposed 
to remind us of differentiation. People read VT in various ways: “del-77’ or 
“gradient of T,” or “grad 7*;” 

gradr-rr-(g.f.|).* (2.14) 

Using this notation, we can rewrite Eq. (2.13) in the more compact form 

AT = VT ■ AR. (2.15) 

In words, this equation says that the difference in temperature between two nearby 
points is the dot product of the gradient of T and the vector displacement between 
the points. The form of Eq. (2.15) also illustrates clearly our proof above that 
VT is indeed a vector. 

Perhaps you are still not convinced? Let’s prove it in a different way. (Al- 
though if you look carefully, you may be able to see that it’s really the same proof 
in a longer-winded form!) We shall show that the components of VT transform in 
just the same way that components of R do. If they do, VT is a vector according to 
our original definition of a vector in Chapter 11 of Vol. I. We take a new coordi- 
nate system x' , y', z', and in this new system we calculate dT/dx f , dT/dy and 
dT/dz'. To make things a little simpler, we let z = z', so that we can forget about 
the z-coordinate. (You can check out the more general case for yourself.) 

We take an xY-system rotated an angle 9 with respect to the xy -system, as 
in Fig, 2-6(a). For a point (x, y) the coordinates in the prime system are 

x' = xcos 6 + y sin 9, (2.16) 

y = —x sin 6 + y cos 6. (2.17) 

Or, solving for x and y, 

x = x' cos 9 — y sin 9, (2.18) 

y = x' sin 9 + / cos 9. (2.19) 

If any pair of numbers transforms with these equations in the same way that x 
and y do, they are the components of a vector. 

Now let’s look at the difference in temperature between the two nearby 
points Pi and P 2 > chosen as in Fig, 2-6(b). If we calculate with the x- and y- 
coordinates, we would write 

AT Ax (2.20) 

dx 

— since Ay is zero. 


* In our notation, the expression (a, b, c ) represents a vector with components a, b , 
and c. If you like to use the unit vectors i,j, and k, you may write 

„ .dT .dT. .dT 
XT=, te +J T y +k Tz- 



Fig. 2-5. The vector AR, whose com- 
ponents are Ax, Ay, and Az. 
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Fig. 2-6. (a) Transformation to a 

rotated coordinate system, (b) Special 
case of an interval AR parallel to the 
x-axis. 
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What would a computation in the prime system give? We would have written 


AT , dT , dT f 

* t = m Ax 

Looking at Fig. 2-6(b), we see that 

Ax' = Ax cos d 


( 2 . 21 ) 


( 2 . 22 ) 


and 

A y f = —Ax sin 0, (2.23) 

since Ay is negative when Ax is positive. Substituting these in Eq. (2.21), we find 
that 


AT = ~ Ax cos 0 — Ax sin 0 
dx f dy f 

{dT a dT A 

= W cos9 ~ a7 sin( 7 Ax 

Comparing Eq. (2.25) with (2.20), we see that 


(2.24) 

(2-25) 


dT 

dx 


dT 

dx‘ 


cos B — 


dT 
dy ' 


sin 6 . 


(2.26) 


This equation says that dT/dx is obtained from dT/dx ' and dT/dy ', just as x is 
obtained from x f and y f in Eq. (2.18). So dT/dx is the x-component of a vector. 
The same kind of arguments would show that dT/dy and dT/dz are y- and z-com- 
ponents. So VT is definitely a vector. It is a vector field derived from the scalar 
field T, 


2-4 The operator V 

Now we can do something that is extremely amusing and ingenious — and 
characteristic of the things that make mathematics beautiful. The argument that 
grad T, or VT, is a vector did not depend upon what scalar field we were differ- 
entiating. All the arguments would go the same if T were replaced by any scalar 
field . Since the transformation equations are the same no matter what we differ- 
entiate, we could just as well omit the T and replace Eq. (2.26) by the operator 
equation 

i X = Jx' C0 Sd -Jy' S ' ne - (2 ' 27) 


We leave the operators, as Jeans said, “hungry for something to differentiate.” 

Since the differential operators themselves transform as the components of a 
vector should, we can call them components of a vector operator . We can write 



which means, of course, 




d 

dx 9 



(2.28) 

(2.29) 


We have abstracted the gradient away from the T — that is the wonderful idea. 

You must always remember, of course, that V is an operator. Alone, it 
means nothing. If V by itself means nothing, what does it mean if we multiply 
it by a scalar — say T — to get the product TV? (One can always multiply a vector 
by a scalar.) It still does not mean anything. Its .x-component is 


T 


d 

dx ’ 


(2.30) 


which is not a number, but is still some kind of operator. However, according to 
the algebra of vectors we would still call TV a vector. 
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Now let’s multiply V by a scalar on the other side, so that we have the product 
(VT). In ordinary algebra 

TA = AT , (2.31) 

but we have to remember that operator algebra is a little different from ordinary 
vector algebra. With operators we must always keep the sequence right, so that 
the operations make proper sense. You will have no difficulty if you just remember 
that the operator V obeys the same convention as the derivative notation. What is 
to be differentiated must be placed on the right of the V. The order is important. 

Keeping in mind this problem of order, we understand that TV is an operator, 
but the product VT is no longer a hungry operator; the operator is completely 
satisfied. It is indeed a physical vector having a meaning. It represents the spatial 
rate of change of T. The x-component of VT is how fast T changes in the x-direc- 
tion. What is the direction of the vector VT1 We know that the rate of change of 
T in any direction is the component of VT in that direction (see Eq. 2.15). It 
follows that the direction of VT is that in which it has the largest possible com- 
ponent — in other words, the direction in which T changes the fastest. The gradient 
of T has the direction of the steepest uphill slope (in T). 


2-5 Operations with V 

Can we do any other algebra with the vector operator Vl Let us try combining 
it with a vector. We can combine two vectors by making a dot product. We could 
make the products 

(a vector) * V, or V • (a vector). 


The first one doesn’t mean anything yet, because it is still an operator. What it 
might ultimately mean would depend on what it is made to operate on. The 
second product is some scalar field. (A • B is always a scalar.) 

Let's try the dot product of V with a vector field we know, say A. We write 
out the components: 

V • A = VJix + V y h y + V 2 h z (2.32) 

or 


dh x , dh^ , dhz 
dx * By * dz 


(2.33) 


The sum is invariant under a coordinate transformation. If we were to choose a 
different system (indicated by primes), we would have* 


V f h 


dh x * , dhy' , dhz ’ 
dx' ~B? + Hz’ ’ 


(2.34) 


which is the same number as would be gotten from Eq. (2.33), even though it 
looks different. That is, 

V'h =Vh (2.35) 


for every point in space. So V • h is a scalar field, which must represent some 
physical quantity. You should realize that the combination of derivatives in 
V * h is rather special. There are all sorts of other combinations like dh y /dx, 
which are neither scalars nor components of vectors. 

The scalar quantity V • (a vector) is extremely useful in physics. It has been 
given the name the divergence . For example, 

V * k = div h = “divergence of A.” (2.36) 

to we did for VT, we can ascribe a physical significance to V * A. We shall, how- 
ever, postpone that until later. 


* We think of A as a physical quantity that depends on position in space, and not 
strictly as a mathematical function of three variables. When A is “differentiated” with 
respect to x, y , and z, or with respect to x', y\ and z' t the mathematical expression for A 
must first be expressed as a function of the appropriate variables. 
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First, we wish to see what else we can cook up with the vector operator V. 
What about a cross product? We must expect that 


V X h = a vector. 


(2.37) 


It is a vector whose components we can write by the usual rule for cross products 
(see Eq. 2.2): 

(V X A)* = V x h„ - ^ ^ • (2.38) 

Similarly, 

(T X A), = V y h , - \7A = 2*£ _ Mil (2.39) 

and 


(V X /r)^ = VA - V x h z 


dhx _ dfu 
dz dx 


(2.40) 


The combination V X A is called “the m7 of A.” The reason for the name 
and the physical meaning of the combination will be discussed later. 
Summarizing, we have three kinds of combinations with V: 


VT ~ grad T — a vector, 

V * h = div h ~ a scalar, 

V X h — curl A = a vector. 




Fig. 2-7. (a) Heat flow through a 

slab, (b) An infinitesimal slab parallel to 
an isothermal surface in a large block. 


Using these combinations, we can write about the spatial variations of fields in a 
convenient way — in a way that is general, in that it doesn’t depend on any particular 
set of axes. 

As an example of the use of our vector differential operator V, we write a set 
of vector equations which contain the same laws of electromagnetism that we gave 
in words in Chapter 1. They are called Maxwell’s equations. 


Maxwell's Equations 


(1) 

V ■ E 

(2) 

V X E 

(3) 

V ■ B 

(4) 

c 2 V X B 


P_ 

to 

_ dB 
~dt 


0 

BE l 
St € 0 


(2.41) 


where p (rho), the “electric charge density,” is the amount of charge per unit 
volume, and j, the “electric current density,” is the rate at which charge flows 
through a unit area per second. These four equations contain the complete 
classical theory of the electromagnetic field. You see what an elegantly simple 
form we can get with our new notation! 


2-6 The differential equation of heat flow 

Let us give another example of a law of physics written in vector notation. 
The law is not a precise one, but for many metals and a number of other sub- 
stances that conduct heat it is quite accurate. You know that if you take a slab of 
material and heat one face to temperature T 2 and cool the other to a different 
temperature T i, the heat will flow through the material from T 2 to T\ [Fig. 2-7 (a)]. 
The heat flow is proportional to the area A of the faces, and to the temperature 
difference. It is also inversely proportional to d , the distance between the plates. 
(For a given temperature difference, the thinner the slab the greater the heat flow.) 
Letting J be the thermal energy that passes per unit time through the slab, we write 

J = k(T 2 - r,) ~ ■ (2.42) 

The constant of proportionality k (kappa) is called the thermal conductivity. 
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What will happen in a more complicated case? Say in an odd-shaped block of 
material in which the temperature varies in peculiar ways? Suppose we look at a 
tiny piece of the block and imagine a slab like that of Fig. 2-7 (a) on a miniature 
scale. We orient the faces parallel to the isothermal surfaces, as in Fig. 2-7 (b), so 
that Eq. (2.42) is correct for the small slab. 

If the area of the small slab is AA , the heat flow per unit time is 

A A 

AJ — k AT f (2.43) 

where As is the thickness of the slab. Now AJ/AA we have defined earlier as the 
magnitude of A, whose direction is the heat flow. The heat flow will be from 
T i + AT toward T u and so it will be perpendicular to the isotherms, as drawn in 
Fig. 2-7(b). Also, AT/ As is just the rate of change of T with position. And since 
the position change is perpendicular to the isotherms, our AT/tss is the maximum 
rate of change. It is, therefore, just the magnitude of VT. Now since the direction 
of VT is opposite to that of A, we can write (2.43) as a vector equation: 

h = -kVT. (2.44) 

(The minus sign is necessary because heat flows “downhill” in temperature.) 
Equation (2.44) is the differential equation of heat conduction in bulk materials. 
You see that it is a proper vector equation. Each side is a vector if k is just a num- 
ber. It is the generalization to arbitrary cases of the special relation (2.42) for 
rectangular slabs. Later we should learn to write all sorts of elementary physics 
relations like (2.42) in the more sophisticated vector notation. This notation is 
useful not only because it makes the equations look simpler. It also shows most 
clearly the physical content of the equations without reference to any arbitrarily 
chosen coordinate system. 


2-7 Second derivatives of vector fields 

So far we have had only first derivatives. Why not second derivatives? We 
could have several combinations : 

(a) V(VT) 

(b) V X (VT) 

(c) V(V ■ h) (2.45) 

(d) v ■ (v x h) 

(e) V X (v X h) 

You can check that these are all the possible combinations. 

Let’s look first at the second one, (b). It has the same form as 

A X (AT) = (AX A)T = 0, 

since A X A is always zero. So we should have 

curl (grad T) = V X (VT) * 0. (2.46) 

We can see how this equation comes about if we go through once with the com- 
ponents: 

[v x (VDJ* = V X (VT)„ - v y (vDx . 

- h (£) -£(€)• <2 - 47> 

^which is zero (by Eq. 2.8). It goes the same for the other components. So V X 
(VT) = 0, for any temperature distribution — in fact, for any scalar function. 
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Now let us take another example. Let us see whether we can find another 
zero. The dot product of a vector with a cross product which contains that vector 
is zero: 

A‘(AX B) = 0, (2.48) 

because A X B is perpendicular to A , and so has no components in the direction A. 
The same combination appears in (d) of (2.45), so we have 

V - (V X h) = div (curl h) = 0. (2.49) 

Again, it is easy to show that it is zero by carrying through the operations with 
components. 

Now we are going to state two mathematical theorems that we will not prove. 
They are very interesting and useful theorems for physicists to know. 

In a physical problem we frequently find that the curl of some quantity — say 
of the vector field A — is zero. Now we have seen (Eq. 2.46) that the curl of a 
gradient is zero, which is easy to remember because of the way the vectors work. 
It could certainly be. then, that A is the gradient of some quantity, because then 
its curl would necessarily be zero. The interesting theorem is that if the curl A is 
zero, then A is always the gradient of something — there is some scalar field p (psi) 
such that A is equal to grad p. In other words, we have the 

Theorem: 

If V X A = 0 

there is a p 

such that A — Vp. (2.50) 

There is a similar theorem if the divergence of A is zero. We have seen in 
Eq. (2.49) that the divergence of a curl of something is always zero. If you come 
across a vector field D for which div D is zero, then you can conclude that D is 
the curl of some vector field C. 


Theorem: 

If V D = 0 

there is a C 

such that D = V X C. (2.51) 

In looking at the possible combinations of two V operators, we have found 
that two of them always give zero. Now we look at the ones that are not zero. 
Take the combination V * (VT), which was first on our list. It is not, in general, 
zero. We write out the components: 


Then 


VT = V X T + VJT + V Z T. 

V • (VT) = V x (V,n + w,r) + V Z (V Z T) 
= d^r d^r e?T 

dx 2 dy 2 dz 2 * 


(2.52) 


which would, in general, come out to be some number. It is a scalar field. 

You see that we do not need to keep the parentheses, but can write, without 
any chance of confusion. 


V • (VT) = V • VT - (V • V)T - V 2 7. (2.53) 

We look at V 2 as a new operator. It is a scalar operator. Because it appears often 
in physics, it has been given a special name — the Laplactan. 

i i * _2 d 2 S 2 - d 2 CA s 

Laplacian = V = — + — + (2.54) 
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Since the Laplacian is a scalar operator, we may operate with it on a vector — 
by which we mean the same operation on each component in rectangular coor- 
dinates: 

v 2 a = (v%, v%, v\). 

Let’s look at one more possibility: V X (V X h ), which was (e) in the list 
(2,45). Now the curl of the curl can be written differently if we use the vector 
equality (2.6): 

A X (B X C) = B(A'C) - C(A- B ). (2.55) 

In order to use this formula, we should replace A and B by the operator V and 
put C = h. If we do that, we get 

V X (V X h) = V(V * h) — h(V * V) . . . ??? 

Wait a minute! Something is wrong. The first two terms are vectors all right 
(the operators are satisfied), but the last term doesn’t come out to anything. It’s 
still an operator. The trouble is that we haven’t been careful enough about keeping 
the order of our terms straight. If you look again at Eq. (2.55), however, you see 
that we could equally well have written it as 

A X (B X C) = B(A-C) - (A- B)C. (2.56) 

The order of terms looks better. Now let’s make our substitution in (2.56). We get 

V X (V X h) = V(V * h) - (V • V)h . (2.57) 

This form looks all right. It is, in fact, correct, as you can verify by computing the 
components. The last term is the Laplacian, so we can equally well write 

V X (V X h) ~ V(V • h) — V 2 h . (2.58) 

We have had something to say about all of the combinations in our list of 
double V’s, except for (c), V(V * h). It is a possible vector field, but there is nothing 
special to say about it. It’s just some vector field which may occasionally come up. 
It will be convenient to have a table of our conclusions: 

(a) V ■ (VT) = V 2 J = a scalar field 

(b) V X (VT) = 0 

(c) V(V * h) = a vector field 

(d) V • (V X h) = 0 (2 59) 

(e) V X (T X h) = v(y ■ h) - V 2 h 

(f) (V • V)A = V 2 h = a vector field 

You may notice that we haven’t tried to invent a new vector operator (V X V). 
Do you see why? 


2-8 Pitfalls 

We have been applying our knowledge of ordinary vector algebra to the alge- 
bra of the operator V . We have to be careful, though, because it is possible to go 
astray. There are two pitfalls which we will mention, although they will not come 
up in this course. What would you say about the following expression, that in- 
volves the two scalar functions and <i> (phi): 

(V*) X (V*)? 

You might want to say: it must be zero because it’s just like 


(Act) X (Ab\ 
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which is zero because the cross product of two equal vectors A X A is always zero. 
But in our example the two operators V are not equal! The first one operates on 
one function, \j/; the other operates on a different function, <£. So although we rep- 
resent them by the same symbol V, they must be considered as different operators. 
Clearly, the direction of depends on the function so it is not likely to be 
parallel to v<f>. 

(V$) X (V<t>) 5* 0 (generally). 

Fortunately, we won’t have to use such expressions. (What we have said doesn’t 
change the fact that V X Vyp ~ 0 for any scalar field, because here both V’s 
operate on the same function.) 

Pitfall number two (which, again, we need not get into in our course) is the 
following: The rules that we have outlined here are simple and nice when we use 
rectangular coordinates. For example, if we have V 2 h and we want the x-com- 
ponent, it is 

<v ’* ) ' “ (& + 5 ? + &)*■-*’*'■ < 2 “> 

The same expression would not work if we were to ask for the radial component 
of V 2 h, The radial component of V 2 h is not equal to V 2 h r . The reason is that 
when we are dealing with the algebra of vectors, the directions of the vectors are 
all quite definite. But when we are dealing with vector fields, their directions are 
different at different places. If we try to describe a vector field in, say, polar coordi- 
nates, what we call the “radial” direction varies from point to point. So we can 
get into a lot of trouble when we start to differentiate the components. For ex- 
ample, even for a constant vector field, the radial component changes from point 
to point. 

It is usually safest and simplest just to stick to rectangular coordinates and 
avoid trouble, but there is one exception worth mentioning: Since the Laplacian 
V 2 , is a scalar, we can write it in any coordinate system we want to (for example, 
in polar coordinates). But since it is a differential operator, we should use it only 
on vectors whose components are in a fixed direction — that means rectangular 
coordinates. So we shall express all of our vector fields in terms of their x-, y 
and z-components when we write our vector differential equations out in com- 
ponents. 
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3 


Vector Integral Calculus 


3-1 Vector integrals; the line integral of V'T 

We found in Chapter 2 that there were various ways of taking derivatives of 
fields. Some gave vector fields; some gave scalar fields. Although we developed 
many different formulas, everything in Chapter 2 could be summarized in one rule: 
the operators d/dx , d/dy, and d/dz are the three components of a vector operator 
V. We would now like to get some understanding of the significance of the deriva- 
tives of fields. We will then have a better feeling for what a vector field equation 
means. 

We have already discussed the meaning of the gradient operation (V on a 
scalar). Now we turn to the meanings of the divergence and curl operations. 
The interpretation of these quantities is best done in terms of certain vector 
integrals and equations relating such integrals. These equations cannot, unfor- 
tunately, be obtained from vector algebra by some easy substitution, so you will 
just have to learn them as something new. Of these integral formulas, one is 
practically trivial, but the other two are not. We will derive them and explain their 
implications. The equations we shall study are really mathematical theorems. 
They will be useful not only for interpreting the meaning and the content of the 
divergence and the curl, but also in working out general physical theories. These 
mathematical theorems are, for the theory of fields, what the theorem of the con- 
servation of energy is to the mechanics of particles. General theorems like these 
are important for a deeper understanding of physics. You will find, though, that 
they are not very useful for solving problems — except in the simplest cases. It is 
delightful, however, that in the beginning of our subject there will be many simple 
problems which can be solved with the three integral formulas we are going to 
treat. We will see, however, as the problems get harder, that we can no longer use 
these simple methods. 

We take up first an integral formula involving the gradient. The relation 
contains a very simple idea: Since the gradient represents the rate of change of a 
field quantity, if we integrate that rate of change, we should get the total change. 
Suppose we have the scalar field *P(x, y, z). At any two points (1) and (2), the 
function $ will have the values ^(1) and ^(2), respectively. [We use a convenient 
notation, in which (2) represents the point (x 2i y 2 , z 2 ) and \p(2) means the same 
thing as $(x 2} y 2 , z 2 ).] If r(gamma) is any curve joining (1) and (2), as in Fig. 3-1, 
the following relation is true: 

Theorem 1. f {2) 

H2) - m - / (V*)’*. (3.1) 

•/(D 

along T 

The integral is a line integral , from (1) to (2) along the curve T, of the dot product 
of — a vector — with ds — another vector which is an infinitesimal line element 
of the curve r (directed away from (1) and toward (2)). 

First, we should review what we mean by a line integral. Consider a scalar 
function f(x,y, z), and the curve r joining two points (1) and (2). We mark off 
the curve at a number of points and join these points by straight-line segments, as 
shown in Fig. 3-2. Each segment has the length As l , where i is an index that runs 
1, 2, 3, . . . By the line integral 

fds 

along T 
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Fig. 3-1. The terms used in Eq. (3.1). 
The vector Vrp is evaluated at the line 
element ds. 



As 2 


Fig. 3-2. The line integral is the 
limit of a sum. 
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we mean the limit of the sum 

E/i A s i9 

where f x is the value of the function at the ith. segment. The limiting value is what 
the sum approaches as we add more and more segments (in a sensible way, so that 
the largest As* — > 0). 

The integral in our theorem, Eq. (3.1), means the same thing, although it 
looks a little different. Instead of /, we have another scalar — the component of 
in the direction of As. If we write (v^), for this tangential component, it is 
clear that 

(V*)« As = (Vyfr) • As. (3.2) 

The integral in Eq. (3.1) means the sum of such terms. 

Now let’s see why Eq. (3.1) is true. In Chapter 1, we showed that the com- 
ponent of Vyp along a small displacement AR was the rate of change of in the 
direction of AR. Consider the line segment As from (1) to point a in Fig. 3-2. 
According to our definition, 

A^i = \p(d) — iA( 1) = (TV)i 'Aji. (3.3) 

Also, we have 

— i//(a) — (V*) 2 * A$ 2 , (3.4) 

where, of course, (Vi/Oi means the gradient evaluated at the segment Asi, and 
(V*) 2> the gradient evaluated at A s 2 - If we add Eqs. (3.3) and (3.4), we get 

*(*) ~ *(1) = (V*)i ‘ * a *2. (3.5) 


You can see that if we keep adding such terms, we get the result 

K2) ~ *(1) = EW)rAsi. (3.6) 


The left-hand side doesn’t depend on how we choose our intervals — if (1) and (2) 
are kept always the same — so we can take the limit of the right-hand side. We have 
therefore proved Eq. (3.1). 

You can see from our proof that just as the equality doesn’t depend on how 
the points a, b 9 c, . . . , are chosen, similarly it doesn’t depend on what we choose 
for the curve r to join (1) and (2). Our theorem is correct for any curve from (1) 
to (2). 

One remark on notation: You will see that there is no confusion if we write, 
for convenience, 

(V^) • ds = • ds . (3.7) 

With this notation, our theorem is 



Fig. 3-3. The closed surface S 
defines the volume V. The unit vector n 
is the outward normal to the surface 
element da, and h is the heat-flow vector 
at the surface element. 


Theorem 1. 


*(2) - m 


ri 2) 

7(1) 


(1> f 
any curve from 

(1> to (2) 


• ds. 


(3.8) 


3-2 The flux of a vector field 

Before we consider our next integral theorem — a theorem about the divergence 
— we would like to study a certain idea which has an easily understood physical 
significance in the case of heat flow. We have defined the vector h, which represents 
the heat that flows through a unit area in a unit time. Suppose that inside a block 
of material we have some closed surface S which encloses the volume V (Fig. 3-3). 
We would like to find out how much heat is flowing out of this volume. We can, 
of course, find it by calculating the total heat flow out of the surface S. 

We write da for the area of an element of the surface. The symbol stands for 
a two-dimensional differential. If, for instance, the area happened to be in the 
xy-plane we would have 

da = dx dy. 
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Later we shall have integrals over volume and for these it is convenient to con- 
sider a differential volume that is a little cube. So when we write dv we mean 

dV ~ dx dy dz. 

Some people like to write d 2 a instead of da to remind themselves that it is 
kind of a second-order quantity. They would also write d 3 V instead of dV. We 
will use the simpler notation, and assume that you can remember that an area 
has two dimensions and a volume has three. 

The heat flow out through the surface element da is the area times the com- 
ponent of h perpendicular to da . We have already defined n as a unit vector pointing 
outward at right angles to the surface (Fig. 3-3). The component of k that we 
want is 

h n = A * ii. (3.9) 

The heat how out through da is then 

hnda, (3.10) 

To get the total heat flow through any surface we sum the contributions from all 
the elements of the surface. In other words, we integrate (3.10) over the whole 
surface: 

Total heat flow outward through S = f h- n da. (3.11) 

Js 

We are also going to call this surface integral “the flux of h through the sur- 
face.” Originally the word flux meant flow, so that the surface integral just means 
the flow of h through the surface. We may think: h is the “current density” of 
heat flow and the surface integral of it is the total heat current directed out of the 
surface; that is, the thermal energy per unit time (joules per second). 

We would like to generalize this idea to the case where the vector does not 
represent the flow of anything; for instance, it might be the electric field. We can 
certainly still integrate the normal component of the electric field over an area if 
we wish. Although it is not the flow of anything, we still call it the “flux.” We say 

Flux of E through the surface S = [ E * n da, (3.12) 

Js 

We generalize the word “flux” to mean the “surface integral of the normal com- 
ponent” of a vector. We will also use the same definition even when the surface 
considered is not a closed one, as it is here. 

Returning to the special case of heat flow, let us take a situation in which 
heat is conserved . For example, imagine some material in which after an initial 
heating no further heat energy is generated or absorbed. Then, if there is a net 
heat flow out of a closed surface, the heat content of the volume inside must 
decrease. So, in circumstances in which heat would be conserved, we say that 

/ hnda = - (3.13) 

Js dt 

where Q is the heat inside the surface. The heat flux out of S is equal to minus the 
rate of change with respect to time of the total heat Q inside of S. This interpreta- 
tion is possible because we are speaking of heat flow and also because we supposed 
that the heat was conserved. We could not, of course, speak of the total heat 
inside the volume if heat were being generated there. 

Now we shall point out an interesting fact about the flux of any vector. You 
may think of the heat flow vector if you wish, but what we say will be true for any 
vector field C. Imagine that we have a closed surface S that encloses the volume V, 
We now separate the volume into two parts by some kind of a “cut,” as in Fig. 
3-4. Now we have two closed surfaces and volumes. The volume V i is enclosed 
in the surface Su which is made up of part of the original surface S a and of the 
surface of the cut, S ab . The volume V 2 is enclosed by S 2 , which is made up of 
the rest of the original surface Si, and closed off by the cut S a& . Now consider the 
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Fig. 3-4. A volume V contained inside the surface 
5 is divided into two pieces by a "cut” at the surface 
Sab* We now have the volume V] enclosed in the 
surface Si = S a + Safa and the volume V 2 enclosed 
in the surface S 2 = $b + S 0 t. 



cut 


following question: Suppose we calculate the flux out through surface Si and 
add to it the flux through surface S 2 . Does the sum equal the flux through the 
whole surface that we started with? The answer is yes. The flux through the part 
of the surfaces S a b common to both Si and S 2 just exactly cancels out. For the 
flux of the vector C out of Fi, we can write 


Flux through Si — 
and for the flux out of F 2 , 

Flux through S 2 = 


/, 

fsb 


C • n da + / 

C • n x da , 

(3.14) 

J S a b 


• 

C • n da + f 

C • #i 2 da. 

(3.15) 

JSab 



Note that in the second integral we have written n 1 for the outward normal for 
Sab when it belongs to Si, and n 2 when it belongs to S 2 , as shown in Fig. 3-4. 
Clearly, n x = — » 2 , so that 



Fig. 3-5. Computation of the flux of 
C out of a small cube. 


f C-n x da = — f Cn 2 da. (3.16) 

Sab " Sab 

If we now add Eqs. (3.14) and (3.15), we see that the sum of the fluxes through 
Si and S 2 is just the sum of two integrals which, taken together, give the flux 
through the original surface S = S a + S&. 

We see that the flux through the complete outer surface S can be considered 
as the sum of the fluxes from the two pieces into which the volume was broken. 
We can similarly subdivide again — say by cutting V x into two pieces. You see 
that the same arguments apply. So for any way of dividing the original volume, it 
must be generally true that the flux through the outer surface, which is the original 
integral, is equal to a sum of the fluxes out of all the little interior pieces. 


3-3 The flux from a cube; Gauss 9 theorem 

We now take the special case of a small cube* and find an interesting formula 
for the flux out of it. Consider a cube whose edges are lined up with the axes as in 
Fig. 3-5. Let us suppose that the coordinates of the corner nearest the origin 
are x, y, z. Let Ax be the length of the cube in the x-direction, Ay be the length 
in the y-direction, and Az be the length in the z-direction. We wish to find the 
flux of a vector field C through the surface of the cube. We shall do this by making 
a sum of the fluxes through each of the six faces. First, consider the face marked 
1 in the figure. The flux outward on this face is the negative of the x-component 
of C, integrated over the area of the face. This flux is 


- J C x dy dz. 


Since we are considering a small cube, we can approximate this integral by the 


* The following development applies equally well to any rectangular parallelepiped. 
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value of C x at the center of the face — which we call the point (1) — multiplied by 
the area of the face. Ay A z: 

Flux out of 1 = — C x {X) Ay Az. 

Similarly, for the flux out of face 2, we write 

Flux out of 2 = C x ( 2) Ay Az . 


Now C x (l) and C x (2) are, in general, slightly different. If Ax is small enough, we 
can write 


C x ( 2) = C,(l) + 


dC x 

dx 


Ax . 


There are, of course, more terms, but they will involve (A x ) 2 and higher powers, 
and so will be negligible if we consider only the limit of small Ax. So the flux 
through face 2 is 


Flux out of 2 


«1) + Axj Ay Az. 


Summing the fluxes for faces 1 and 2, we get 


Flux out of 1 and 2 = Ax Ay Az. 

ox 

The derivative should really be evaluated at the center of face 1; that is, at 
[x, y + (Ay/ 2), z + (Az/2)]. But in the limit of an infinitesimal cube, we make 
a negligible error if we evaluate it at the corner (x, y, z). 

Applying the same reasoning to each of the other pairs of faces, we have 

dC 

Flux out of 3 and 4 = Ax Ay Az 
dy 

and 

Flux out of 5 and 6 = Ax Ay Az. 

The total flux through all the faces is the sum of these terms. We find that 

/ c ' nda = (^ + i ? + tt) A * A ' Az ’ 

cube 

and the sum of the derivatives is just V • C. Also, Ajc Ay Az = AV, the volume of 
the cube. So we can say that for an infinitesimal cube 

f C • nda = (y • C)AV. (3.17) 

surface 

We have shown that the outward flux from the surface of an infinitesimal cube is 
equal to the divergence of the vector multiplied by the volume of the cube. We 
now see the “meaning” of the divergence of a vector. The divergence of a vector 
at the point P is the flux — the outgoing “flow” of C—per unit volume , in the neigh- 
borhood of P. 

We have connected the divergence of C to the flux of C out of each infinitesimal 
volume. For any finite volume we can use the fact we proved above — that the 
total flux from a volume is the sum of the fluxes out of each part. We can, that is, 
integrate the divergence over the entire volume. This gives us the theorem that the 
integral of the normal component of any vector over any closed surface can also be 
written as the integral of the divergence of the vector over the volume enclosed 
by the surface. This theorem is named after Gauss. 

Gauss’ Theorem. 

[ C'nda = f V-CdV, (3.18) 

Js Jv 

where S is any closed surface and V is the volume inside it. 
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3-4 Heat conduction; the diffusion equation 

Let’s consider an example of the use of this theorem, just to get familiar 
with it. Suppose we take again the case of heat flow in, say, a metal. Suppose we 
have a simple situation in which all the heat has been previously put in and the 
body is just cooling off. There are no sources of heat, so that heat is conserved. 
Then how much heat is there inside some chosen volume at any time? It must be 
decreasing by just the amount that flows out of the surface of the volume. If our 
volume is a little cube, we would write, following Eq. (3.17), 

Heat out = J A * nda — V * h AV. (3.19) 

cube 


But this must equal the rate of loss of the heat inside the cube. If q is the heat per 
unit volume, the heat in the cube is q AF, and the rate of loss is 

< 3 ' M) 

Comparing (3.19) and (3.20), we see that 

-§=▼•*. (3-21) 


Take careful note of the form of this equation; the form appears often in phys- 
ics. It expresses a conservation law — here the conservation of heat. We have 
expressed the same physical fact in another way in Eq. (3.13). Here we have the 
differential form of a conservation equation, while Eq. (3.13) is the integral form. 

We have obtained Eq. (3.21) by applying Eq. (3.13) to an infinitesimal cube. 
We can also go the other way. For a big volume V bounded by S , Gauss’ law 
says that 

j^h-nda = j V • h dV. (3.22) 



Fig. 3-6. In the region near a point 
source of heat, the heat flow is radially 
outward. 


Using (3.21), the integral on the right-hand side is found to be just — dQ/dt , 
and again we have Eq. (3.13). 

Now let’s consider a different case. Imagine that we have a block of material 
and that inside it there is a very tiny hole in which some chemical reaction is 
taking place and generating heat. Or we could imagine that there are some wires 
running into a tiny resistor that is being heated by an electric current. We shall 
suppose that the heat is generated practically at a point, and let W represent the 
energy liberated per second at that point. We shall suppose that in the rest of the 
volume heat is conserved, and that the heat generation has been going on for a 
long time — so that now the temperature is no longer changing anywhere. The 
problem is: What does the heat vector h look like at various places in the metal? 
How much heat flow is there at each point? 

We know that if we integrate the normal component of h over a closed surface 
that encloses the source, we will always get W . All the heat that is being generated 
at the point source must flow out through the surface, since we have supposed 
that the flow is steady. We have the difficult problem of finding a vector field 
which, when integrated over any surface, always gives W. We can, however, find 
the field rather easily by taking a somewhat special surface. We take a sphere of 
radius R, centered at the source, and assume that the heat flow is radial (Fig. 3-6). 
Our intuition tells us that h should be radial if the block of material is large and 
we don’t get too close to the edges, and it should also have the same magnitude 
at all points on the sphere. You see that we are adding a certain amount of guess- 
work — usually called “physical intuition” — to our mathematics in order to find 
the answer. 

When h is radial and spherically symmetric, the integral of the normal com- 
ponent of h over the area is very simple, because the normal component is just 
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the magnitude of h and is constant. The area over which we integrate is 4irR 2 . 
We have then that 

f h nda = h- 4w R 2 (3.23) 


(where h is the magnitude of h). This integral should equal W, the rate at which 
heat is produced at the source. We get 


or 


W 

1 4 tR 2 ’ 

W 

4 ttR 2 gr ’ 


(3.24) 


where, as usual, e r represents a unit vector in the radial direction. Our result 
says that h is proportional to W and varies inversely as the square of the distance 
from the source. 

The result we have just obtained applies to the heat flow in the vicinity of a 
point source of heat. Let’s now try to find the equations that hold in the most 
general kind of heat flow, keeping only the condition that heat is conserved. 
We will be dealing only with what happens at places outside of any sources or 
absorbers of heat. 

The differential equation for the conduction of heat was derived in Chapter 2. 
According to Eq, (2.44), 

h = —k VT . (3.25) 

(Remember that this relationship is an approximate one, but fairly good for some 
materials like metals.) It is applicable, of course, only in regions of the material 
where there is no generation or absorption of heat. We derived above another 
relation, Eq. (3.21), that holds when heat is conserved. If we combine that equation 
with (3.25), we get 

= v * h = - V • (k VT), 

= kV* kV% (3.26) 


dq 

dt 

dq 

dt 


if k is a constant. You remember that q is the amount of heat in a unit volume 
and V • V = V 2 is the Laplacian operator 


v»- a 


2 j\2 2 

+ 1 " — • 

dx 2 ^ by 2 dz 2 


If we now make one more assumption we can obtain a very interesting equa- 
tion. We assume that the temperature of the material is proportional to the heat 
content per unit volume — that is, that the material has a definite specific heat. 
When this assumption is valid (as it often is), we can write 


or 


Aq = c v AT 

dq dT 
di ~ Cv ~di‘ 


(3.27) 


The rate of change of heat is proportional to the rate of change of temperature. 
The constant or proportionality c v is, here, the specific heat per unit volume of 
the material. Using Eq. (3.27) with (3.26), we get 

^ = - V 2 T. (3.28) 

dt c v v ' 

We find that the time rate of change of T— at every point — is proportional to the 
Laplacian of T, which is the second derivative of its spatial dependence. We have 
a differential equation — in x, y, z, and t — for the temperature T. 
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Fig. 3-7. The circulation of C around 
the curve T is the line integral of Ct, the 
tangential component of C. 


(l) 



Fig. 3-8. The circulation around the 
whole loop is the sum of the circulations 
around the two loops: T i = r a + r a 6 
and r 2 = Tb + Tab . 


The differential equation (3.28) is called the heat diffusion equation. It is 
often written as 

^ = D V 2 T, (3.29) 

at 

where D is called the diffusion constant, and is here equal to k/c v . 

The diffusion equation appears in many physical problems — in the diffusion 
of gases, in the diffusion of neutrons, and in others. We have already discussed 
the physics of some of these phenomena in Chapter 43 of Vol. I. Now you have 
the complete equation that describes diffusion in the most general possible situa- 
tiqn. At some later time we will take up ways of solving the diffusion equation 
to find how the temperature varies in particular cases. We turn back now to 
consider other theorems about vector fields. 


3-5 The circulation of a vector field 

We wish now to look at the curl in somewhat the same way we looked at the 
divergence. We obtained Gauss’ theorem by considering the integral over a 
surface, although it was not obvious at the beginning that we were going to be 
dealing with the divergence. How did we know that we were supposed to integrate 
over a surface in order to get the divergence? It was not at all clear that this would 
be the result. And so with an apparent equal lack of justification, we shall calculate 
something else about a vector and show that it is related to the curl. This time we 
calculate what is called the circulation of a vector field. If C is any vector field, 
we take its component along a curved line and take the integral of this component 
all the way around a complete loop. The integral is called the circulation of the 
vector field around the loop. We have already considered a line integral of V\f/ 
earlier in this chapter. Now we do the same kind of thing for any vector field C. 

Let r be any closed loop in space — imaginary, of course. An example is given 
in Fig. 3-7. The line integral of the tangential component of C around the loop 
is written as 

^ C t ds = £ C • ds , (3.30) 

You should note that the integral is taken all the way around, not from one point 
to another as we did before. The little circle on the integral sign is to remind us 
that the integral is to be taken all the way around. This integral is called the 
circulation of the vector field around the curve T. The name came originally from 
considering the circulation of a liquid. But the name — like flux — has been extended 
to apply to any field even when there is no material “circulating.” 

Playing the same kind of game we did with the flux, we can show that the 
circulation around a loop is the sum of the circulations around two partial loops. 
Suppose we break up our curve of Fig. 3-7 into two loops, by joining two points 
(1) and (2) on the original curve by some line that cuts across as shown in Fig. 
3-8. There are now two loops, Ti and r 2 . r x is made up of r o , which is that part 
of the original curve to the left of (1) and (2), plus T abi the “short cut.” r 2 is made 
up of the rest of the original curve plus the short cut. 

The circulation around r i is the sum of an integral along T a and along T a b- 
Similarly, the circulation around r 2 is the sum of two parts, one along T b and the 
other along T ab . The integral along r a & will have, for the curve r 2 , the opposite 
sign from what it has for r ls because the direction of travel is opposite — we must 
take both our line integrals with the same “sense” of rotation. 

Following the same kind of argument we used before, you can see that the 
sum of the two circulations will give just the line integral around the original curve 
T. The parts due to T ab cancel. The circulation around the one part plus the cir- 
culation around the second part equals the circulation about the outer line. 
We can continue the process of cutting the original loop into any number of smaller 
loops. When we add the circulations of the smaller loops, there is always a can- 
cellation of the parts on their adjacent portions, so that the sum is equivalent to the 
circulation around the original single loop. 
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Now let us suppose that the original loop is the boundary of some surface. 
There are, of course, an infinite number of surfaces which all have the original 
loops as the boundary. Our results will not, however, depend on which surface 
we choose. First, we break our original loop into a number of small loops that all 
lie on the surface we have chosen, as in Fig. 3-9. No matter what the shape of 
the surface, if we choose our small loops small enough, we can assume that each 
of the small loops will enclose an area which is essentially flat. Also, we can choose 
our small loops so that each is very nearly a square. Now we can calculate the 
circulation around the big loop T by finding the circulations around all of the 
little squares and then taking their sum. 

3-6 The circulation around a square; Stokes 9 theorem 

How shall we find the circulation for each little square? One question is, 
how is the square oriented in space? We could easily make the calculation if it 
had a special orientation. For example, if it were in one of the coordinate planes. 
Since we have not assumed anything as yet about the orientation of the coordinate 
axes, we can just as well choose the axes so that the one little square we are con- 
centrating on at the moment lies in the xy-plane, as in Fig. 3-10. If our result is 
expressed in vector notation, we can say that it will be the same no matter what the 
particular orientation of the plane. 

We want now to find the circulation of the field C around our little square. 
It will be easy to do the line integral if we make the square small enough that the 
vector C doesn’t change much along any one side of the square. (The assumption 
is better the smaller the square, so we are really talking about infinitesimal squares.) 
Starting at the point (x, y) — the lower left corner of the figure — we go around in 
the direction indicated by the arrows. Along the first side — marked (1) — the 
tangential component is C x (l) and the distance is Ax. The first part of the integral 
is C,(l)Ax. Along the second leg, we get C y (2)Ay. Along the third, we get 
— C x (3)Ax, and along the fourth, — C y (4) Ay. The minus signs are required 
because we want the tangential component in the direction of travel. The whole 
line integral is then 



Fig. 3-9. Some surface bounded by 
the loop T is chosen. The surface is 
divided into a number of small areas, 
each approximately a square. The 
circulation around T is the sum of the 
circulations around the little loops. 


y 



j> C • ds =• C x (l) Ax + C y ( 2) Ay — C x ( 3) Ax — C y { 4) Ay. 
Now let’s look at the first and third pieces. Together they are 


(3.31) Fig. 3-10. Computing the circulation 
of C around a small square. 


[C x ( 1) - C x (3)]Ax. 


(3.32) 


You might think that to our approximation the difference is zero. That is true to 
the first approximation. We can be more accurate, however, and take into account 
the rate of change of C x . If we do, we may write % 

Q(3) = C x (l) + Ay. (3.33) 

If we included the next approximation, it would involve terms in (Ay) 2 , but since 
we will ultimately think of the limit as Ay — > 0, such terms can be neglected. 
Putting (3.33) together with (3.32), we find that 


[C x ( 1) - C x (3)] Ay = - ^ Ax Ay. (3.34) 

The derivative can, to our approximation, be evaluated at (x, y). 

Similarly, for the other two terms in the circulation, we may write 

c y ( 2) Ay - C v ( 4) Ay = Ax Ay. (3.35) 


The circulation around our square is then 

< 3 - 3 « 
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Fig. 3-11. The circulation of C 
around T is the surface integral of the 
normal component of V X C, 


( 2 ) 



W 


Fig. 3-12. If V X C is zero, the 
circulation around the closed curve T is 
zero. The line integral of C * ds from (1 ) 
to (2) along a must be the same as the 
line integral along b. 


which is interesting, because the two terms in the parentheses are just the z-com- 
ponent of the curl. Also, we note that Ax Ay is the area of our square. So we 
can write our circulation (3.36) as 


(V X C) z Aa. 

But the z-component really means the component normal to the surface element. 
We can, therefore, write the circulation around a differential square in an invariant 
vector form : 

fc-ds= (V X C)„ Aa = (V X C) • n A a. (3.37) 

Our result is : the circulation of any vector C around an infinitesimal square 
is the component of the curl of C normal to the surface, times the area of the square. 

The circulation around any loop T can now be easily related to the curl of 
the vector field. We fill in the loop with any convenient surface 5, as in Fig. 3-11, 
and add the circulations around a set of infinitesimal squares in this surface. The 
sum can be written as an integral. Our result is a very useful theorem called Stokes’ 
theorem (after Mr. Stokes). 

Stokes’ Theorem. 

£ C'ds = j s (V X C) n da, (3.38) 

where S is any surface bounded by F. 

We must now speak about a convention of signs. In Fig. 3-10 the z-axis 
would point toward you in a “usual” — that is, “right-handed” — system of axes. 
When we took our line integral with a “positive” sense of rotation, we found that 
the circulation was equal to the z-component of V X C. If we had gone around 
the other way, we would have gotten the opposite sign. Now how shall we know, 
in general, what direction to choose for the positive direction of the “normal” 
component of T X C? The “positive” normal must always be related to the 
sense of rotation, as in Fig. 3-10. It is indicated for the general case in Fig. 3-11. 

One way of remembering the relationship is by the “right-hand rule.” If you 
make the fingers of your right hand go around the curve r, with the fingertips 
pointed in the direction of the positive sense of ds, then your thumb points in the 
direction of the positive normal to the surface 5. 


3-7 Curl-free and divergence-free fields 

We would like, now, to consider some consequences of our new theorems. 
Take first the case of a vector whose curl is everywhere zero. Then Stokes* theorem 
says that the circulation around any loop is zero. Now if we choose two points 
(1) and (2) on a closed curve (Fig. 3-12), it follows that the line integral of the 
tangential component from (1) to (2) is independent of which of the two possible 
paths is taken. We can conclude that the integral from (1) to (2) can depend only 
on the location of these points — that is to say, it is some function of position only. 
The same logic was used in Chapter 14 of Vol. I, where we proved that if the integral 
around a closed loop of some quantity is always zero, then that integral can be 
represented as the difference of a function of the position of the two ends. This 
fact allowed us to invent the idea of a potential. We proved, furthermore, that the 
vector field was the gradient of this potential function (see Eq. 14.13 of Vol. I). 

It follows that any vector field whose curl is zero is equal to the gradient of 
some scalar function. That is, if V X C = 0, everywhere, there is some ^ (psi) for 
which C = V\j / — a useful idea. We can, if we wish, describe this special kind of 
vector field by means of a scalar field. 

Let’s show something else. Suppose we have any scalar field <f> (phi). If we 
take its gradient, V0, the integral of this vector around any closed loop must be 
zero. Its line integral from point (1) to point (2) is [<£( 2) — <f>( 1)]. If (1) and (2) 
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are the same points, our Theorem 1, Eq. (3.8), tells us that the line integral is zero: 

£ V<f> ■ ds — 0. 

loop 

Using Stokes* theorem, we can conclude that 

f V X (r<£) da = 0 

over any surface. But if the integral is zero over any surface, the integrand must 
be zero. So 

V X (V<t>) = 0, always. 

We proved the same result in Section 2-7 by vector algebra. 

Let’s look now at a special case in which we fill in a small loop r with a large 
surface *S, as indicated in Fig. 3-13. We would like, in fact, to see what happens 
when the loop shrinks down to a point, so that the surface boundary disappears — 
the surface becomes closed. Now if the vector C is everywhere finite, the line 
integral around T must go to zero as we shrink the loop — the integral is roughly 
proportional to the circumference of V, which goes to zero. According to Stokes’ 
theorem, the surface integral of (V X C) n must also vanish. Somehow, as we 
close the surface we add in contributions that cancel out what was there before. 
So we have a new theorem: 

I (v X C)„ da = 0. (3.39) 

any closed 
surface 

Now this is interesting, because we already have a theorem about the surface 
integral of a vector field. Such a surface integral is equal to the volume integral 
of the divergence of the vector, according to Gauss* theorem (Eq. 3.18). Gauss* 
theorem, applied to V X C, says 

f (V X c% da = f V • (V X C)dV. (3.40) 

closed volume 

surface inside 

So we conclude that the second integral must also be zero: 

I V ■ (V X C) dV = 0, (3.41) 

any 

volume 

and this is true for any vector field C whatever. Since Eq. (3.41) is true for any 
volume , it must be true that at every point in space the integrand is zero. We have 

V • (V X C) = 0, always. 

But this is the same result we got from vector algebra in Section 2-7. Now we 
begin to see how everything fits together. 

3-8 Summary 

Let us summarize what we have found about the vector calculus. These are 
really the salient points of Chapters 2 and 3 : 

1. The operators d/dx , d/dy, and d/dz can be considered as the three 
components of a vector operator V, and the formulas which result from vector 
algebra by treating this operator as a vector are correct: 



2. The difference of the values of a scalar field at two points is equal to the 
line integral of the tangential component of the gradient of that scalar along 
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Fig. 3-13. Going to the limit of a 
closed surface, we find that the surface 
integral of (V X C) n must vanish. 



any curve at all between the first and second points : 

*(2) - *(1) = f m Vf • ds. (3.42) 

An 

any curve 

3. The surface integral of the normal component of an arbitrary vector 
over a closed surface is equal to the integral of the divergence of the vector over 
the volume interior to the surface: 

j C ■ nda = f V-CdV. (3.43) 

closed volume 

surface inside 

4, The line integral of the tangential component of an arbitrary vector 
around a closed loop is equal to the surface integral of the normal component 
of the curl of that vector over any surface which is bounded by the loop. 

I C- ds = j (V X C) • « da. (3.44) 

boundary surface 
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4 


Electrostatics 


4-1 Statics 


We begin now our detailed study of the theory of electromagnetism. All of 
electromagnetism is contained in the Maxwell equations. 

Maxwell's equations: 


<4 

II 

(4.1) 

VX£--f, 

(4.2) 

+i. 

(4.3) 

V B = 0. 

(4.4) 


4-1 Statics 

4-2 Coulomb’s law; superposition 

4-3 Electric potential 

4-4 E = ~V<f> 

4-5 The flux of E 

4-6 Gauss’ law; the divergence of E 

4-7 Field of a sphere of charge 

4-8 Field lines; equipotential 
surfaces 


The situations that are described by these equations can be very complicated. 

We will consider first relatively simple situations, and learn how to handle them 
before we take up more complicated ones. The easiest circumstance to treat is one 

in which nothing depends on the time — called the static case. All charges are Review: Chapters 13 and 14, Vol. I, 

permanently fixed in space, or if they do move, they move as a steady flow in a Work and Potential Energy 

circuit (so p and j are constant in time). In these circumstances, all of the terms in 
the Maxwell equations which are time derivatives of the field are zero. In this 
case, the Maxwell equations become: 


Electrostatics: 


Magnetostatics: 


V E 
V X E 


P 

— » 

€ o 

0 . 


V X B 
V B 




You will notice an interesting thing about this set of four equations. It can 
be separated into two pairs. The electric field E appears only in the first two, and 
the magnetic field B appears only in the second two. The two fields are not inter- 
connected. This means that electricity and magnetism are distinct phenomena so 
long as charges and currents are static. The interdependence of E and B does not 
appear until there are changes in charges or currents, as when a condensor is 
charged, or a magnet moved. Only when there are sufficiently rapid changes, so 
that the time derivatives in Maxwell’s equations become significant, will E and B 
depend on each other. 

Now if you look at the equations of statics you will see that the study of the 
two subjects we call electrostatics and magnetostatics is ideal from the point of 
view of learning about the mathematical properties of vector fields. Electrostatics 
is a neat example of a vector field with zero curl and a given divergence . Magnet- 
ostatics is a neat example of a field with zero divergence and a given curl . The more 
conventional — and you may be thinking, more satisfactory — way of presenting 
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the theory of electromagnetism is to start first with electrostatics and thus to learn 
about the divergence. Magnetostatics and the curl are taken up later. Finally, 
electricity and magnetism are put together. We have chosen to start with the 
complete theory of vector calculus. Now we shall apply it to the special case of 
electrostatics, the field of E given by the first pair of equations. 

We will begin with the simplest situations — ones in which the positions of all 
charges are specified. If we had only to study electrostatics at this level (as we 
shall do in the next two chapters), life would be very simple — in fact, almost 
trivial. Everything can be obtained from Coulomb’s law and some integration, 
as you will see. In many real electrostatic problems, however, we do not know, 
initially, where the charges are. We know only that they have distributed them- 
selves in ways that depend on the properties of matter. The positions that the 
charges take up depend on the E field, which in turn depends on the positions of 
the charges. Then things can get quite complicated. If, for instance, a charged 
body is brought near a conductor or insulator, the electrons and protons in the 
conductor or insulator will move around. The charge density p in Eq. (4.5) may 
have one part that we know about, from the charge that we brought up ; but there 
will be other parts from charges that have moved around in the conductor. And 
all of the charges must be taken into account. One can get into some rather subtle 
and interesting problems. So although this chapter is to be on electrostatics, it will 
not cover the more beautiful and subtle parts of the subject. It will treat only the 
situation where we can assume that the positions of all the charges are known. 
Naturally, you should be able to do that case before you try to handle the other 
ones. 


4-2 Coulomb’s law; superposition 

It would be logical to use Eqs. (4.5) and (4.6) as our starting points. It will 
be easier, however, if we start somewhere else and come back to these equations. 
The results will be equivalent. We will start with a law that we have talked about 
before, called Coulomb’s law, which says that between two charges at rest there is 
a force directly proportional to the product of the charges and inversely propor- 
tional to the square of the distance between. The force is along the straight line 
from one charge to the other. 


Coulomb’s law: 


Fi = 


1 


1?2 , 


47T€ 0 /f, 


- -f 2 


12 


(4.9) 


Fi is the force on charge q u e 12 is the unit vector in the direction to q i from q 2i 
and r x 2 is the distance between q 1 and q 2 . The force F 2 on q 2 is equal and opposite 
to Fi. 

The constant of proportionality, for historical reasons, is written as l/47re 0 . 
In the system of units which we use — the mks system — it is defined as exactly 
10“ 7 times the speed of light squared. Now since the speed of light is approxi- 
mately 3 X 10 s meters per second, the constant is approximately 9 X 10 9 , and 
the unit turns out to be newton-meter 2 per coulomb 2 or volt-meter per coulomb. 


— = 10“ 7 c 2 (by definition) 

= 9.0 X 10 9 (by experiment). (4.10) 

Unit: newton-meter 2 /coulomb 2 , 
or volt-meter/coulomb. 


When there are more than two charges present — the only really interesting 
times — we must supplement Coulomb’s law with one other fact of nature: the 
force on any charge is the vector sum of the Coulomb forces from each of the other 
charges. This fact is called “the principle of superposition.” That’s all there is to 
electrostatics. If we combine the Coulomb law and the principle of superposition, 
there is nothing else. Equations (4.5) and (4.6) — the electrostatic equations — say 
no more and no less. 
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When applying Coulomb’s law, it is convenient to introduce the idea of an 
electric field. We say that the field E(X) is the force per unit charge on q x (due to 
all other charges). Dividing Eq. (4.9) by q l9 we have, for one other charge besides 

< 4 -"> 


Also, we consider that E(\) describes something about the point (1) even if q x 
were not there — assuming that all other charges keep their same positions. We 
say; £(1) is the electric field at the point (1). 

The electric field £ is a vector, so by Eq. (4.1 1) we really mean three equations 
— one for each component. Writing out explicitly the ^-component, Eq. (4.11) 
means 


Ex(x i, y u zi) 


J72 *1 - *2 

47T€ 0 [(*i - x 2 ) 2 + Oi - y 2 ) 2 + (Zl “ ^ 2 ) 2 ] 3/2 ’ 


(4.12) 


and similarly for the other components. 

If there are many charges present, the field E at any point (1) is a sum of the 
contributions from each of the other charges. Each term of the sum will look like 
(4.11) or (4.12). Letting q$ be the magnitude of the /th charge, and r 1} the dis- 
placement from q } to the point (1), we write 


Which means, of course, 


w) = E 

3 


1 

47T€ 0 r\ } 


e u . 


(4.13) 


Ex(xu yu *i) 


E _i gj(*i ~ X } ) 

} 47T€o [(jCi - Xj ) 2 + Oi - y,) 2 + (Zl - Zj) 2 ]S/2 ’ 


(4.14) 


and so on. 

Often it is convenient to ignore the fact that charges come in packages like 
electrons and protons, and think of them as being spread out in a continuous smear 
— or in a “distribution,” as it is called. This is O.K. so long as we are not interested 
in what is happening on too small a scale. We describe a charge distribution by 
the “charge density,” p{x , y f z). If the amount of charge in a small volume AV 2 
located at the point (2) is Aq 2i then p is defined by 


Aq 2 = p(2)AV 2 . 


(4.15) 


To use Coulomb’s law with such a description, we replace the sums of Eqs. 
(4.13) or (4.14) by integrals over all volumes containing charges. Then we have 


p/i\ 1 f p(2)^ 12^1^2 

£(1) "4 WoJ ^ 

all 12 

space 


Some people prefer to write 


„ _ rit 

*12 *= — : 
^12 


(4.16) 


where r X2 is the vector displacement to (1) from (2), as shown in Fig. 4-1. The 
integral for E is then written as 


E( 1) = 
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(4.17) 


When we want to calculate something with these integrals, we usually have to 
write them out in explicit detail. For the ^-component of either Eq. (4.16) or 
(4.17), we would have 
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Fig. 4-1. The electric field E at 
point (1), from a charge distribution, is 
obtained from an integral over the 
distribution. Point ( 1 ) could also be inside 
the distribution. 
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We are not going to use this formula much. We write it here only to empha- 
size the fact that we have completely solved all the electrostatic problems in which 
we know the locations of all of the charges. Given the charges, what are the fields? 
Answer: Do this integral. So there is nothing to the subject; it is just a case of 
doing complicated integrals over three dimensions — strictly a job for a computing 
machine! 

With our integrals we can find the fields produced by a sheet of charge, from 
a line of charge, from a spherical shell of charge, or from any specified distribution. 
It is important to realize, as we go on to draw field lines, to talk about potentials, 
or to calculate divergences, that we already have the answer here. It is merely a 
matter of it being sometimes easier to do an integral by some clever guesswork 
than by actually carrying it out. The guesswork requires learning all kinds of 
strange things. In practice, it might be easier to forget trying to be clever and al- 
ways to do the integral directly instead of being so smart. We are, however, going 
to try to be smart about it. We shall go on to discuss some other features of the 
electric field. 



Fig. 4-2. The work done in carrying 
a charge from a to b is the negative of 
the integral of F • ds along the path 
taken. 


4-3 Electric potential 

First we take up the idea of electric potential, which is related to the work done 
in carrying a charge from one point to another. There is some distribution of 
charge, which produces an electric field. We ask about how much work it would 
take to carry a small charge from one place to another. The work done against 
the electrical forces in carrying a charge along some path is the negative of the com- 
ponent of the electrical force in the direction of the motion, integrated along the 
path. If we carry a charge from point a to point b. 



where F is the electrical force on the charge at each point, and ds is the differential 
vector displacement along the path. (See Fig. 4-2.) 

It is more interesting for our purposes to consider the work that would be 
done in carrying one unit of charge. Then the force on the charge is numerically 
the same as the electric field. Calling the work done against electrical forces in this 
case JT(unit), we write 

IT(unit) — — f E • ds. (4.19) 


Now, in general, what we get with this kind of an integral depends on the path we 
take. But if the integral of (4.19) depended on the path from a to b , we could get 
work out of the field by carrying the charge to b along one path and then back to a 
on the other. We would go to b along the path for which W is smaller and back 
along the other, getting out more work than we put in. 

There is nothing impossible, in principle, about getting energy out of a field. 
We shall, in fact, encounter fields where it is possible. It could be that as you move 
a charge you produce forces on the other part of the “machinery.” If the “ma- 
chinery” moved against the force it would lose energy, thereby keeping the total 
energy in the world constant. For electrostatics , however, there is no such “ma- 
chinery.” We know what the forces back on the sources of the field are. They are 
the Coulomb forces on the charges responsible for the field. If the other charges 
are fixed in position — as we assume in electrostatics only — these back forces can 
do no work on them. There is no way to get energy from them — provided, of 
course, that the principle of energy conservation works for electrostatic situations. 
We believe that it will work, but let’s just show that it must follow from Coulomb’s 
law of force. 

We consider first what happens in the field due to a single charge q. Let 
point a be at the distance r x from q, and point b at r 2 . Now we carry a different 
charge, which we will call the “test” charge, and whose magnitude we choose to 
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be one unit, from a to b. Let’s start with the easiest possible path to calculate. We 
carry our test charge first along the arc of a circle, then along a radius, as shown in 
part (a) of Fig. 4-3. Now on that particular path it is child’s play to find the work 
done (otherwise we wouldn’t have picked it). First, there is no work done at all 
on the path from a to a f . The field is radial (from Coulomb’s law), so it is at right 
angles to the direction of motion. Next, on the path from a ' to b , the field is in the 
direction of motion and varies as l/r 2 . Thus the work done on the test charge 
in carrying it from a to b would be 



E-ds 


q f h ± Wl_lY 
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(4.20) 


Now let’s take another easy path. For instance, the one shown in part (b) of 
Fig. 4-3. It goes for awhile along an arc of a circle, then radially for awhile, then 
along an arc again, then radially, and so on. Every time we go along the circular 
parts, we do no work. Every time we go along the radial parts, we must just 
integrate l/r 2 . Along the first radial stretch, we integrate from r 0 to r 0 ', then 
along the next radial stretch from r a . to r a ", and so on. The sum of all these in- 
tegrals is the same as a single integral directly from r a to r b . We get the same answer 
for this path that we did for the first path we tried. It is clear that we would get 
the same answer for any path which is made up of an arbitrary number of the same 
kinds of pieces. 

What about smooth paths? Would we get the same answer? We discussed 
this point previously in Chapter 13 of Vol. I. Applying the same arguments used 
there, we can conclude that work done in carrying a unit charge from a to b is 
independent of the path. 


IF(unit) 
a — ► b t 




— I E-ds. 


any 

path 




Fig. 4-3. In carrying a test charge 
from a to b the same work is done along 
either path. 


Since the work done depends only on the endpoints, it can be represented as 
the difference between two numbers. We can see this in the following way. Let’s 
choose a reference point P 0 and agree to evaluate our integral by using a path that 
always goes by way of point Po. Let stand for the work done against the field 
in going from P 0 to point a, and let <t>(b) be the work done in going from P 0 to 
point b (Fig. 4-4). The work in going to P 0 from a (on the way to b ) is the negative 
of <f>(a), so we have that 



E-ds — <f>(b) — 


(4.21) 


Since only the difference in the function 4> at two points is ever involved, we 
do not really have to specify the location of P 0 . Once we have chosen some 
reference point, however, a number <j> is determined for any point in space; <f> is 
then a scalar field. It is a function of x, y, z. We call this scalar function the elec- 
trostatic potential at any point. 



Electrostatic potential: 


HP) = 



E-ds. 


(4.22) 


For convenience, we will often take the reference point at infinity. Then, 
for a single charge at the origin, the potential <f> is given for any point (x, y, z ) — 
using Eq. (4.20): 


H*, y, z) = 


q I 
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(4.23) 


Fig. 4-4. The work done in going 
along any path from a to b is the negative 
of the work from some point Po to a plus 
the work from Po to b. 


The electric field from several charges can be written as the sum of the electric 
field from the first, from the second, from the third, etc. When we integrate the 
sum to find the potential we get a sum of integrals. Each of the integrals is the 
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potential from one of the charges. We conclude that the potential 0 from a lot of 
charges is the sum of the potentials from all the individual charges. There is a 
superposition principle also for potentials. Using the same kind of arguments by 
which we found the electric field from a group of charges and for a distribution of 
charges, we can get the complete formulas for the potential 4> at a point we call (1): 


«( 1 ) 

0 ( 1 ) 


y 1 sl 9 

4xe 0 r u 5 

1 f p(2)dV 2 
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(4.24) 

(4.25) 


Remember that the potential <j> has a physical significance: it is the potential 
energy which a unit charge would have if brought to the specified point in space 
from some reference point. 


4-4 E =-V<f> 

Who cares about </>? Forces on charges are given by E, the electric field. The 
point is that E can be obtained easily from <f > — it is as easy, in fact, as taking a 
derivative. Consider two points, one at x and one at (x + dx\ but both at the 
same y and z, and ask how much work is done in carrying a unit charge from one 
point to the other. The path is along the horizontal line from x to x + dx. The 
work done is the difference in the potential at the two points: 

LW = 4>(x + Ax, y, z) - 4>(x, y, z) = Ax. 


But the work done against the field for the same path is 
AW = -jE-ds = -E x Ax. 

We see that 

*--!£• < 4 “> 

Similarly, E y = —d<f>/dy, E z = —d<j>/dz, or, summarizing with the notation of 
vector analysis, 

E = -V*. (4.27) 

This equation is the differential form of Eq. (4.22). Any problem with specified 
charges can be solved by computing the potential from (4.24) or (4.25) and using 
(4.27) to get the field. Equation (4.27) also agrees with what we found from vector 
calculus: that for aUy scalar field 0 

/ * ds — $(b) - (4.28) 

J a 

According to Eq. (4.25) the scalar potential 0 is given by a three-dimensional 
integral similar to the one we had for E. Is there any advantage to computing <f> 
rather than El Yes. There is only one integral for <f>, while there are three integrals 
for E — because it is a vector. Furthermore, 1/r is usually a little easier to integrate 
than Jt/r 3 . It turns out in many practical cases that it is easier to calculate <f> and 
then take the gradient to find the electric field, than it is to evaluate the three 
integrals for E. It is merely a practical matter. 

There is also a deeper physical significance to the potential 0. We have shown 
that E of Coulomb’s law is obtained from E — — grad 0, when <j> is given by 
(4.22). But if E is equal to the gradient of a scalar field, then we know from the 
vector calculus that the curl of E must vanish: 
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V x E = 0. 


(4.29) 



But that is just our second fundamental equation of electrostatics, Eq. (4.6). We 
have shown that Coulomb’s law gives an E field that staisfies that condition. So 
far, everything is all right. 

We had really proved that V X E was zero before we defined the potential. 
We had shown that the work done around a closed path is zero. That is, that 

^E- ds = 0 

for any path. We saw in Chapter 3 that for any such field V X E must be zero 
everywhere. The electric field in electrostatics is an example of a curl-free field. 

You can practice your vector calculus by proving that V X E is zero in a dif- 
ferent way — by computing the components of V X E for the field of a point charge, 
as given by Eq. (4.1 1). If you get zero, the superposition principle says you would 
get zero for the field of any charge distribution. 

We should point out an important fact. For any radial force the work done is 
independent of the path, and there exists a potential. If you think about it, the 
entire argument we made above to show that the work integral was independent 
of the path depended only on the fact that the force from a single charge was 
radial and spherically symmetric. It did not depend on the fact that the dependence 
on distance was as 1 /r 2 — there could have been any r dependence. The existence 
of a potential, and the fact that the curl of E is zero, comes really only from the 
symmetry and direction of the electrostatic forces. Because of this, Eq. (4-28) — 
or (4.29) — can contain only part of the laws of electricity. 


4-5 The flux of E 

We will now derive a field equation that depends specifically and directly on 
the fact that the force law is inverse square. That the field varies inversely as the 
square of the distance seems, for some people, to be “only natural,” because “that’s 
the way things spread out.” Take a light source with light streaming out: the 
amount of light that passes through a surface cut out by a cone with its apex at 
the source is the same no matter at what radius the surface is placed. It must be so 
if there is to be conservation of light energy. The amount of light per unit area — 
the intensity — must vary inversely as the area cut by the cone, i.e., inversely as the 
square of the distance from the source. Certainly the electric field should vary 
inversely as the square of the distance for the same reason! But there is no such 
thing as the “same reason” here. Nobody can say that the electric field measures 
the flow of something like light which must be conserved. If we had a “model” 
of the electric field in which the electric field vector represented the direction and 
speed — say the current — of some kind of little “bullets” which were flying out, 
and if our model required that these bullets were conserved, that none could ever 
disappear once it was shot out of a charge, then we might say that we can “see” 
that the inverse square law is necessary. On the other hand, there would necessarily 
be some mathematical way to express this physical idea. If the electric field were 
like conserved bullets going out, then it would vary inversely as the square of the 
distance and we would be able to describe that behavior by an equation — which 
is purely mathematical. Now there is no harm in thinking this way, so long as we 
do not say that the electric field is made out of bullets, but realize that we are 
using a model to help us find the right mathematics. 

Suppose, indeed, that we imagine for a moment that the electric field did 
represent the flow of something that was conserved — everywhere, that is, except 
at charges. (It has to start somewhere!) We imagine that whatever it is flows out 
of a charge into the space around. If E were the vector of such a flow (as h is for 
heat flow), it would have a 1/r 2 dependence near a point source. Now we wish to 
use this model to find out how to state the inverse square law in a deeper or more 
abstract way, rather than simply saying “inverse square.” (You may wonder 
why we should want to avoid the direct statement of such a simple law, and want 
instead to imply the same thing sneakily in a different way. Patience! It will turn 
out to be useful.) 
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Fig. 4-7. Any volume can be thought 
of as completely made up of infinitesimal 
truncated cones. The flux of E from one 
end of each conical segment is equal and 
opposite to the flux from the other end. 
The total flux from the surface S is 
therefore zero. 



surface, the flux out is not zero. 


We ask: What is the “flow” of E out of an arbitrary closed surface in the 
neighborhood of a point charge? First let’s take an easy surface — the one shown 
in Fig. 4-5. If the E field is like a flow, the net flow out of this box should be zero. 
That is what we get if by the “flow” from this surface we mean the surface integral 
of the normal component of E — that is, the flux of E. On the radial faces, the nor- 
mal component is zero. On the spherical faces, the normal component E n is just 
the magnitude of E — minus for the smaller face and plus for the larger face. The 
magnitude of E decreases as 1/r 2 , but the surface area is proportional to r 2 , so 
the product is independent of r. The flux of E into face a is just cancelled by the 
flux out of face b. The total flow out of S is zero, which is to say that for this 
surface 

[ E n da = 0. (4.30) 

Js 

Next we show that the two end surfaces may be tilted with respect to the 
radial line without changing the integral (4.30). Although it is true in general, for 
our purposes it is only necessary to show that this is true when the end surfaces are 
small, so that they subtend a small angle from the source — in fact, an infinitesimal 
angle. In Fig. 4-6 we show a surface S whose “sides” are radial, but whose “ends” 
are tilted. The end surfaces are not small in the figure, but you are to imagine the 
situation for very small end surfaces. Then the field E will be sufficiently uniform 
over the surface that we can use just its value at the center. When we tilt the sur- 
face by an angle 0, the area is increased by the factor 1/cos 6. But E nt the compo- 
nent of E normal to the surface, is decreased by the factor cos 0. The product 
E n A a is unchanged. The flux out of the whole surface S is still zero. 

Now it is easy to see that the flux out of a volume enclosed by any surface S 
must be zero. Any volume can be thought of as made up of pieces, like that in 
Fig. 4-6. The surface will be subdivided completely into pairs of end surfaces, 
and since the fluxes in and out of these end surfaces cancel by pairs, the total flux 
out of the surface will be zero. The idea is illustrated in Fig. 4-7. We have the 
completely general result that the total flux of E out of any surface S in the field 
of a point charge is zero. 

But notice! Our proof works only if the surface S does not surround the charge. 
What would happen if the point charge were inside the surface? We could still 
divide our surface into pairs of areas that are matched by radial lines through the 
charge, as shown in Fig. 4-8. The fluxes through the two surfaces are still equal — 
by the same arguments as before — only now they have the same sign. The flux 
out of a surface that surrounds a charge is not zero. Then what is it? We can find 
out by a little trick. Suppose we “remove” the charge from the “inside” by sur- 
rounding the charge by a little surface S' totally inside the original surface S, as 
shown in Fig. 4-9. Now the volume enclosed between the two surfaces S and S' 
has no charge in it. The total flux out of this volume (including that through S') 
is zero, by the arguments we have given above. The arguments tell us, in fact, that 
the flux into the volume through S' is the same as the flux outward through S. 
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We can choose any shape we wish for S', so let’s make it a sphere centered on 
the charge, as in Fig. 4-10. Then we can easily calculate the flux through it. If the 
radius of the little sphere is r , the value of E everywhere on its surface is 

47T€q r 2 


and is directed always normal to the surface. We find the total flux through S' if 
we multiply this normal component of E by the surface area: 


Flux through the suface S' 



Q 

— , 
6 o 


(4.31) 


a number independent of the radius of the sphere! We know then that the flux 
outward through S is also q/e 0 — a value independent of the shape of S so long as 
the charge q is inside. 

We can write our conclusions as follows: 



Fig. 4-9. The flux through S is the 
same as the flux through $\ 


/ 

any s urf ace S 


E n da 


1 0; q outside S 

— ; q inside S' 
fo 


(4.32) 


Let’s return to our “bullet” analogy and see if it makes sense. Our theorem 
says that the net flow of bullets through a surface is zero if the surface does not 
enclose the gun that shoots the bullets. If the gun is enclosed in a surface, whatever 
size and shape it is, the number of bullets passing through is the same — it is given 
by the rate at which bullets are generated at the gun. It all seems quite reasonable 
for conserved bullets. But does the model tell us anything more than we get 
simply by writing Eq. (4.32)? No one has succeeded in making these “bullets” do 
anything else but produce this one law. After that, they produce nothing but 
errors. That is why today we prefer to represent the electromagnetic field purely 
abstractly. 



4-6 Gauss’ law; the divergence of E 


Our nice result, Eq. (4.32), was proved for a single point charge. Now suppose 
that there are two charges, a charge q± at one point and a charge q 2 at another. 
The problem looks more difficult. The electric field whose normal component we 
integrate for the flux is the field due to both charges. That is, if E x represents the 
electric field that would have been produced by q x alone, and E 2 represents the 
electric field produced by q 2 alone, the total electric field is E = E x + E 2 . The 
flux through any closed surface S is 

f (E ln + E 2n ) da = f E ln da + f E 2n da , . (4.33) 

Js Js Js 


The flux with both charges present is the flux due to a single charge plus the flux 
due to the other charge. If both charges are outside S , the flux through S is zero. 
If qi is inside S but q 2 is outside, then the first integral gives q i/e 0 and the second 
integral gives zero. If the surface encloses both charges, each will give its contribu- 
tion and we have that the flux is (q x + q 2 )/e 0 . The general rule is clearly that the 
total flux out of a closed surface is equal to the total charge inside , divided by € 0 . 
Our result is an important general law of the electrostatic field, called Gauss’ 

law. 


Gauss ’ law: 


or 


where 


/ 

any closed 
surface S 


E n da 


/ 


E * n da 


any closed 
surface S 


sum of charges inside 


Gint ^ 

*0 


Gint ~ y y Qi* 

inside S 


(4.34) 

(4.35) 

(4.36) 


Fig. 4-10. The flux through a spheri- 
cal surface containing a point charge 

q is q/fo- 
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If we describe the location of charges in terms of a charge density p, we can con- 
sider that each infinitesimal volume dV contains a “point” charge p dV. The sum 
over all charges is then the integral 

(3int= / pdV. (4.37) 

volume 
inside S 

From our derivation you see that Gauss’ law follows from the fact that the 
exponent in Coulomb’s law is exactly two. A 1/r 3 field, or any \/r n field with 
n 2, would not give Gauss’ law. So Gauss’ law is just an expression, in a dif- 
ferent form, of the Coulomb law of forces between two charges. In fact, working 
back from Gauss’ law, you can derive Coulomb’s law. The two are quite equiva- 
lent so long as we keep in mind the rule that the forces between charges is radial. 

We would now like to write Gauss’ law in terms of derivatives. To do this, 
we apply Gauss’ law to an infinitesimal cubical surface. We showed in Chapter 3 
that the flux of E out of such a cube is V * E times the volume dV of the cube. The 
charge inside of dV, by the definition of p, is equal to p dV, so Gauss’ law gives 


or 


V -EdV 


pdV 

9 

€0 


V E = 


P_. 


(4.38) 


The differential form of Gauss’ law is the first of our fundamental field equations of 
electrostatics, Eq. (4.5). We have now shown that the two equations of electro- 
statics, Eqs. (4.5) and (4.6), are equivalent to Coulomb’s law of force. We will 
now consider one example of the use of Gauss’ law. (We will come later to many 
more examples.) 



Fig. 4-1 1 . Using Gauss' law to find 
the field of a uniform sphere of charge. 


4-7 Field of a sphere of charge 

One of the difficult problems we had when we studied the theory of gravita- 
tional attractions was to prove that the force produced by a solid sphere of matter 
was the same at the surface of the sphere as it would be if all the matter were 
concentrated at the center. For many years Newton didn’t make public his 
theory of gravitation, because he couldn’t be sure this theorem was true. We 
proved the theorem in Chapter 13 of Vol. I by doing the integral for the 
potential and then finding the gravitational force by using the gradient. Now we 
can prove the theorem in a most simple fashion. Only this time we will prove the 
corresponding theorem for a uniform sphere of electrical charge. (Since the laws 
of electrostatics are the same as those of gravitation, the same proof could be 
done for the gravitational field.) 

We ask: What is the electric field £ at a point P anywhere outside the surface 
of a sphere filled with a uniform distribution of charge? Since there is no “special” 
direction, we can assume that E is everywhere directed away from the center of the 
sphere. We consider an imaginary surface that is spherical and concentric with 
the sphere of charge, and that passes through the point P (Fig. 4-11). For this 
surface, the flux outward is 


jE n da = E • 4t rR 2 . 


Gauss’ law tells us that this flux is equal to the total charge Q of the sphere (over e 0 ): 


or 


E-AirR 2 = 

Co 
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Fig. 4—1 2. Field lines and equipotential surfaces for a positive point charge. 


which is the same formula we would have for a point charge Q. We have proved 
Newton’s problem more easily than by doing the integral. It is, of course, a false 
kind of easiness — it has taken you some time to be able to understand Gauss’ law, 
so you may think that no time has really been saved. But after you have used the 
theorem more and more, it begins to pay. It is a question of efficiency. 


4-8 Field lines; equipotential surfaces 

We would like now to give a geometrical description of the electrostatic field. 
The two laws of electrostatics, one that the flux is proportional to the charge inside 
and the other that the electric field is the gradient of a potential, can also be repre- 
sented geometrically. We illustrate this with two examples. 

First, we take the field of a point charge. We draw lines in the direction of the 
field — lines which are always tangent to the field, as in Fig. 4-12. These are called 
field lines. The lines show everywhere the direction of the electric vector. But we 
also wish to represent the magnitude of the vector. We can make the rule that the 
strength of the electric field will be represented by the “density” of the lines. By 
the density of the lines we mean the number of lines per unit area through a sur- 
face perpendicular to the lines. With these two rules we can have a picture of the 
electric field. For a point charge, the density of the lines must decrease as 1/r 2 . 
But the area of a spherical surface perpendicular to the lines at any radius r increases 
as r 2 , so if we always keep the same number of lines for all distances from the 
charge, the density will remain in proportion to the magnitude of the field. We can 
guarantee that there are the same number of lines at every distance if we insist 
that the lines be continuous — that once a line is started from the charge, it never 
stops. In terms of the field lines, Gauss’ law says that lines should start only at 
plus charges and stop at minus charges. The number which leave a charge q must 
be equal to q/e Q . 

Now, we can find a similar geometrical picture for the potential </>. The easiest 
way to represent the potential is to draw surfaces on which ^ is a constant. We call 
them equipotential surfaces — surfaces of equal potential. Now what is the geometri- 
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A Note about Units 

Quantity Unit 

newton 
coulomb 
meter 
joule 

p ~ Q/L s coulomb/meter 3 

1/ €o ~ FL 2 /Q 2 newton- meter 2 /coulomb 2 

E ~ F/Q newton/coulomb 

<f> ~ W/ Q joule/coulomb = volt 

E ~ <I>/L volt/meter 

1/eo ^ EL 2 IQ volt* meter/ coulomb 


cal relationship of the equipotential surfaces to the field lines? The electric field is 
the gradient of the potential. The gradient is in the direction of the most rapid 
change of the potential, and is therefore perpendicular to an equipotential surface. 
If E were not perpendicular to the surface, it would have a component in the 
surface. The potential would be changing in the surface, but then it wouldn’t be 
an equipotential. The equipotential surfaces must then be everywhere at right 
angles to the electric field lines. 

For a point charge all by itself, the equipotential surfaces are spheres centered 
at the charge. We have shown in Fig. 4-12 the intersection of these spheres with a 
plane through the charge. 

As a second example, we consider the field near two equal charges, a positive 
one and a negative one. To get the field is easy. The field is the superposition of 
the fields from each of the two charges. So, we can take two pictures like Fig. 4-12 
and superimpose them — impossible! Then we would have field lines crossing each 
other, and that’s not possible, because E can’t have two directions at the same point. 
The disadvantage of the field-line picture is now evident. By geometrical argu- 
ments it is impossible to analyze in a very simple way where the new lines go. 
From the two independent pictures, we can’t get the combined picture. The 
principle of superposition, a simple and deep principle about electric fields, does 
not have, in the field-line picture, an easy representation. 

The field-line picture has its uses, however, so we might still like to draw the 
picture for a pair of equal (and opposite) charges. If we calculate the fields from 
Eq. (4.13) and the potentials from (4.23), we can draw the field lines and equi- 
potentials. Figure 4-13 shows the result. But we first had to solve the problem 
mathematically! 
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Application of Gauss 9 Law 


5-1 Electrostatics is Gauss 9 law plus . . . 

There are two laws of electrostatics: that the flux of the electric field from a 
volume is proportional to the charge inside — Gauss’ law, and that the circulation 
of the electric field is zero — E is a gradient. From these two laws, all the predictions 
of electrostatics follow. But to say these things mathematically is one thing; to 
use them easily, and with a certain amount of ingenuity, is another. In this chapter 
we will work through a number of calculations which can be made with Gauss’ law 
directly. We will prove theorems and describe some effects, particularly in con- 
ductors, that can be understood very easily from Gauss’ law. Gauss’ law by itself 
cannot give the solution of any problem because the other law must be obeyed too. 
So when we use Gauss’ law for the solution of particular problems, we will have to 
add something to it. We will have to presuppose, for instance, some idea of how 
the field looks — based, for example, on arguments of symmetry. Or we may have 
to introduce specifically the idea that the field is the gradient of a potential. 


5-2 Equilibrium in an electrostatic field 

Consider first the following question: When can a point charge be in stable 
mechanical equilibrium in the electric field of other charges? As an example, 
imagine three negative charges at the corners of an equilateral triangle in a hori- 
zontal plane. Would a positive charge placed at the center of the triangle remain 
there? (It will be simpler if we ignore gravity for the moment, although including 
it would not change the results.) The force on the positive charge is zero, but 
is the equilibrium stable? Would the charge return to the equilibrium position if 
displaced slightly? The answer is no. 

There are no points *of stable equilibrium in any electrostatic field — except 
right on top of another charge. Using Gauss’ law, it is easy to see why. First, for a 
charge to be in equilibrium at any particular point P 0 , the field must be zero. 
Second, if the equilibrium is to be a stable one, we require that if we move the 
charge away from P 0 in any direction, there should be a restoring force directed 
opposite to the displacement. The electric field at all nearby points must be 
pointing inward — toward the point P 0 . But that is in violation of Gauss’ law if 
there is no charge at P 0 , as we can easily see. 

Consider a tiny imaginary surface that encloses P 0f as in Fig. 5-1. If the 
electric field everywhere in the vicinity is pointed toward P 0 , the surface integral 
of the normal component is certainly not zero. For the case shown in the figure, 
the flux through the surface must be a negative number. But Gauss’ law says that 
the flux of electric field through any surface is proportional to the total charge 
inside. If there is no charge at P 0i the field we have imagined violates Gauss’ law. 
It is impossible to balance a positive charge in empty space — at a point where 
there is not some negative charge. A positive charge can be in equilibrium if it is 
in the middle of a distributed negative charge. Of course, the negative charge 
distribution would have to be held in place by other than electrical forces! 

Our result has been obtained for a point charge. Does the same conclusion 
hold for a complicated arrangement of charges held together in fixed relative 
positions — with rods, for example? We consider the question for two equal 
charges fixed on a rod. Is it possible that this combination can be in equilibrium 
in some electrostatic field? The answer is again no. The total force on the rod 
cannot be restoring for displacements in every direction. 
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Fig. 5-1. If Po were a position of 
stable equilibrium for a positive charge, 
the electric field everywhere in the 
neighborhood would point toward Pq. 
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Call F the total force on the rod in any position — F is then a vector field. 
Following the argument used above, we conclude that at a position of stable equi- 
librium, the divergence of F must be a negative number. But the total force on the 
rod is the first charge times the field at its position, plus the second charge times 
the field at its position: 

F = qxE x + q 2 E 2 . (5.1) 


The divergence of F is given by 

V • F = q t (V ■ E 1 ) + q 2 (V • E 2 ). 


If each of the two charges q i and q 2 is in free space, both V * E x and V * E 2 are 
zero, and V • F is zero — not negative, as would be required for equilibrium. You 
can see that an extension of the argument shows that no rigid combination of any 
number of charges can have a position of stable equilibrium in an electrostatic 
field in free space. 


Fig. 5-2. A charge can be in equili- 
brium if there are mechanical constraints. 



Now we have not shown that equilibrium is forbidden if there are pivots or 
other mechanical constraints. As an example, consider a hollow tube in which a 
charge can move back and forth freely, but not sideways. Now it is very easy to 
devise an electric field that points inward at both ends of the tube if it is allowed 
that the field may point laterally outward near the center of the tube. We simply 
place positive charges at each end of the tube, as in Fig. 5-2. There can now be an 
equilibrium point even though the divergence of E is zero. The charge, of course, 
would not be in stable equilibrium for sideways motion were it not for “non- 
electrical’ ' forces from the tube walls. 


5-3 Equilibrium with conductors 

There is no stable spot in the field of a system of fixed charges. What about 
a system of charged conductors? Can a system of charged conductors produce a 
field that will have a stable equilibrium point for a point charge? (We mean at a 
point other than on a conductor, of course.) You know that conductors have the 
property that charges can move freely around in them. Perhaps when the point 
charge is displaced slightly, the other charges on the conductors will move in a way 
that will give a restoring force to the point charge? The answer is still no — al- 
though the proof we have just given doesn’t show it. The proof for this case is 
more difficult, and we will only indicate how it goes. 

First, we note that when charges redistribute themselves on the conductors, 
they can only do so if their motion decreases their total potential energy. (Some 
energy is lost to heat as they move in the conductor.) Now we have already shown 
that if the charges producing a field are stationary , there is, near any zero point P 0 
in the field, some direction for which moving a point charge away from P 0 will 
decrease the energy of the system (since the force is away from P 0 ). Any readjust- 
ment of the charges on the conductors can only lower the potential energy still 
more, so (by the principle of virtual work) their motion will only increase the force 
in that particular direction away from P 0 , and not reverse it. 

Our conclusions do not mean that it is not possible to balance a charge by 
electrical forces. It is possible if one is willing to control the locations or the sizes 
of the supporting charges with suitable devices. You know that a rod standing on 
its point in a gravitational field is unstable, but this does not prove that it cannot 
be balanced on the end of a finger. Similarly, a charge can be held in one spot by 
electric fields if they are variable . But not with a passive — that is, a static — system. 
5-2 



5-4 Stability of atoms 

If charges cannot be held stably in position, it is surely not proper to imagine 
matter to be made up of static point charges (electrons and protons) governed only 
by the laws of electrostatics. Such a static configuration is impossible; it would 
collapse! 

It was once suggested that the positive charge of an atom could be distributed 
uniformly in a sphere, and the negative charges, the electrons, could be at rest 
inside the positive charge, as shown in Fig. 5-3. This was the first atomic model, 
proposed by Thompson. But Rutherford concluded from the experiment of Geiger 
and Marsden that the positive charges were very much concentrated, in what he 
called the nucleus. Thompson’s static model had to be abandoned. Rutherford 
and Bohr then suggested that the equilibrium might be dynamic, with the electrons 
revolving in orbits, as shown in Fig. 5-4. The electrons would be kept from falling 
in toward the nucleus by their orbital motion. We already know at least one 
difficulty with this picture. With such motion, the electrons would be accelerating 
(because of the circular motion) and would, therefore, be radiating energy. They 
would lose the kinetic energy required to stay in orbit, and would spiral in toward 
the nucleus. Again unstable! 

The stability of the atoms is now explained in terms of quantum mechanics. 
The electrostatic forces pull the electron as close to the nucleus as possible, but the 
electron is compelled to stay spread out in space over a distance given by the 
uncertainty principle. If it were confined in too small a space, it would have a 
great uncertainty in momentum. But that means that it would have a high ex- 
pected energy — which it would use to escape from the electrical attraction. The 
net result is an electrical equilibrium not too different from the idea of Thompson 
— only it is the negative charge that is spread out (because the mass of the electron 
is so much smaller than the mass of the proton). 
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CONCENTRATED 
AT THE CENTER 


Fig. 5-3. The Thompson model of an 
atom. 



Fig. 5-4. The Rutherford -Bohr model 
of an atom. 


5-5 The field of a line charge 


Gauss’ law can be used to solve a number of electrostatic field problems in- 
volving a special symmetry — usually spherical, cylindrical, or planar symmetry. 
In the remainder of this chapter we will apply Gauss* law to a few such problems. 
The ease with which these problems can be solved may give the misleading impres- 
sion that the method is very powerful, and that one should be able to go on to 
many other problems. It is unfortunately not so. One soon exhausts the list of 
problems that can be solved easily with Gauss’ law. In later chapters we will 
develop more powerful methods for investigating electrostatic fields. 

As our first example, we consider a system with cylindrical symmetry. Suppose 
that we have a very long, uniformly charged rod. By this we mean that electric 
charges are distributed uniformly along an indefinitely long straight line, with the 
charge X per unit length. We wish to know the electric field. The problem can, of 
course, be solved by integrating the contribution to the field from every part of 
the line. We are going to do it without integrating, by using Gauss’ law and some 
guesswork. First, we surmise that the electric field will be directed radially outward 
from the line. Any axial component from charges on one side would be accom- 
panied by an equal axial component from charges on the other side. The result 
could only be a radial field. It also seems reasonable that the field should have the 
same magnitude at all points equidistant from the line. This is obvious. (It may 
not be easy to prove, but it is true if space is symmetric — as we believe it is.) 

We can use Gauss’ law in the following way. We consider an imaginary 
surface in the shape of a cylinder coaxial with the line, as shown in Fig. 5-5. 
According to Gauss* law, the total flux of E from this surface is equal to the charge 
inside divided by e 0 . Since the field is assumed to be normal to the surface, the 
normal component is the magnitude of the field. Let’s call it E. Also, let the radius 
of the cylinder be r, and its length be taken as one unit, for convenience. The flux 
through the cylindrical surface is equal to E times the area of the surface, which is 
2irr. The flux through the two end faces is zero because the electric field is tan- 
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Fig. 5-5. A cylindrical gaussian sur- 
face coaxial with a line charge. 




Fig. 5-6. The electric field near a 
uniformly charged sheet can be found by 
applying Gauss’ law to an imaginary box. 
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Fig. 5-7. The field between two 
charged sheets is <r/e 0 ' 


gential to them. The total charge inside our surface is just X, because the length of 
the line inside is one unit. Gauss’ law then gives 


E • 2irr = X/€ 0 , 


E = 


X 

2? T€ 0 /* 


(5.2) 


The electric field of a line charge depends inversely on the first power of the 
distance from the line. 


5-6 A sheet of charge; two sheets 


As another example, we will calculate the field from a uniform plane sheet of 
charge. Suppose that the sheet is infinite in extent and that the charge per unit 
area is <r. We are going to take another guess. Considerations of symmetry lead 
us to believe that the field direction is everywhere normal to the plane, and if we 
have no field from any other charges in the world , the fields must be the same (in 
magnitude) on each side. This time we choose for our Gaussian surface a rec- 
tangular box that cuts through the sheet, as shown in Fig. 5-6. The two faces 
parallel to the sheet will have equal areas, say A. The field is normal to these two 
faces, and parallel to the other four. The total flux is E times the area of the first 
face, plus E times the area of the opposite face — with no contribution from the 
other four faces. The total charge enclosed in the box is a A. Equating the flux to 
the charge inside, we have 

EA + EA = — » 

€0 


from which 



(5.3) 


a simple but important result. 

You may remember that the same result was obtained in an earlier chapter 
by an integration over the entire surface. Gauss’ law gives us the answer, in this 
instance, much more quickly (although it is not as generally applicable as the 
earlier method). 

We emphasize that this result applies only to the field due to the charges on 
the sheet. If there are other charges in the neighborhood, the total field near the 
sheet would be the sum of (5.3) and the field of the other charges. Gauss’ law 
would then tell us only that 

Ei + E 2 = f . (5.4) 

€o 


where E\ and E 2 are the fields directed outward on each side of the sheet. 

The problem of two parallel sheets with equal and opposite charge densities, 
+<r and —a, is equally simple if we assume again that the outside world is quite 
symmetric. Either by superposing two solutions for a single sheet or by construct- 
ing a gaussian box that includes both sheets, it is easily seen that the field is zero 
outside of the two sheets (Fig. 5-7a). By considering a box that includes only one 
surface or the other, as in (b) or (c) of the figure, it can be seen that the field 
between the sheets must be twice what it is for a single sheet. The result is 


E (between the sheets) = <r/e o, (5.5) 

E (outside) = 0. (5.6) 


5-7 A sphere of charge; a spherical shell 

We have already (in Chapter 4) used Gauss’ law to find the field outside a 
uniformly charged spherical region. The same method can also give us the field 
at points inside the sphere. For example, the computation can be used to obtain 
a good approximation to the field inside an atomic nucleus. In spite of the fact 
that the protons in a nucleus repel each other, they are, because of the strong nu- 
clear forces, spread nearly uniformly throughout the body of the nucleus. 
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Suppose that we have a sphere of radius R filled uniformly with charge. Let 
p be the charge per unit volume. Again using arguments of symmetry, we assume 
the field to be radial and equal in magnitude at all points at the same distance 
from the center. To find the field at the distance r from the center, we take a 
spherical gaussian surface of radius r (r < R ), as shown in Fig. 5-8. The flux out 
of this surface is 

4irr 2 E, 

The charge inside our gaussian surface is the volume inside times p, or 

%irr 2 p. 

Using Gauss’ law, it follows that the magnitude of the field is given by 

E = g (r<R). (5.7) 

You can see that this formula gives the proper result for r = R. The electric field 
is proportional to the radius and is directed radially outward. 

The arguments we have just given for a uniformly charged sphere can be 
applied also to a thin spherical shell of charge. Assuming that the field is every- 
where radial and is spherically symmetric, one gets immediately from Gauss’ 
law that the field outside the shell is like that of a point charge, while the field 
everywhere inside the shell is zero. (A gaussian surface inside the shell will con- 
tain no charge.) 

5-8 Is the field of a point charge exactly 1/r 2 ? 

If we look in a little more detail at how the field inside the shell gets to be zero, 
we can see more clearly why it is that Gauss’ law is true only because the coulomb 
force depends exactly on the square of the distance. Consider any point P inside 
a uniform spherical shell of charge. Imagine a small cone whose apex is at P and 
which extends to the surface of the sphere, where it cuts out a small surface area 
Aai, as in Fig. 5-9. An exactly symmetric cone diverging from the opposite side 
of P would cut out the surface area A a 2 . If the distances from P to these two ele- 
ments of area are r x and r 2 , the areas are in the ratio 

Aa 2 r\ 

A a x ~~ r\ * 

(You can show this by geometry for any point P inside the sphere.) 

If the surface of the sphere is uniformly charged, the charge Aq on each of the 
elements of area is proportional to the area, so 

Aq 2 _ Aa2 

Aqi Aai 

Coulomb’s law then says that the magnitudes of the fields produced at P by these 
two surface elements are in the ratio 

= Qz/rl ___ . 

£1 qi/r\ 

The fields cancel exactly. Since all parts of the surface can be paired off in the same 
way, the total field at P is zero. But you can see that it would not be so if the 
exponent of r in Coulomb’s law were not exactly two. 

The validity of Gauss’ law depends upon the inverse square law of Coulomb. 
If the force law were not exactly the inverse square, it would not be true that the 
field inside a uniformly charged sphere would be exactly zero. For instance, if the 
force varied more rapidly, like, say, the inverse cube of r, that portion of the sur- 
face which is nearer to an interior point would produce a field which is larger than 
that which is farther away, resulting in a radial inward field for a positive surface 
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Fig. 5-8. Gauss’ law can be used to 
find the field inside a uniformly charged 
sphere. 



Fig. 5-9. The field is zero at any 
point P inside a spherical shell of charge. 





Fig. 5-10. The electric field is zero 
inside a closed conducting shell. 


i 


charge. These conclusions suggest an elegant way of finding out whether the in- 
verse square law is precisely correct. We need only determine whether or not the 
field inside of a uniformly charged spherical shell is precisely zero. 

It is lucky that such a method exists. It is usually difficult to measure a physical 
quantity to high precision — a one percent result may not be too difficult, but how 
would one go about measuring, say, Coulomb’s law to an accuracy of one part in 
a billion? It is almost certainly not possible with the best available techniques to 
measure the force between two charged objects with such an accuracy. But by 
determining only that the electric fields inside a charged sphere are smaller than 
some value we can make a highly accurate measurement of the correctness of 
Gauss’ law, and hence of the inverse square dependence of Coulomb’s law. What 
one does, in effect, is compare the force law to an ideal inverse square. Such com- 
parisons of things that are equal, or nearly so, are usually the bases of the most 
precise physical measurements. 

How shall we observe the field inside a charged sphere? One way is to try 
to charge an object by touching it to the inside of a spherical conductor. You 
know that if we touch a small metal ball to a charged object and then touch it to 
an electrometer the meter will become charged and the pointer will move from 
zero (Fig. 5- 10a). The ball picks up charge because there are electric fields outside 
the charged sphere that cause charges to run onto (or off) the little ball. If you do 
the same experiment by touching the little ball to the inside of the charged sphere, 
you find that no charge is carried to the electrometer. With such an experiment 
you can easily show that the field inside is, at most, a few percent of the field out- 
side, and that Gauss’ law is at least approximately correct. 

It appears that Benjamin Franklin was the first to notice that the field inside a 
conducting shell is zero. The result seemed strange to him. When he reported his 
observation to Priestley, the latter suggested that it might be connected with an 
inverse square law, since it was known that a spherical shell of matter produced 
no gravitational field inside. But Coulomb didn’t measure the inverse square 
dependence until 18 years later, and Gauss’ law came even later still. 

Gauss’ law has been checked carefully by putting an electrometer inside a 
large sphere and observing whether any deflections occur when the sphere is 
charged to a high voltage. A null result is always obtained. Knowing the geometry 
of the apparatus and the sensitivity of the meter, it is possible to compute the 
minimum field that would be observed. From this number it is possible to place an 
upper limit on the deviation of the exponent from two. If we write that the elec- 
trostatic force depends on r~ 2+e , we can place an upper bound on e. By this method 
Maxwell determined that e was less than 1/10,000, The experiment was repeated 
and improved upon in 1936 by Plimpton and Laughton. They found that Coulomb’s 
exponent differs from two by less than one part in a billion. 

Now that brings up an interesting question: How accurate do we know this 
Coulomb law to be in various circumstances? The experiments we just described 
measure the dependence of the field on distance for distances of some tens of 
centimeters. But what about the distances inside an atom — in the hydrogen 
atom, for instance, where we believe the electron is attracted to the nucleus by 
the same inverse square law? It is true that quantum mechanics must be used for 
the mechanical part of the behavior of the electron, but the force is the usual 
electrostatic one. In the formulation of the problem, the potential energy of an 
electron must be known as a function of distance from the nucleus, and Coulomb’s 
law gives a potential which varies inversely with the first power of the distance. 
How accurately is the exponent known for such small distances? As a result of 
very careful measurements in 1947 by Lamb and Retherford on the relative 
positions of the energy levels of hydrogen, we know that the exponent is correct 
again to one part in a billion on the atomic scale — that is, at distances of the order 
of one angstrom (10” 8 centimeter). 

The accuracy of the Lamb-Retherford measurement was possible again 
because of a physical “accident.” Two of the states of a hydrogen atom are 
expected to have almost indentical energies only if the potential varies exactly as 
1 /r. A measurement was made of the very slight difference in energies by finding 
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the frequency co of the photons that are emitted or absorbed in the transition from 
one state to the other, using for the energy difference A E = fua. Computations 
showed that A E would have been noticeably different from what was observed if 
the exponent in the force law 1/r 2 differed from 2 by as much as one part in a billion. 

Is the same exponent correct at still shorter distances? From measurements in 
nuclear physics it is found that there are electrostatic forces at typical nuclear 
distances — at about 10“ 13 centimeter — and that they still vary approximately as 
the inverse square. We shall look at some of the evidence in a later chapter. 
Coulomb’s law is, we know, still valid, at least to some extent, at distances of the 
order of 10“ 13 centimeter. 

How about 10“ 14 centimeter? This range can be investigated by bombarding 
protons with very energetic electrons and observing how they are scattered. Re- 
sults to date seem to indicate that the law fails at these distances. The electrical 
force seems to be about 10 times too weak at distances less than 10“ 14 centimeter. 
Now there are two possible explanations. One is that the Coulomb law does not 
work at such small distances; the other is that our objects, the electrons and 
protons, are not point charges. Perhaps either the electron or proton, or both, is 
some kind of a smear. Most physicists prefer to think that the charge of the proton 
is smeared. We know that protons interact strongly with mesons. This implies 
that a proton will, from time to time, exist as a neutron with a 7r + meson around 
it. Such a configuration would act — on the average — like a little sphere of positive 
charge. We know that the field from a sphere of charge does not vary as 1/r 2 all 
the way into the center. It is quite likely that the proton charge is smeared, but 
the theory of pions is still quite incomplete, so it may also be that Coulomb’s law 
fails at very small distances. The question is still open. 

One more point: The inverse square law is valid at distances like one meter 
and also at 10“ lo m; but is the coefficient l/47re 0 the same? The answer is yes; 
at least to an accuracy of 1 5 parts in a million. 

We go back now to an important matter that we slighted when we spoke of 
the experimental verification of Gauss’ law. You may have wondered how the 
experiment of Maxwell or of Plimpton and Laughton could give such an accuracy 
unless the spherical conductor they used was a perfect sphere. An accuracy of 
one part in a billion is really something to achieve, and you might well ask whether 
they could make a sphere which was that precise. There are certain to be slight 
irregularities in any real sphere and if there are irregularities, will they not produce 
fields inside? We wish to show now that it is not necessary to have a perfect sphere. 
It is possible, in fact, to show that there is no field inside a closed conducting shell 
of any shape. In other words, the experiments depended on 1 /r 2 , but had nothing 
to do with the surface being a sphere (except that with a sphere it is easier to cal- 
culate what the fields would be if Coulomb had been wrong), so we take up that 
subject now. To show this, it is necessary to know some of the properties of 
electrical conductors. 


5-9 The fields of a conductor 

An electrical conductor is a solid that contains many “free” electrons. The 
electrons can move around freely in the material, but cannot leave the surface. 
In a metal there are so many free electrons that any electric field will set large 
numbers of them into motion. Either the current of electrons so set up must be 
continually kept moving by external sources of energy, or the motion of the 
electrons will cease as they discharge the sources producing the initial field. In 
“electrostatic” situations, we do not consider continuous sources of current (they 
will be considered later when we study magnetostatics), so the electrons move only 
until they have arranged themselves to produce zero electric field everywhere 
inside the conductor. (This usually happens in a small fraction of a second.) If 
there were any field left, this field would urge still more electrons to move; the 
only electrostatic solution is that the field is everywhere zero inside. 

Now consider the interior of a charged conducting object. (By “interior” we 
mean in the metal itself.) Since the metal is a conductor, the interior field must 
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Fig. 5-11. The electric field just out- 
side the surface of a conductor is pro- 
portional to the local surface density of 
charge. 



Fig. 5-12. What is the field in an 
empty cavity of a conductor, for any 
shape? 


be zero, and so the gradient of the potential 0 is zero. That means that does not 
vary from point to point. Every conductor is an equipotential region , and its 
surface is an equipotential surface. Since in a conducting material the electric 
field is everywhere zero, the divergence of E is zero, and by Gauss’ law the charge 
density in the interior of the conductor must be zero. 

If there can be no charges in a conductor, how can it ever be charged? What 
do we mean when we say a conductor is “charged”? Where are the charges? 
The answer is that they reside at the surface of the conductor, where there are 
strong forces to keep them from leaving — they are not completely “free.” When 
we study solid-state physics, we shall find that the excess charge of any conductor 
is on the average within one or two atomic layers of the surface. For our present 
purposes, it is accurate enough to say that if any charge is put on, or in, a conductor 
it all accumulates on the surface; there is no charge in the interior of a conductor. 

We note also that the electric field just outside the surface of a conductor must 
be normal to the surface. There can be no tangential component. If there were a 
tangential component, the electrons would move along the surface; there are no 
forces preventing that. Saying it another way: we know that the electric field lines 
must always go at right angles to an equipotential surface. 

We can also, using Gauss’ law, relate the field strength just outside a conductor 
to the local density of the charge at the surface. For a gaussian surface, we take a 
small cylindrical box half inside and half outside the surface, like the one shown 
in Fig. 5-1 1. There is a contribution to the total flux of E only from the side of the 
box outside the conductor. The field just outside the surface of a conductor is then 

Outside a conductor: 

E= f. (5.8) 

Co 

where o* is the local surface charge density. 

Why does a sheet of charge on a conductor produce a different field than just 
a sheet of charge? In other words, why is (5.8) twice as large as (5.3)? The reason, 
of course, is that we have not said for the conductor that there are no “other” 
charges around. There must, in fact, be some to make E — 0 in the conductor. 
The charges in the immediate neighborhood of appoint P on the surface do, in fact, 
give a field £i oca i = <riocai/2e 0 both inside and outside the surface. But all the 
rest of the charges on the conductor “conspire” to produce an additional field at 
the point P equal in magnitude to iTiocai- The total field inside goes to zero and 
the field outside to 2£i oca i = a/e 0 . 


5-10 The field in a cavity of a conductor 

We return now to the problem of the hollow container — a conductor with a 
cavity. There is no field in the metal , but what about in the cavity! We shall show 
that if the cavity is empty then there are no fields in it, no matter what the shape of 
the conductor or the cavity — say for the one in Fig. 5-12. Consider a gaussian 
surface, like S in Fig. 5-12, that encloses the cavity but stays everywhere in the 
conducting material. Everywhere on S the field is zero, so there is no flux through 
S and the total charge inside S is zero. For a spherical shell, one could then argue 
from symmetry that there could be no charge inside. But, in general, we can only 
say that there are equal amounts of positive and negative charge on the inner 
surface of the conductor. There could be a positive surface charge on one part 
and a negative one somewhere else, as indicated in Fig. 5-12. Such a thing cannot 
be ruled out by Gauss’ law. 

What really happens, of course, is that any equal and opposite charges on 
the inner surface would slide around to meet each other, cancelling out completely. 
We can show that they must cancel completely by using the law that the circulation 
of E is always zero (electrostatics). Suppose there were charges on some parts of 
the inner surface. We know that there would have to be an equal number of op- 
posite charges somewhere else. Now any lines of E would have to start on the 
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positive charges and end on the negative charges (since we are considering only the 
case that there are no free charges in the cavity). Now imagine a loop r that crosses 
the cavity along a line of force from some positive charge to some negative charge, 
and returns to its starting point via the conductor (as in Fig, 5-12). The integral 
along such a line of force from the positive to the negative charges would not be 
zero. The integral through the metal is zero, since E — 0. So we would have 

j>E-ds * 0??? 

But the line integral of E around any closed loop in an electrostatic field is always 
zero. So there can be no fields inside the empty cavity, nor any charges on the 
inside surface. 

You should notice carefully one important qualification we have made. 
We have always said “inside an empty ” cavity. If some charges are placed at some 
fixed locations in the cavity — as on an insulator or on a small conductor insulated 
from the main one — then there can be fields in the cavity. But then that is not an 
“empty” cavity. 

We have shown that if a cavity is completely enclosed by a conductor, no 
static distribution of charges outside can ever produce any fields inside. This 
explains the principle of “shielding” electrical equipment by placing it in a metal 
can. The san^^rgurgents can be used to show that no static distribution of charges 
inside a closea conductor can produce any fields outside . Shielding works both 
ways! In electrostatics — but not in varying fields — the fields on the two sides of a 
closed conducting shell are completely independent. 

Now you see why it was possible to check Coulomb’s law to such a great 
precision. The shape of the hollow shell used doesn’t matter. It doesn’t need to 
be spherical; it could be square! If Gauss’ law is exact, the field inside is always 
zero. Now you also understand why it is safe to sit inside the high-voltage terminal 
of a million-volt van de Graaff generator, without worrying about getting a 
shock — because of Gauss’ law. 
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6 


The Electric Field in Various Circumstances 


6-1 Equations of the electrostatic potential 

This chapter will describe the behavior of the electric field in a number of 
different circumstances. It will provide some experience with the way the electric 
field behaves, and will describe some of the mathematical methods which are 
used to find this field. 

We begin by pointing out that the whole mathematical problem is the solution 
of two equations, the Maxwell equations for electrostatics: 

( 6 . 1 ) 

Co 

v X E = 0. (6.2) 


In fact, the two can be combined into a single equation. From the second equation, 
we know at once that we can describe the field as the gradient of a scalar (see 
Section 3—7): 

E = — v<f>, (6.3) 


We may, if we wish, completely describe any particular electric field in terms 
of its potential 4>. We obtain the differential equation that <f> must obey by sub- 
stituting Eq. (6.3) into (6.1), to get 

V • V<I> = — — ■ (6.4) 

Co 


The divergence of the gradient of <j> is the same as V 2 operating on <f>: 


V 


v * - v2 * = s + 


4,4 

dy 2 i " dz 2 


so we write Eq. (6.4) as 


V 2 <f> = 


P_ 

€0 


(6.5) 

( 6 . 6 ) 


The operator V 2 is called the Laplacian, and Eq (6 6) is called the Poisson equa- 
tion. The entire subject of electrostatics, from a mathematical point of view, is 
merely a study of the solutions of the single equation (6.6). Once </> is obtained by 
solving Eq. (6.6) we can find E immediately from Eq. (6.3). 

We take up first the special class of problems in which p is given as a function 
of x, y, z. In that case the problem is almost trivial, for we already know the 
solution of Eq. (6.6) for the general case. We have shown that if p is known at 
every point, the potential at point (1) is 


0(1) = 


f P(2) dV 2 
J 47T€ 0 /*12 ’ 


(67) 


where p(2) is the charge density, dV 2 is the volume element at point (2), and r i 2 
is the distance between points (1) and (2). The solution of the differential equation 
(6.6) is reduced to an integration over space. The solution (6.7) should be especially 
noted, because there are many situations in physics that lead to equations like 


V 2 (something) = (something else), 
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and Eq. (6.7) is a prototype of the solution for any of these problems. 

The solution of electrostatic field problems is thus completely straightforward 
when the positions of all the charges are known. Let’s see how it works in a few 
examples. 
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potential 

The electric dipole 
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gradient 

The dipole approximation for 
an arbitrary distribution 

The fields of charged 
conductors 

The method of images 

A point charge near a 
conducting plane 

A point charge near a 
conducting sphere 
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High-voltage breakdown 

The field emission microscope 
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Fig. 6—1. A dipole: two charges 
+q and — q the distance d apart. 



Fig. 6-2. The water molecule H 2 0. 
The hydrogen atoms have slightly less 
than their share of the electron cloud; the 
oxygen, slightly more. 


6-2 The electric dipole 


First, take two point charges, -\-q and —q, separated by the distance d. Let 
the z-axis go through the charges, and pick the origin halfway between, as shown 
in Fig. 6-1. Then, using (4.24), the potential from the two charges is given by 


4>(x, y, z) 

1 


+ 


-9 


47T€ 0 lV[z — (d/ 2)] 2 + X 2 + y 2 V[z + (d/2)] 2 + x 2 + y 2 


i]- 


( 6 . 8 ) 


We are not going to write out the formula for the electric field, but we can always 
calculate it once we have the potential. So we have solved the problem of two 
charges. 

There is an important special case in which the two charges are very close 
together — which is to say that we are interested in the fields only at distances from 
the charges large in comparison with their separation. We call such a close pair 
of charges a dipole. Dipoles are very common. 

A “dipole” antenna can often be approximated by two charges separated by a 
small distance — if we don’t ask about the field too close to the antenna. (We are 
usually interested in antennas with moving charges; then the equations of statics 
do not really apply, but for some purposes they are an adequate approximation.) 

More important perhaps, are atomic dipoles. If there is an electric field in 
any material, the electrons and protons feel opposite forces and are displaced 
relative to each other. In a conductor, you remember, some of the electrons 
move to the surfaces, so that the field inside becomes zero. In an insulator the 
electrons cannot move very far ; they are pulled back by the attraction of the nu- 
cleus. They do, however, shift a little bit. So although an atom, or molecule, 
remains neutral in an external electric field, there is a very tiny separation of its 
positive and negative charges and it becomes a microscopic dipole. If we are 
interested in the fields of these atomic dipoles in the neighborhood of ordinary- 
sized objects, we are normally dealing with distances large compared with the 
separations of the pairs of charges. 

In some molecules the charges are somewhat separated even in the absence 
of external fields, because of the form of the molecule. In a water molecule, for 
example, there is a net negative charge on the oxygen atom and a net positive 
charge on each of the two hydrogen atoms, which are not placed symmetrically 
but as in Fig. 6-2. Although the charge of the whole molecule is zero, there is a 
charge distribution with a little more negative charge on one side and a little 
more positive charge on the other. This arrangement is certainly not as simple 
as two point charges, but when seen from far away the system acts like a dipole. 
As we shall see a little later, the field at large distances is not sensitive to the 
fine details. 

Let’s look, then, at the field of two opposite charges with a small separation 
d . If d becomes zero, the two charges are on top of each other, the two potentials 
cancel, and there is no field. But if they are not exactly on top of each other, we 
can get a good approximation to the potential by expanding the terms of (6.8) in 
a power series in the small quantity d (using the binomial expansion). Keeping 
terms only to first order in d, we can write 


It is convenient to write 
Then 


( z ~i) 


x 2 + y 2 +z 2 = r 2 . 


and 




V[z~— (d/2)] 2 + x 2 + y 2 \/ r 2 [\ — (zd/r 2 )] r 
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Using the binomial expansion again for [1 — {zd/r 2 )] 1/2 — and throwing away 
terms with higher powers than the square of d — we get 


Similarly, 


1 

r 


( 


i + 


1 zrf\ 

2 r 2 ) ' 


1 

V[z + (d/2)] 2 + x 2 + y 2 



The difference of these two terms gives for the potential 


<t>(x , y , z) 


1 

47T6 0 



(6.9) 


The potential, and hence the field, which is its derivative, is proportional to qd , 
the product of the charge and the separation. This product is defined as the 
dipole moment of the two charges, for which we will use the symbol p (do not 
confuse with momentum!): 

p = qd. (6.10) 


Equation (6.9) can also be written as 


<j>(x P y, z) 


1 p cos 6 
47reo r 2 


( 6 . 11 ) 


since z/r = cos $ , where 6 is the angle between the axis of the dipole and the 
radius vector to the point (x, y , z) — see Fig. 6-1. The potential of a dipole decreases 
as \/r 2 for a given direction from the axis (whereas for a point charge it goes as 
1 /r). The electric field E of the dipole will then decrease as 1/r 3 . 

We can put our formula into a vector form if we define p as a vecior whose 
magnitude is p and whose direction is along the axis of the dipole, pointing from 
q _ toward q+. Then 

cos 6 = p • e ri (6.12) 

where e r is the unit radial vector (Fig. 6-3). We can also represent the point 
( x , y, z) by r. Then 


Dipole potential: 1 p e r _ 1 p r 

^ ^ 47re 0 r 2 47re 0 r 3 


(6.13) 



Fig. 6-3. Vector notation 
dipole. 


for 


a 


This formula is valid for a dipole with any orientation and position if r represents 
the vector from the dipole to the point of interest. 

If we want the electric field of the dipole we can get it by taking the gradient 
of <£. For example, the z-component of the field is —d<t>fdz. For a dipole oriented 
along the z-axis we can use (6.9): 


d<j> 

dz 


or 


47re 0 dz 


E z = 


M = £_ (1 _ 

V/ 8 / 47re 0 \r 3 r 5 J 


p 3 cos 2 6 — 1 


47T6o 


The x- and j'-components are 


_ _P_ 3zx 
x 47re 0 r» ’ 


r 8 


P - -1— 

v 47re 0 


(6.14) 


These two can be combined to give one component directed perpendicular to the 
z-axis, which we will call the transverse component E±: 


- VEfTEZ - ^ 


or 


Ex = 


p 3 cos d sin 6 


4-ire 0 


(6.15) 
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The transverse component E x is in the x-y plane and points directly away from 
the axis of the dipole. The total field, of course, is 


E = VE 2 + E 2 . 

2 _L 


The dipole field varies inversely as the cube of the distance from the dipole. 
On the axis, at 6 = 0, it is twice as strong as at 6 = 90°. At both of these special 
angles the electric field has only a z-component, but of opposite sign at the two 
places (Fig. 6-4). 



Fig. 6-4. 
dipole. 


The electric field of a 


6-3 Remarks on vector equations 

This is a good place to make a general remark about vector analysis. The 
fundamental proofs can be expressed by elegant equations in a general form, but 
in making various calculations and analyses it is always a good idea to choose 
the axes m some convenient way. Notice that when we were finding the potential 
of a dipole we chose the z-axis along the direction of the dipole, rather than at some 
arbitrary angle. This made the work much easier. But then we wrote the equations 
in vector form so that they would no longer depend on any particular coordinate 
system. After that, we are allowed to choose any coordinate system we wish, 
knowing that the relation is, in general, true. It clearly doesn’t make any sense to 
bother with an arbitrary coordinate system at some complicated angle when you 
can choose a neat system for the particular problem — provided that the result can 
finally be expressed as a vector equation. So by all means take advantage of the 
fact that vector equations are independent of any coordinate system. 

On the other hand, if you are trying to calculate the divergence of a vector, 
instead of just looking at V • E and wondering what it is, don’t forget that it can 
always be spread out as 

dE x dE y dE z 
dx dy dz 


If you can then work out the x-, y-, and z-components of the electric field and 
differentiate them, you will have the divergence. There often seems to be a feeling 
that there is something inelegant — some kind of defeat involved — in writing out 
the components; that somehow there ought always to be a way to do everything 
with the vector operators. There is often no advantage to it. The first time we 
encounter a particular kind of problem, it usually helps to write out the components 
to be sure we understand what is going on. There is nothing inelegant about put- 
ting numbers into equations, and nothing inelegant about substituting the deriva- 
tives for the fancy symbols. In fact, there is often a certain cleverness in doing 
just that. Of course when you publish a paper in a professional journal it will look 
better — and be more easily understood — if you can write everything in vector form. 
Besides, it saves print. 


6-4 The dipole potential as a gradient 

We would like to point out a rather amusing thing about the dipole formula, 
Eq. (6.13). The potential can also be written as 


= - 


47T€( 


p V 




If you calculate the gradient of 1 /V, you get 


(6.16) 


v 



r __ e r 
r* ~ ~~r 2 3 


and Eq. (6.16) is the same as Eq. (6.13). 

How did we think of that? We just remembered that e r /r 2 appeared in the 
formula for the field of a point charge, and that the field was the gradient of a 
potential which has a 1 fr dependence. 
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There is a physical reason for being able to write the dipole potential in the 
form of Eq. (6.16), Suppose we have a point charge q at the origin. The potential 
at the point P at (, x , y, z) is 



(Let’s leave off the 1/47 T€o while we make these arguments; we can stick it in at 
the end.) Now if we move the charge +q up a distance A z, the potential at P will 
change a little, by, say, A<£+. How much is A<£ + ? Well, it is just the amount that 
the potential would change if we were to leave the charge at the origin and move 
P downward by the same distance A z (Fig. 6-5). That is, 


Atf>+ = 


d0O 

dz 


Az, 


where by Az we mean the same as d/2. So, using <j> = q/r , we have that the po- 
tential from the positive charge is 


i = g _ i_ m i 

r dz \r) 2 


(6.17) 


Applying the same reasoning for the potential from the negative charge, 
we can write 


<t>~ = 




(6.18) 


The total potential is the sum of (6.17) and (6.18): 


<f> = 0 + + <j>— = 



(6.19) 



Fig. 6-5. The potential at P from a 
point charge at Az above the origin is the 
same as the potential at P'(Az below P) 
from the same charge at the origin. 


For other orientation of the dipole, we could represent the displacement of 
the positive charge by the vector A r + . We should then write Eq. (6.17) as 


A <!>+ = — V0 o -Ar + , 

where A r is then to be replaced by d/2. Completing the derivation as before, 
Eq. (6.19) would then become 



This is the same as Eq. (6.16), if we replace qd = p , and put back the l/47re 0 . 
Looking at it another way, we see that the dipole potential, Eq. (6.13), can be 
interpreted as 

$ = ~P * V$ 0 > (6.20) 

where 4> 0 = l/47re 0 r is the potential of a unit point charge. 

Although we can always find the potential of a known charge distribution by 
an integration, it is sometimes possible to save time by getting the answer with a 
clever trick. For example, one can often make use of the superposition principle. 
If we are given a charge distribution that can be made up of the sum of two dis- 
tributions for which the potentials are already known, it is easy to find the de- 
sired potential by just adding the two known ones. One example of this is our 
derivation of (6.20), another is the following. 

Suppose we have a spherical surface with a distribution of surface charge 
that varies as the cosine of the polar angle. The integration for this distribution is 
fairly messy. But, surprisingly, such a distribution can be analyzed by super- 
position. For imagine a sphere with a uniform volume density of positive charge, 
and another sphere with an equal uniform volume density of negative charge, 
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Fig. 6-6. Two uniformly charged 
spheres, superposed with a slight displace- 
ment, are equivalent to a nonuniform 
distribution of surface charge. 



originally superposed to make a neutral — that is, uncharged — sphere. If the 
positive sphere is then displaced slightly with respect to the negative sphere, the 
body of the uncharged sphere would remain neutral, but a little positive charge will 
appear on one side, and some negative charge will appear on the opposite side, 
as illustrated in Fig. 6-6. If the relative displacement of the two spheres is small, 
the net charge is equivalent to a surface charge (on a spherical surface), and the 
surface charge density will be proportional to the cosine of the polar angle. 

Now if we want the potential from this distribution, we do not need to do an 
integral. We know that the potential from each of the spheres of charge is — for 
points outside the sphere — the same as from a point charge. The two displaced 
spheres are like two point charges; the potential is just that of a dipole. 

In this way you can show that a charge distribution on a sphere of radius a 
with a surface charge density 

a = <r 0 cos 6 


produces a field outside the sphere which is just that of a dipole whose moment is 

47 r<7 0 a 3 

P = —j~. 


It can also be shown that inside the sphere the field is constant, with the value 


E 


£o_ m 
3eo 


If 6 is the angle from the positive z-axis, the electric field inside the sphere is in the 
negative z-direction. The example we have just considered is not as artificial as 
it may appear; we will encounter it again in the theory of dielectrics. 


6-5 The dipole approximation for an arbitrary distribution 

The dipole field appears in another circumstance both interesting and im- 
portant. Suppose that we have an object that has a complicated distribution of 
charge — like the water molecule (Fig. 6-2) — and we are interested only in the 
fields far away. We will show that it is possible to find a relatively simple expression 
for the fields which is appropriate for distances large compared with the size of 
the object. 

We can think of our object as an assembly of point charges q t in a certain limited 
region, as shown in Fig. 6-7. (We can, later, replace q % by p dV if we wish.) Let 
each charge q l be located at the displacement d % from an origin chosen somewhere 


Fig. 6—7. Computation of the po- 
tential at a point P at a large distance 
from a set of charges. 
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in the middle of the group of charges. What is the potential at the point P , located 
at R, where R is much larger than the maximum d t l The potential from the 
whole collection is given by 


<t> = 


1 V* Q± , 
47 re 0 “ r ' ’ 


( 6 . 21 ) 


where r % is the distance from P to the charge q % (the length of the vector R — d % ). 
Now if the distance from the charges to P , the point of observation, is enormous, 
each of the /•*’ s can be approximated by R. Each term becomes q t /R, and we 
can take \/R out as a factor in front of the summation. This gives us the simple 
result 


<t> = 


1 1 
47reo R 


£ 


Q 

4ire 0 R * 


( 6 . 22 ) 


where Q is just the total charge of the whole object. Thus we find that for points 
far enough from any lump of charge, the lump looks like a point charge. The 
result is not too surprising. 

But what if there are equal numbers of positive and negative charges? Then 
the total charge Q of the object is zero. This is not an unusual case; in fact, as we 
know, objects are usually neutral. The water molecule is neutral, but the charges 
are not all at one point, so if we are close enough we should be able to see some 
effects of the separate charges. We need a better approximation than (6.22) for 
the potential from an arbitrary distribution of charge in a neutral object. Equation 
(6.21) is still precise, but we can no longer just set r t = R. We need a more accu- 
rate expression for r % . If the point P is at a large distance, r t will differ from R to 
an excellent approximation by the projection of d on /?, as can be seen from 
Fig. 6-7. (You should imagine that P is really farther away than is shown in the 
figure.) In other words, if e r is the unit vector in the direction of R , then our next 
approximation to r % is 

r % ^ R ~ d t - e r . (6.23) 


What we really want is l/r t , which, since d % <<C R , can be written to our approxima- 
tion as 

i i / j • » \ 

(6.24) 




Substituting this in (6.21), we get that the potential is 


4> = 


47T€o 



k h 


d% ’ Cr 



(6.25) 


The three dots indicate the terms of higher order in d/R that we have neglected. 
These, as well as the ones we have already obtained, are successive terms in a Taylor 
expansion of \/r % about \/R in powers of d t /R. 

The first term in (6.25) is what we got before; it drops out if the object is 
neutral. The second term depends on 1 / R 2 , just as for a dipole. In fact, if we define 

P = Hqid t (6.26) 


as a property of the charge distribution, the second term of the potential (6.25) is 


= 1 P ' e r 

47re 0 R 2 


(6.27) 


precisely a dipole potential The quantity p is called the dipole moment of the 
distribution. It is a generalization of our earlier definition, and reduces to it for 
the special case of two point charges. 

Our result is that, far enough away from any mess of charges that is as a 
whole neutral, the potential is a dipole potential. It decreases as \/R 2 and varies 
as cos 6 — and its strength depends on the dipole moment of the distribution of 
charge. It is for these reasons that dipole fields are important, since the simple 
case of a pair of point charges is quite rare. 
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Fig. 6-8. The field lines and equipo- 
tentials for two point charges. 



Fig. 6-9. The field outside a con- 
ductor shaped like the equipotential A 
of Fig. 6-8. 


The water molecule, for example, has a rather strong dipole moment. The 
electric fields that result from this moment are responsible for some of the im- 
portant properties of water. For many molecules, for example C0 2 , the dipole 
moment vanishes because of the symmetry of the molecule. For them we should 
expand still more accurately, obtaining another term in the potential which de- 
creases as 1 /R^, and which is called a quadrupole potential. We will discuss such 
cases later. 

6-6 The fields of charged conductors 

We have now finished with the examples we wish to cover of situations in 
which the charge distributions is known from the start. It has been a problem 
without serious complications, involving at most some integrations. We turn 
now to an entirely new kind of problem, the determination of the fields near 
charged conductors. 

Suppose that we have a situation in which a total charge Q is placed on an 
arbitrary conductor. Now we will not be able to say exactly where the charges 
are. They will spread out in some way on the surface. How can we know how 
the charges have distributed themselves on the surface? They must distribute 
themselves so that the potential of the surface is constant. If the surface were not 
an equipotential, there would be an electric field inside the conductor, and the 
charges would keep moving until it became zero. The general problem of this 
kind can be solved in the following way. We guess at a distribution of charge and 
calculate the potential. If the potential turns out to be constant everywhere on 
the surface, the problem is finished. If the surface is not an equipotential, we 
have guessed the wrong distribution of charges, and should guess again — hopefully 
with an improved guess! This can go on forever, unless we are judicious about 
the successive guesses. 

The question of how to guess at the distribution is mathematically difficult. 
Nature, of course, has time to do it; the charges push and pull until they all balance 
themselves. When we try to solve the problem, however, it takes us so long to 
make each trial that that method is very tedious With an arbitrary group of 
conductors and charges the problem can be very complicated, and in general it 
cannot be solved without rather elaborate numerical methods. Such numerical 
computations, these days, are set up on a computing machine that will do the 
work for us, once we have told it how to proceed. 

On the other hand, there are a lot of little practical cases where it would 
be nice to be able to find the answer by some more direct method — without having 
to write a program for a computer. Fortunately, there are a number of cases where 
the answer can be obtained by squeezing it out of Nature by some trick or other. 
The first trick we will describe involves making use of solutions we have already 
obtained for situations in which charges have specified locations. 


6-7 The method of images 

We have solved, for example, the field of two point charges. Figure 6-8 
shows some of the field lines and equipotential surfaces we obtained by the com- 
putations in Chapter 5. Now consider the equipotential surface marked A. Sup- 
pose we were to shape a thin sheet of metal so that it just fits this surface. If we 
place it right at the surface and adjust its potential to the proper value, no one 
would ever know it was there, because nothing would be changed. 

But notice! We have really solved a new problem. We have a situation in 
which the surface of a curved conductor with a given potential is placed near a 
point charge. If the metal sheet we placed at the equipotential surface eventually 
closes on itself (or, in practice, if it goes far enough) we have the kind of situation 
considered in Section 5-10, in which our space is divided into two regions, one 
inside and one outside a closed conducting shell. We found there that the fields in 
the two regions are quite independent of each other. So we would have the same 
fields outside our curved conductor no matter what is inside. We can even fill up 
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the whole inside with conducting material. We have found, therefore, the fields 
for the arrangement of Fig. 6-9. In the space outside the conductor the field is 
just like that of two point charges, as in Fig. 6-8. Inside the conductor, it is zero 
Also — as it must be — the electric field just outside the conductor is normal to 
the surface. 

Thus we can compute the fields in Fig. 6-9 by computing the field due to q 
and to an imaginary point charge —q at a suitable point. The point charge we 
“imagine” existing behind the conducting surface is called an image charge . 

In books you can find long lists of solutions for hyperbolic-shaped conductors 
and other complicated looking things, and you wonder how anyone ever solved 
these terrible shapes. They were solved backwards! Someone solved a simple 
problem with given charges. He then saw that some equipotential surface showed 
up in a new shape, and he wrote a paper in which he pointed out that the field 
outside that particular shape can be described in a certain way. 


6-8 A point charge near a conducting plane 

As the simplest application of the use of this method, let’s make use of the 
plane equipotential surface B of Fig. 6-8. With it, we can solve the problem of a 
charge in front of a conducting sheet. We just cross out the left-hand half of the 
picture. The field lines for our solution are shown in Fig. 6-10. Notice that the 
plane, since it was halfway between the two charges, has zero potential. We have 
solved the problem of a positive charge next to a grounded conducting sheet. 

We have now solved for the total field, but what about the real charges that 
are responsible for it? There are, in addition to our positive point charge, some 
induced negative charges on the conducting sheet that have been attracted by the 
positive charge (from large distances away). Now suppose that for some technical 
reason — or out of curiosity — you would like to know how the negative charges 
are distributed on the surface. You can find the surface charge density by using 
the result we worked out in Section 5-6 with Gauss’ theorem. The normal com- 



Fig. 6-10. The field of a charge near a plane conducting surface, found by the 
method of images. 
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Fig. 6-11. The point charge q in- 
duces charges on a grounded conducting 
sphere whose fields are those of an 
image charge q' placed at the point 
shown. 


ponent of the electric field just outside a conductor is equal to the density of surface 
charge a divided by e n . We can obtain the density of charge at any point on the 
surface by working backwards from the normal component of the electric field at 
the surface. We know that, because we know the field everywhere. 

Consider a point on the surface at the distance p from the point directly be- 
neath the positive charge (Fig. 6-10). The electric field at this point is normal to 
the surface and is directed into it. The component normal to the surface of the 
field from the positive point charge is 


1 aq 

~4xe 0 ( a 2 + p 2 ) 3/2 


(6.28) 


To this we must add the electric field produced by the negative image charge. That 
just doubles the normal component (and cancels all others), so the charge density 
(7 at any point on the surface is 

<rw . .oiw - - 4—^ ■ ■ ; (« ») 


An interesting check on our work is to integrate o' over the whole surface. We 
find that the total induced charge is — q , as it should be. 

One further question: Is there a force on the point charge ? Yes, because there 
is an attraction from the induced negative surface charge on the plate. Now that 
we know what the surface charges are (from Eq. (6.29)), we could compute the 
force on our positive point charge by an integral. But we also know that the force 
acting on the positive charge is exactly the same as it would be with the negative 
image charge instead of the plate, because the fields in the neighborhood are the 
same in both cases. The point charge feels a force toward the plate whose magni- 
tude is 


F 


_ JL A 

4 x 60 (2a)- 


(6 30) 


We have found the force much more easily than by integrating over all the nega- 
tive charges. 


6-9 A point charge near a conducting sphere 

What other surfaces besides a plane have a simple solution 9 The next most 
simple shape is a sphere. Let’s find the fields around a metal sphere which has a 
point charge q near it, as shown in Fig. 6-1 1. Now we must look for a simple 
physical situation which gives a sphere for an equipotential surface. If we look 
around at problems people have already solved, we find that someone has noticed 
that the field of two unequal point charges has an equipotential that is a sphere 
Aha 1 If we choose the location of an image charge — and pick the right amount 
of charge — maybe we can make the equipotential surface fit our sphere. Indeed, 
it can be done with the following prescription. 

Assume thai you want the equipotential surface to be a sphere of radius a 
with its center at the distance b from the charge q . Put an image charge of strength 
q ' = —q(a/b) on the line from the charge to the center of the sphere, and at a 
distance a 2 /b from the center. The sphere will be at zero potential. 

The mathematical reason stems from the fact that a sphere is the locus of all 
points for which the distances from two points are in a constant ratio Referring 
to Fig. 6-11, the potential at P from q and q f is proportional to 



The potential will thus be zero at all points for which 

q f q r 2 q f 

- — or -- = — -- • 

r 2 n r 1 q 
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If we place q* at the distance a 2 /b from the center, the ratio r 2 /r j has the constant 
value a/b . Then if 


the sphere is an equipotential. Its potential is, in fact, zero. 

What happens if we are interested in a sphere that is not at zero potential? 
That would be so only if its total charge happens accidentally to be q r Of course if it 
is grounded, the charges induced on it would have to be just that. But what if it 
is insulated, and we have put no charge on it 9 Or if we know that the total charge 
Q has been put on it? Or just that it has a given potential not equal to zero? All 
these questions are easily answered. We can always add a point charge q ,f at the 
center of the sphere The sphere still remains an equipotential by superposition; 
only the magnitude of the potential will be changed. 

If we have, for example, a conducting sphere which is initially uncharged 
and insulated from everything else, and we bring near to it the positive point 
charge g, the total charge of the sphere will remain zero. The solution is found 
by using an image charge g' as before, but, in addition, adding a charge g" at the 
center of the sphere, choosing 

q" = = f q. (6-32) 


The fields everywhere outside the sphere are given by the superposition of the 
fields of g, g\ and g" . The problem is solved. 

We can see now that there will be a force of attraction between the sphere 
and the point charge q. It is not zero even though there is no charge on the neutral 
sphere. Where does the attraction come from? When you bring a positive charge 
up to a conducting sphere, the positive charge attracts negative charges to the 
side closer to itself and leaves positive charges on the surface of the far side. The 
attraction by the negative charges exceeds the repulsion from the positive charges, 
there is a net attraction. We can find out how large the attraction is by computing 
the force on q in the field produced by q' and g" . The total force is the sum of the 
attractive force between g and a charge q' = — (a/b)q % at the distance b — ( a 2 /h ), 
and the repulsive force between q and a charge g" = +(a/b)q at the distance h. 

Those who were entertained in childhood by the baking powder box which 
has on its label a picture of a baking powder box which has on its label a picture 
of a baking powder box which has . may be interested in the following problem. 
Two equal spheres, one with a total charge of + Q and the other with a total charge 
of — Q, are placed at some distance from each other. What is the force between 
them? The problem can be solved with an infinite number of images. One first 
approximates each sphere by a charge at its center. These charges will have image 
charges in the other sphere. The image charges will have images, etc , etc , etc 
The solution is like the picture on the box of baking powder— and it converges 
pretty fast. 

6-10 Condensers; parallel plates 

We take up now another kind of a problem involving conductors. Consider 
two large metal plates which are parallel to each other and separated by a distance 
small compared with their width. Let's suppose that equal and opposite charges 
have been put on the plates. The charges on each plate will be attracted by the 
charges on the other plate, and the charges will spread out uniformly on the inner 
surfaces of the plates. The plates will have surface charge densities +<j and —a, 
respectively, as in Fig. 6-12. From Chapter 5 we know that the field between the 
plates is cr/e 0 , and that the field outside the plates is zero. The plates will have 
different potentials <f>i and 0 2 . For convenience we will call the difference V\ it 
is often called the ‘'voltage"; 
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Fig. 6— 1 2 A parallel-plate con- 
denser. 


(You will find that sometimes people use V for the potential, but we have chosen 
to use 0.) 
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The potential difference V is the work per unit charge required to carry a 
small charge from one plate to the other, so that 

V = Ed = — d = Q, (6.33) 

where is the total charge on each plate, A is the area of the plates, and d is 
the separation. 

We find that the voltage is proportional to the charge. Such a proportionality 
between V and Q is found for any two conductors in space if there is a plus charge 
on one and an equal minus charge on the other. The potential difference between 
them — that is, the voltage — will be proportional to the charge. (We are assuming 
that there are no other charges around.) 

Why this proportionality? Just the superposition principle. Suppose we 
know the solution for one set of charges, and then we superimpose two such 
solutions. The charges are doubled, the fields are doubled, and the work done in 
carrying a unit charge from one point to the other is also doubled. Therefore the 
potential difference between any two points is proportional to the charges. In 
particular, the potential difference between the two conductors is proportional 
to the charges on them. Someone originally wrote the equation of proportionality 
the other way. That is, they wrote 

Q = CV, 

where C is a constant This coefficient of proportionality is called the capacity , 
and such a system of two conductors is called a condenser * For our parallel-plate 
condenser 

C = (parallel plates). (6.34) 



Fig. 6-1 3. The electric field near the 
edge of two parallel plates. 


This formula is not exact, because the field is not really uniform everywhere 
between the plates, as we assumed. The field does not just suddenly quit at the 
edges, but really is more as shown in Fig 6-13 The total charge is not a A, as we 
have assumed — there is a little correction for the effects at the edges. To find out 
what the correction is, we will have to calculate the field more exactly and find 
out just what does happen at the edges. That is a complicated mathematical 
problem which can, however, be solved by techniques which we will not describe 
now. The result of such calculations is that the charge density rises somewhat 
near the edges of the plates This means that the capacity of the plates is a little 
higher than we computed. [A very good approximation for the capacity is ob- 
tained if we use Eq. (6.34) but take for A the area one would get if the plates were 
extended artificially by a distance 3/8 of the separation between the plates.] 

We have talked about the capacity for two conductors only. Sometimes 
people talk about the capacity of a single object. They say, for instance, that the 
capacity of a sphere of radius a is 47 re 0 a. What they imagine is that the other 
terminal is another sphere of infinite radius — that when there is a charge + Q on 
the sphere, the opposite charge, — Q , is on an infinite sphere. One can also speak 
of capacities when there are three or more conductors, a discussion we shall, 
however, defer. 

Suppose that we wish to have a condenser with a very large capacity We 
could get a large capacity by taking a very big area and a very small separation 
We could put waxed paper between sheets of aluminum foil and roll it up. (If 
we seal it in plastic, we have a typical radio-type condenser.) What good is it 9 
It is good for storing charge. If we try to store charge on a ball, for example, its 
potential rises rapidly as we charge it up. It may even get so high that the charge 
begins to escape into the air by way of sparks But if we put the same charge on a 
condenser whose capacity is very large, the voltage developed across the con- 
denser will be small. 


* Some people think the words “capacitance” and “capacitor” should be used, instead 
of “capacity” and “condensor ” We have decided to use the older terminology, because 
it is still more commonly heard in the physics laboratory — even if not in textbooks! 
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In many applications in electronic circuits, it is useful to have something 
which can absorb or deliver large quantities of charge without changing its po- 
tential much. A condenser (or “capacitor”) does just that. There are also many 
applications in electronic instruments and in computers where a condenser is 
used to get a specified change in voltage in response to a particular change in 
charge. We have seen a similar application m Chapter 23, Vol. I, where we de- 
scribed the properties of resonant circuits. 

From the definition of C, we see that its unit is one coul/volt. This unit is 
also called a farad. Looking at Eq. (6.34), we see that one can express the units 
of e 0 as farad/meter, which is the unit most commonly used. Typical sizes of 
condensers run from one micro-microfarad (= 1 picofarad) to millifarads. Small 
condensers of a few picofarads are used in high-frequency tuned circuits, and 
capacities up to hundreds or thousands of microfarads are found in power-supply 
filters. A pair of plates one square centimeter m area with a one millimeter separa- 
tion have a capacity of roughly one micro-microfarad. 


6-11 High-voltage breakdown 

We would like now to discuss qualitatively some of the characteristics of the 
fields around conductors. If we charge a conductor that is not a sphere, but one 
that has on it a point or a very sharp end, as, for example, the object sketched 
in Fig. 6-14, the field around the point is much higher than the field in the other 
regions. The reason is, qualitatively, that charges try to spread out as much as 
possible on the surface of a conductor, and the tip of a sharp point is as far away 
as it is possible to be from most of the surface. Some of the charges on the plate 
get pushed all the way to the tip. A relatively small amount of charge on the tip 
can still provide a large surface density; a high charge density means a high field 
just outside. 

One way to see that the field is highest at those places on a conductor where 
the radius of curvature is smallest is to consider the combination of a big sphere 
and a little sphere connected by a wire, as shown in Fig. 6-15. It is a somewhat 
idealized version of the conductor of Fig. 6-14. The wire will have little influence 
on the fields outside; it is there to keep the spheres at the same potential. Now, 
which ball has the biggest field at its surface? If the ball on the left has the radius 
a and carries a charge Q , its potential is about 


4 >\ = 


1 Q 

47re 0 a 


(Of course the presence of one ball changes the charge distribution on the other, 
so that the charges are not really spherically symmetric on either. But if we are 
interested only in an estimate of the fields, we can use the potential of a spherical 
charge.) If the smaller ball, whose radius is b , carries the charge q , its potential 
is about 

4t6 0 b 


But <j>i = <j > 2 , so 


Q = q, 

a b 


On the other hand, the field at the surface (see Eq. 5.8) is 
surface charge density, which is like the total charge over 

We get that „ 

Eg ~ Q/a 2 ^ b 

E b q/b 2 a 


proportional to the 
the radius squared. 

(6.35) 


Therefore the field is higher at the surface of the small sphere. The fields are in the 
inverse proportion of the radii. 

This result is technically very important, because air will break down if the 
electric field is too great. What happens is that a loose charge (electron, or ion) 
somewhere in the air is accelerated by the field, and if the field is very great, the 
charge can pick up enough speed before it hits another atom to be able to knock an 

6-13 


36~X~10' ,arad /”' ter 



Fig. 6-14. The electric field near a 
sharp point on a conductor is very high. 



Fig. 6—15. The field of a pointed 
object can be approximated by that of 
two spheres at the same potential. 


electron off that atom. As a result, more and more ions are produced. Their 
motion constitutes a discharge, or spark. If you want to charge an object to a 
high potential and not have it discharge itself by sparks in the air, you must be 
sure that the surface is smooth, so that there is no place where the field is ab- 
normally large. 



Fig, 6-1 6. Field-emission microscope. 



Fig. 6-17. Image produced by a 
field-emission microscope. [Courtesy of 
Erwin W. Mueller, Research Prof, of 
Physics, Pennsylvania State University.] 


6-12 The field-emission microscope 

There is an interesting application of the extremely high electric field which 
surrounds any sharp protuberance on a charged conductor. The field-emission 
microscope depends for its operation on the high fields produced at a sharp metal 
point.* It is built in the following way. A very fine needle, with a tip whose diameter 
is about 1000 angstroms, is placed at the center of an evacuated glass sphere (Fig. 
6-16.) The inner surface of the sphere is coated with a thin conducting layer of 
fluorescent material, and a very high potential difference is applied between the 
fluorescent coating and the needle. 

Let’s first consider what happens when the needle is negative with respect to 
the fluorescent coating. The field lines are highly concentrated at the sharp point. 
The electric field can be as high as 40 million volts per centimeter. In such 
intense fields, electrons are pulled out of the surface of the needle and accelerated 
across the potential difference between the needle and the fluorescent layer. When 
they arrive there they cause light to be emitted, just as in a television picture tube. 

The electrons which arrive at a given point on the fluorescent surface are, to 
an excellent approximation, those which leave the other end of the radial field line, 
because the electrons will travel along the field line passing from the point to the 
surface. Thus we see on the surface some kind of an image of the tip of the needle. 
More precisely, we see a picture of the emissivity of the surface of the needle — that 
is the ease with which electrons can leave the surface of the metal tip. If the resolu- 
tion were high enough, one could hope to resolve the positions of the individual 
atoms on the tip of the needle. With electrons, this resolution is not possible for 
the following reasons. First, there is quantum-mechanical diffraction of the 
electron waves which blurs the image. Second, due to the internal motions of the 
electrons in the metal they have a small sideways initial velocity when they leave 
the needle, and this random transverse component of the velocity causes some 
smearing of the image. The combination of these two effects limits the resolution 
to 25 A or so. 

If, however, we reverse the polarity and introduce a small amount of helium 
gas into the bulb, much higher resolutions are possible. When a helium atom col- 
lides with the tip of the needle, the intense field there strips an electron off the 
helium atom, leaving it positively charged. The helium ion is then accelerated 
outward along a field line to the fluorescent screen. Since the helium ion is so much 
heavier than an electron, the quantum-mechanical wavelengths are much smaller. 
If the temperature is not too high, the effect of the thermal velocities is also smaller 
than in the electron case. With less smearing of the image a much sharper picture 
of the point is obtained. It has been possible to obtain magnifications up to 
2,000,000 times with the positive ion field-emission microscope — a magnification 
ten times better than is obtained with the best electron microscope. 

Figure 6-17 is an example of the results which were obtained with a field- 
ion microscope, using a tungsten needle. The center of a tungsten atom ionizes 
a helium atom at a slightly different rate than the spaces between the tungsten 
atoms. The pattern of spots on the fluorescent screen shows the arrangement of 
the individual atoms on the tungsten tip. The reason the spots appear in rings can 
be understood by visualizing a large box of balls packed in a rectangular array, 
representing the atoms in the metal. If you cut an approximately spherical section 
out of this box, you will see the ring pattern characteristic of the atomic structure. 
The field-ion microscope provided human beings with the means of seeing atoms 
for the first time. This is a remarkable achievement, considering the simplicity of 
the instrument. 


* See E. W. Mueller: “The field-ion microscope,” Advances in Electronics and Electron 
Physics , 13, 83-179 (1960). Academic Press, New York 
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The Electric Field in Various Circumstances 

( Continued ) 


7-1 Methods for finding the electrostatic field 

This chapter is a continuation of our consideration of the characteristics of 
electric fields in various particular situations. We shall first describe some of the 
more elaborate methods for solving problems with conductors. It is not expected 
that these more advanced methods can be mastered at this time. Yet it may be of 
interest to have some idea about the kinds of problems that can be solved, using 
techniques that may be learned in more advanced courses. Then we take up two 
examples in which the charge distribution is neither fixed nor is carried by a con- 
ductor, but instead is determined by some other law of physics. 

As we found in Chapter 6, the problem of the electrostatic field is fundamen- 
tally simple when the distribution of charges is specified ; it requires only the evalua- 
tion of an integral. When there are conductors present, however, complications 
arise because the charge distribution on the conductors is not initially known; 
the charge must distribute itself on the surface of the conductor in such a way that 
the conductor is an equipotential. The solution of such problems is neither direct 
nor simple. 

We have looked at an indirect method of solving such problems, in which we 
find the equipotentials for some specified charge distribution and replace one of 
them by a conducting surface. In this way we can build up a catalog of special 
solutions for conductors in the shapes of spheres, planes, etc. The use of images, 
described in Chapter 6, is an example of an indirect method. We shall describe 
another in this chapter. 

If the problem to be solved does not belong to the class of problems for which 
we can construct solutions by the indirect method, we are forced to solve the prob- 
lem by a more direct method. The mathematical problem of the direct method is 
the solution of Laplace’s equation, 
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V 2 <p = 0, (7.1) 

subject to the condition that is a suitable constant on certain boundaries — the 
surfaces of the conductors. Problems which involve the solution of a differential 
field equation subject to certain boundary conditions are called boundary-value 
problems. They have been the object of considerable mathematical study. In 
the case of conductors having complicated shapes, there are no general analytical 
methods. Even such a simple problem as that of a charged cylindrical metal can 
closed at both ends — a beer can — presents formidable mathematical difficulties. 
It can be solved only approximately, using numerical methods. The only general 
methods of solution are numerical. 

There are a few problems for which Eq. (7.1) can be solved directly. For 
example, the problem of a charged conductor having the shape of an ellipsoid of 
revolution can be solved exactly in terms of known special functions. The solution 
for a thin disc can be obtained by letting the ellipsoid become infinitely oblate. 
In a similar manner, the solution for a needle can be obtained by letting the ellipsoid 
become infinitely prolate. However, it must be stressed that the only direct methods 
of general applicability are the numerical techniques. 

Boundary-value problems can also be solved by measurements of a physical 
analog. Laplace’s equation arises in many different physical situations: in steady- 
state heat flow, in irrotational fluid flow, in current flow in an extended medium, 
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and in the deflection of an elastic membrane. It is frequently possible to set up a 
physical model which is analogous to an electrical problem which we wish to solve. 
By the measurement of a suitable analogous quantity on the model, the solution 
to the problem of interest can be determined. An example of the analog technique 
is the use of the electrolytic tank for the solution of two-dimensional problems in 
electrostatics. This works because the differential equation for the potential in a 
uniform conducting medium is the same as it is for a vacuum. 

There are many physical situations in which the variations of the physical 
fields in one direction are zero, or can be neglected in comparison with the varia- 
tions in the other two directions. Such problems are called two-dimensional; the 
field depends on two coordinates only. For example, if we place a long charged 
wire along the z-axis, then for points not too far from the wire the electric field 
depends on x and y , but not on z; the problem is two-dimensional. Since in a two- 
dimensional problem d/dz = 0, the equation for <t> in free space is 


d 2 <t> . a 2 <j> 

dx 2 + dy* 


(7.2) 


Because the two-dimensional equation is comparatively simple, there is a wide 
range of conditions under which it can be solved analytically. There is, in fact, 
a very powerful indirect mathematical technique which depends on a theorem 
from the mathematics of functions of a complex variable, and which we will now 
describe. 


7-2 Two-dimensional fields; functions of the complex variable 

The complex variable b is defined as 

b = x + iy. 

(Do not confuse b with the z-coordinate, which we ignore in the following dis- 
cussion because we assume there is no z-dependence of the fields.) Every point in 
x and y then corresponds to a complex number b. We can use & as a single 
(complex) variable, and with it write the usual kinds of mathematical functions 
F(b). For example, 

m = a 2 , 

or 

F(d) = l/a 3 , 

or 

F(b ) = b log b, 

and so forth. 

Given any particular F(b) we can substitute b — x + iy, and we have a 
function of x and y — with real and imaginary parts. For example, 

b 2 — (x + iy) 2 = x 2 — y 2 + lixy . (7.3) 

Any function F{b) can be written as a sum of a pure real part and a pure 
imaginary part, each part a function of x and y: 

F(i) = U(x,y)+ iV(x >y ), (7.4) 

where t/(x, y) and V(x, y) are real functions. Thus from any complex function 
F(b) two new functions U(x,y) and V(x,y) can be derived. For example, F(b) = b 2 
gives us the two functions 

U(x,y) = x 2 -y 2 , (7.5) 

and 

V(x,y) = 2 xy. (7.6) 

Now we come to a miraculous mathematical theorem which is so delightful 
that we shall leave a proof of it for one of your courses in mathematics. (We 
should not reveal all the mysteries of mathematics, or that subject matter would 
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become too dull.) It is this. For any “ordinary function” (mathematicians will 
define it better) the functions U and V automatically satisfy the relations 


8U = dV 
dx dy 9 

dV = _3U_ 
dx dy 

It follows immediately that each of the functions 

dx 2 dy 2 

f_v d 2 v = 

dx 2 dy 2 


(77) 

(78) 

U and V satisfy Laplace’s equation: 
0, (7.9) 

0. (710) 


These equations are clearly true for the functions of (7.5) and (7.6). 

Thus, starting with any ordinary function, we can arrive at two functions 
U(x, y) and V(x , y), which are both solutions of Laplace’s equation in two dimen- 
sions. Each function represents a possible electrostatic potential. We can pick any 
function F{b ) and it should represent some electric field problem — in fact, two 
problems, because U and V each represent solutions. We can write down as many 
solutions as we wish — by just making up functions — then we just have to find the 
problem that goes with each solution. It may sound backwards, but it’s a possible 
approach. 



Fig. 7-1. Two sets of orthogonal curves which can represent 
equipotentials in a two-dimensional electrostatic field. 

As an example, let’s see what physics the function F(b) = d 2 gives us. From 
it we get the two potential functions of (7.5) and (7.6). To see what problem the 
function V belongs to, we solve for the equipotential surfaces by setting U = A, 
a constant: 

x 2 - y 2 = A. 

This is the equation of a rectangular hyperbola. For various values of A, we get 
the hyperbolas shown in Fig. 7-1. When A = 0, we get the special case of diagonal 
straight lines through the origin. 

Such a set of equipotentials corresponds to several possible physical situations. 
First, it represents the fine details of the field near the point halfway between two 
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equal point charges. Second, it represents the field at an inside right-angle corner 
of a conductor. If we have two electrodes shaped like those in Fig. 7-2, which are 
held at different potentials, the field near the corner marked C will look just like 
the field above the origin in Fig. 7-1. The solid lines are the equipotentials, and 
the broken lines at right angles correspond to lines of E. Whereas at points or 
protuberances the electric field tends to be high, it tends to be low in dents or 
hollows. 

The solution we have found also corresponds to that for a hyperbola-shaped 
electrode near a right-angle corner, or for two hyperbolas at suitable potentials. 
You will notice that the field of Fig. 7-1 has an interesting property. The x-com- 
ponent of the electric field, £ x , is given by 



lens. 


The electric field is proportional to the distance from the axis. This fact is used to 
make devices (called quadrupole lenses) that are useful for focusing particle beams 
(see Section 29-9). The desired field is usually obtained by using four hyperbola- 
shaped electrodes, as shown in Fig. 7-3. For the electric field lines in Fig. 7-3, 
we have simply copied from Fig. 7-1 the set of broken-line curves that represent 
V = constant. We have a bonus! The curves for V — constant are orthogonal 
to the ones for U — constant because of the equations (7.7) and (7.8). Whenever 
we choose a function F(d ), we get from U and V both the equipotentials and field 
lines. And you will remember that we have solved either of two problems, depend- 
ing on which set of curves we call the equipotentials. 

As a second example, consider the function 


If we write 
where 
and 
then 


F(3) = Vb. 
b = x + iy = pe'\ 

P = VF+7 

tan 6 = y/x, 

m = p w v ’ /2 


from which 
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(7.11) 


(7.12) 


7-4 





A- 4 



x 


Fig. 7-4. Curves of constant U(x # y) 
and V(x, y) from Eq. (7.12). 


The curves for U(x , y) = A and V(x , y) = B , using £/ and K from Eq. (7. 12), 
are plotted in Fig. 7-4. Again, there are many possible situations that could be 
described by these fields. One of the most interesting is the field near the edge of a 
thin plate. If the line B = 0 — to the right of the y- axis — represents a thin charged 
plate, the field lines near it are given by the curves for various values of A. The 
physical situation is shown in Fig. 7-5. 

Further examples are 

F(3) = z 3/2 , _ (7.13) 

which yields the field outside a rectangular corner 


F(d) = logs, (7.14) 

which yields the field for a line charge, and 

F(d) = (7.15) 

which gives the field for the two-dimensional analog of an electric dipole, i.e., 
two parallel line charges with opposite polarities, very close together. 

We will not pursue this subject further in this course, but should emphasize 
that although the complex variable technique is often powerful, it is limited to 
two-dimensional problems; and also, it is an indirect method. 



7-3 Plasma oscillations 

We consider now some physical situations in which the field is determined 
neither by fixed charges nor by charges on conducting surfaces, but by a com- 
bination of two physical phenomena. In other words, the field will be governed 
simultaneously by two sets of equations: (1) the equations from electrostatics 
relating electric fields to charge distribution, and (2) an equation from another 
part of physics that determines the positions or motions of the charges in the 
presence of the field. 

The first example that we will discuss is a dynamic one in which the motion 
of the charges is governed by Newton’s laws. A simple example of such a situation 
occurs in a plasma, which is an ionized gas consisting of ions and free electrons 
distributed over a region in space. The ionosphere — an upper layer of the atmos- 
phere — is an example of such a plasma. The ultraviolet rays from the sun knock 
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Fig. 7-5. The electric field near the 
edge of a thin grounded plate. 
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Fig. 7-6. Motion in a plasma wave. 
The electrons at the plane a move to a 1 , 
and those at b move to b 1 . 


electrons off the molecules of the air, creating free electrons and ions. In such a 
plasma the positive ions are very much heavier than the electrons, so we may 
neglect the ionic motion, in comparison to that of the electrons. 

Let n o be the density of electrons in the undisturbed, equilibrium state. 
This must also be the density of positive ions, since the plasma is electrically 
neutral (when undisturbed). Now we suppose that the electrons are somehow 
moved from equilibrium and ask what happens. If the density of the electrons in 
one region is increased, they will repel each other and tend to return to their 
equilibrium positions. As the electrons move toward their original positions they 
pick up kinetic energy, and instead of coming to rest in their equilibrium configura- 
tion, they overshoot the mark. They will oscillate back and forth. The situation 
is similar to what occurs in sound waves, in which the restoring force is the gas 
pressure. In a plasma, the restoring force is the electrical force on the electrons. 

To simplify the discussion, we will worry only about a situation in which the 
motions are all in one dimension, say x. Let us suppose that the electrons origi- 
nally at x are, at the instant /, displaced from their equilibrium positions by a small 
amount s(x, t). Since the electrons have been displaced, theirdensity will, in general, 
be changed. The change in density is easily calculated. Referring to Fig. 7-6. 
the electrons initially contained between the two planes a and b have moved and 
are now contained between the planes a f and b '. The number of electrons that 
were between a and b is proportional to n 0 Ax; the same number are now contained 
in the space whose width is Ax + As. The density has changed to 


itoAx __ «o 

Ax + As 1 + (As /Ax) 


(7.16) 


If the change in density is small, we can write [using the binomial expansion for 

o + «r J i 

(7|7 » 

We assume that the positive ions do not move appreciably (because of the much 
larger inertia), so their density remains n 0 . Each electron carries the charge —q e , 
so the average charge density at any point is given by 


or 


P = - (« “ n 0 )q e9 

P = n 0 q e (7.18) 


(where we have written the differential form for As/Ax). 

The charge density is related to the electric field by Maxwell's equations, in 
particular, 

V E = ■ (7.19) 

*0 

If the problem is indeed one-dimensional (and if there are no other fields but the 
one due to the displacements of the electrons), the electric field E has a single 
component E x . Equation (7.19), together with (7.18), gives 


SEx _ n 0 q e ds 
dx €o dx 

Integrating Eq. (7.20) gives 

E x = Ms .v + k. 

to 

Since E x = 0 when s — 0, the integration constant K is zero. 

The force on an electron in the displaced position is 

F x = - Ml s> (7.22) 

to 


(7.20) 

(7.21) 
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a restoring force proportional to the displacement s of the electron. This leads to 
a harmonic oscillation of the electrons. The equation of motion of a displaced 
electron is 




d 2 s 

di 2 


noq[ s 

€o 


(7.23) 


We find that s will vary harmonically. Its time variation will be as cos ait, or — 
using the exponential notation of Vol. I — as 

e*V. (7.24) 


The frequency of oscillation co p is determined from (7.23): 


2 _ n Q qt 

Wj> — i 

v e 0 m e 


(7.25) 


and is called the plasma frequency . It is a characteristic number of the plasma. 

When dealing with electron charges many people prefer to express their an- 
swers in terms of a quantity e 2 defined by 

2 

e 2 = = 2.3068 X lO" 28 newton*meter 2 . (7.26) 

47T€o 


Using this convention, Eq. (7.25) becomes 


co 


47re 2 n 0 


m e 


(7.27) 


which is the form you will find in most books. 

Thus we have found that a disturbance of a plasma will set up free oscillations 
of the electrons about their equilibrium positions at the natural frequency 
which is proportional to the square root of the density of the electrons. The plasma 
electrons behave like a resonant system, such as those we described in Chapter 
23 of Vol. I. 

This natural resonance of a plasma has some interesting effects. For example, 
if one tries to propagate a radiowave through the ionosphere, one finds that it 
can penetrate only if its frequency is higher than the plasma frequency. Otherwise 
the signal is reflected back. We must use high frequencies if we wish to communi- 
cate with a satellite in space. On the other hand, if we wish to communicate with 
a radio station beyond the horizon, we must use frequencies lower than the plasma 
frequency, so that the signal will be reflected back to the earth. 

Another interesting example of plasma oscillations occurs in metals. In a 
metal we have a contained plasma of positive ions, and free electrons. The density 
n 0 is very high, so w p is also. But it should still be possible to observe the electron 
oscillations. Now, according to quantum mechanics, a harmonic oscillator with 
a natural frequency co p has energy levels which are separated by the the energy 
increment tua p . If, then, one shoots electrons through, say, an aluminum foil, and 
makes very careful measurements of the electron energies on the other side, one 
might expect to find that the electrons sometimes lose the energy hco p to the plasma 
oscillations. This does indeed happen. It was first observed experimentally in 
1936 that electrons with energies of a few hundred to a few thousand electron volts 
lost energy in jumps when scattering from or going through a thin metal foil. The 
effect was not understood until 1953 when Bohm and Pines* showed that the 
observations could be explained in terms of quantum excitations of the plasma 
oscillations in the metal. 


* For some recent work and a bibliography see C. J. Powell and J. B. Swann, Phys. 
Rev . 115 , 869 (1959). 


7-7 



7-4 Colloidal particles in an electrolyte 

We turn to another phenomenon in which the locations of charges is governed 
by a potential that arises in part from the same charges. The resulting effects 
influence in an important way the behavior of colloids. A colloid consists of a 
suspension in water of small charged particles which, though microscopic, from 
an atomic point of view are still very large. If the colloidal particles were not 
charged, they would tend to coagulate into large lumps; but because of their 
charge, they repel each other and remain in suspension. 

Now if there is also some salt dissolved in the water, it will be dissociated into 
positive and negative ions. (Such a solution of ions is called an electrolyte.) The 
negative ions are attracted to the colloid particles (assuming their charge is positive) 
and the positive ions are repelled. We will determine how the ions which surround 
such a colloidal particle are distributed in space. 

To keep the ideas simple, we will again solve only a one-dimensional case. 
If we think of a colloidal particle as a sphere having a very large radius — on an 
atomic scale! — we can then treat a small part of its surface as a plane. (Whenever 
one is trying to understand a new phenomenon it is a good idea to take a somewhat 
oversimplified model; then, having understood the problem with that model, one 
is better able to proceed to tackle the more exact calculation.) 

We suppose that the distribution of ions generates a charge density p(x), and 
an electrical potential <j> 9 related by the electrostatic law V 2 <£ = — p/e 0 or, for 
fields that vary in only one dimension, by 

Now supposing there were such a potential <£(; c), how would the ions dis- 
tribute themselves in it? This we can determine by the principles of statistical 
mechanics. Our problem then is to determine so that the resulting charge density 
from statistical mechanics also satisfies (7.28). 

According to statistical mechanics (see Chapter 40, Vol. I), particles in thermal 
equilibrium in a force field are distributed in such a way that the density n of 
particles at the position x is given by 

n(x) = nQ e- U(x)lhT , (7.29) 

where U(x) is the potential energy, k is Boltzmann’s constant, and T is the absolute 
temperature. 

We assume that the ions carry one electronic charge, positive or negative. 
At the distance x from the surface of a colloidal particle, a positive ion will have 
potential energy q e <t>(x ), so that 


U{x) = q e <t>(x). 

The density of positive ions, n + , is then 

n + (x) = noe- q ‘* wlkT . 

Similarly, the density of negative ions is 

«_(*) = noe +q '+ wlkT . 

The total charge density is 

p = q e n+ - 
or 

p = q e n 0 (.e- q °* lkT -e +q °* lkr ). (7.30) 

Combining this with Eq. (7.28), we find that the potential must satisfy 


djt _ _ o (o -<le*lkT_ p +qjHkT\ 

dx* '« 0 1 


(7.31) 
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This equation is readily solved in general [multiply both sides by 2(d<t>/dx), and 
integrate with respect to x], but to keep the problem as simple as possible, we will 
consider here only the limiting case in which the potentials are small or the tem- 
perature T is high. The case where <p is small corresponds to a dilute solution. For 
these cases the exponent is small, and we can approximate 


Equation (7.31) then gives 


„±<le*fkT « q<4> 

kT 


d 2 4> _ , 2n 0 q* 
d^~ + l^r * (X) ' 


(7.32) 

(7.33) 


Notice that this time the sign on the right is positive. The solutions for <f> are not 
oscillatory, but exponential. 

The general solution of Eq. (7.33) is 


with 


<t> = Ae~ x,D + Be +xlD , 

r,2 = 

2n 0 q 2 ' 


(7.34) 

(7.35) 


The constants A and B must be determined from the conditions of the problem. 
In our case, B must be zero; otherwise the potential would go to infinity for large 
x. So we have that 

$ = Ae~ xlD , (7.36) 

in which A is the potential at x = 0, the surface of the colloidal particle. 



The potential decreases by a factor 1 /e each time the distance increases by D, 
as shown in the graph of Fig. 7-7. The number D is called the Debye length , and 
is a measure of the thickness of the ion sheath that surrounds a large charged 
particle in an electrolyte. Equation (7.36) says that the sheath gets thinner with 
increasing concentration of the ions (n 0 ) or with decreasing temperature. 

The constant A in Eq. (7.36) is easily obtained if we know the surface chargee 
on the colloid particle. We know that 


En = £c(0) = f 
€o 


But E is also the gradient of <f>: 


E x ( 0) = 


d<f> 

dX 


= +i. 

o D 


from which we get 


a vD 

A = 

Co 


(7.37) 

(7.38) 

(7.39) 
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Using this result in (7.36), we find (by taking x = 0) that the potential of the 
colloidal particle is 

rn = a r • (7.40) 

CO 

You will notice that this potential is the same as the potential difference across a 
condenser with a plate spacing D and a surface charge density <r. 

We have said that the colloidal particles are kept apart by their electrical 
repulsion. But now we see that the field a little way from the surface of a particle 
is reduced by the ion sheath that collects around it. If the sheaths get thin enough, 
the particles have a good chance of knocking against each other. They will then 
stick, and the colloid will coagulate and precipitate out of the liquid. From our 
analysis, we understand why adding enough salt to a colloid should cause it to 
precipitate out. The process is called “salting out a colloid.” 

Another interesting example is the effect that a salt solution has on protein 
molecules. A protein molecule is a long, complicated, and flexible chain of amino 
acids. The molecule has various charges on it, and it sometimes happens that 
there is a net charge, say negative, which is distributed along the chain. Because 
of mutual repulsion of the negative charges, the protein chain is kept stretched out. 
Also, if there are other similar chain molecules present in the solution, they will 
be kept apart by the same repulsive effects. We can, therefore, have a suspension 
of chain molecules in a liquid. But if we add salt to the liquid we change the proper- 
ties of the suspension. As salt is added to the solution, decreasing the Debye 
distance, the chain molecules can approach one another, and can also coil up. 
If enough salt is added to the solution, the chain molecules will precipitate out of 
the solution. There are many chemical effects of this kind that can be understood 
in terms of electrical forces. 


7-5 The electrostatic field of a grid 

As our last example, we would like to describe another interesting property 
of electric fields. It is one which is made use of in the design of electrical instru- 
ments, in the construction of vacuum tubes, and for other purposes. This is the 
character of the electric field near a grid of charged wires. To make the problem 
as simple as possible, let us consider an array of parallel wires lying in a plane, 
the wires being infinitely long and with a uniform spacing between them. 

If we look at the field a large distance above the plane of the wires, we see a 
constant electric field, just as though the charge were uniformly spread over a 
plane. As we approach the grid of wires, the field begins to deviate from the 
uniform field we found at large distances from the grid. We would like to estimate 
how close to the grid we have to be in order to see appreciable variations in the 
potential. Figure 7-8 shows a rough sketch of the equipotentials at various 
distances from the grid. The closer we get to the grid, the larger the variations. 
As we travel parallel to the grid, we observe that the field fluctuates in a periodic 
manner. 


t* 
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Fig. 7-8. Equipotential surfaces 
above a uniform grid of charged wires. 
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Now we have seen (Chapter 50, Vol. I) that any periodic quantity can be 
expressed as a sum of sine waves (Fourier’s theorem). Let’s see if we can find a 
suitable harmonic function that satisfies our field equations. 

If the wires lie in the xy-plane and run parallel to the y- axis, then we might 
try terms like 

<t>(x,z) = F n (z) cos (7.41) 

where a is the spacing of the wires and n is the harmonic number. (We have as- 
sumed long wires, so there should be no variation with y.) A complete solution 
would be made up of a sum of such terms for n = 1, 2, 3, ... . 

If this is to be a valid potential, it must satisfy Laplace’s equation in the 
region above the wires (where there are no charges). That is, 

dx 2 dz2 


Trying this equation on the <f> in (7.41), we find that 


a ^ 


(7.42) 


or that F n (z) must satisfy 


So we must have 
where 


d 2 F n 47tV 

fife 2 a 2 n 


F n - A n e 


— ZfZQ 


_ a 
Z ° lint 


(7.43) 

(7.44) 

(7.45) 


We have found that if there is a Fourier component of the field of harmonic n 9 
that component will decrease exponentially with a characteristic distance z 0 = 
a/l'KTi, For the first harmonic (n = 1), the amplitude falls by the factor e~ 2ir 
(a large decrease) each time we increase z by one grid spacing a. The other har- 
monics fall off even more rapidly as we move away from the grid. We see that if 
we are only a few times the distance a away from the grid, the field is very nearly 
uniform, i.e., the oscillating terms are small. There would, of course, always 
remain the “zero harmonic” field 


0o — ~EqZ 

to give the uniform field at large z. For a complete solution, we would combine 
this term with a sum of terms like (7.41) with F n from (7.44). The coefficients A n 
would be adjusted so that the total sum would, when differentiated, give an electric 
field that would fit the charge density X of the grid wires. 

The method we have just developed can be used to explain why electrostatic 
shielding by means of a screen is often just as good as with a solid metal sheet. 
Except within a distance from the screen a few times the spacing of the screen 
wires, the fields inside a closed screen are zero. We see why copper screen — 
lighter and cheaper than copper sheet — is often used to shield sensitive electrical 
equipment from external disturbing fields. 
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Electrostatic Energy 


8-1 The electrostatic energy of charges. A uniform sphere 

In the study of mechanics, one of the most interesting and useful discoveries 
was the law of the conservation of energy. The expressions for the kinetic and 
potential energies of a mechanical system helped us to discover connections between 
the states of a system at two different times without having to look into the details 
of what was occurring in between. We wish now to consider the energy of electro- 
static systems. In electricity also the principle of the conservation of energy will 
be useful for discovering a number of interesting things. 

The law of the energy of interaction in electrostatics is very simple; we have, 
in fact, already discussed it. Suppose we have two charges q i and # 2 separated by 
the distance ri 2 . There is some energy in the system, because a certain amount of 
work was required to bring the charges together. We have already calculated the 
work done in bringing two charges together from a large distance. It is 


47reo/-i2 


( 8 . 1 ) 


We also know, from the principle of superposition, that if we have many charges 
present, the total force on any charge is the sum of the forces from the others. It 
follows, therefore, that the total energy of a system of a number of charges is the 
sum of terms due to the mutual interaction of each pair of charges. If q % and qj 
are any two of the charges and r %i is the distance between them (Fig. 8-1), the 
energy of that particular pair is 


mi 

47T€ 0 /\7 


( 8 . 2 ) 


The total electrostatic energy U is the sum of the energies of all possible pairs of 
charges: 


£ 

all pairs 


mi 
4? re 0 r,7 


(8.3) 


If we have a distribution of charge specified by a charge density p, the sum of Eq. 
(8.3) is, of course, to be replaced by an integral. 

We shall concern ourselves with two aspects of this energy. One is the applica- 
tion of the concept of energy to electrostatic problems; the other is the evaluation 
of the energy in different ways. Sometimes it is easier to compute the work done 
for some special case than to evaluate the sum in Eq. (8.3), or the corresponding 
integral. As an example, let us calculate the energy required to assemble a sphere 
of charge with a uniform charge density. The energy is just the work done in 
gathering the charges together from infinity. 

Imagine that we assemble the sphere by building up a succession of thin 
spherical layers of infinitesimal thickness. At each stage of the process, we gather 
a small amount of charge and put it in a thin layer from r to r + dr. We continue 
the process until we arrive at the final radius a (Fig. 8-2). If Q r is the charge of the 
sphere when it has been built up to the radius r, the work done in bringing a charge 
dQ to it is 


dU 


QrdQ 
47reo r 


(8.4) 
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Fig. 8-1. The electrostatic energy of 
a system of particles is the sum of the 
electrostatic energy of each pair. 



If the density of charge in the sphere is p, the charge Q r is 



Fig. 8-2. The energy of a uniform 
sphere of charge can be computed by 
imagining that it is assembled from 
successive spherical shells. 


and the charge dQ is 


Equation (8.4) becomes 


4 * 

Qr = p • J 7Tr, 
dQ = p • 47rr 2 dr, 

du = i a&L*. 

3e 0 


(8.5) 


The total energy required to assemble the sphere is the integral of dU from r = 
0 to r = a, or 


47t p 2 a 5 




15e 0 


( 8 . 6 ) 


Or if we wish to express the result in terms of the total charge Q of the sphere, 


u-l 


5 4t€o^ 


(8.7) 


The energy is proportional to the square of the total charge and inversely pro- 
portional to the radius. We can also interpret Eq. (8.7) as saying that the average 
of (1 /r %3 ) for all pairs of points in the sphere is 3/5 a. 


8-2 The energy of a condenser. Forces on charged conductors 

We consider now the energy required to charge a condenser. If the charge Q 
has been taken from one of the conductors of a condenser and placed on the other, 
the potential difference between them is 

V = § > (8.8) 


where C is the capacity of the condenser. How much work is done in charging 
the condenser? Proceeding as for the sphere, we imagine that the condenser has 
been charged by transferring charge from one plate to the other in small increments 
dQ, The work required to transfer the charge dQ is 

dU = VdQ. 


Taking V from Eq. (8.8), we write 


dU 


QdQ 

C 


Or integrating from zero charge to the final charge Q , we have 


u = l&. 

2 C 

(8.9) 

This energy can also be written as 

U = tcv 2 . 

(8.10) 


Recalling that the capacity of a conducting sphere (relative to infinity) is 

^sphere = 47T6ofl) 

we can immediately get from Eq. (8.9) the energy of a charged sphere, 


U=* 


1 Qr 


2 47reoa 


(8.11) 
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This, of course, is also the energy of a thin spherical shell of total charge Q and is 
just 5/6 of the energy of a uniformly charged sphere, Eq. (8.7). 

We now consider applications of the idea of electrostatic energy. Consider 
the following questions: What is the force between the plates of a condenser? Or 
what is the torque about some axis of a charged conductor in the presence of an- 
other with opposite charge? Such questions are easily answered by using our 
result Eq. (8.9) for electrostatic energy of a condenser, together with the principle 
of virtual work (Chapters 4, 13, and 14 of Vol. I). 

Let’s use this method for determining the force between the plates of a 
parallel-plate condenser. If we imagine that the spacing of the plates is increased 
by the small amount Az, then the mechanical work done from the outside in 
moving the plates would be 

AW = FAz, (8.12) 

where F is the force between the plates. This work must be equal to the change 
in the electrostatic energy of the condenser. 

By Eq. (8.9), the energy of the condenser was originally 


The change in energy (if we do not let the charge change) is 



Equating (8.12) and (8.13), we have 


This can also be written as 



FAz 



(8.13) 

(8.14) 

(8.15) 


The force, of course, results from the attraction of the charges on the plates, but 
we see that we do not have to worry in detail about how they are distributed; 
everything we need is taken care of in the capacity C. 

It is easy to see how the idea is extended to conductors of any shape, and for 
other components of the force. In Eq. (8.14), we replace F by the component we 
are looking for, and we replace A z by a small displacement in the corresponding 
direction. Or if we have an electrode with a pivot and we want to know the torque 
r, we write the virtual work as 


AW — r AO, 


where Ad is a small angular displacement. Of course, A(l/C) must be the change in 
1 / C which corresponds to Ad . We could, in this way, find the torque on the mov- 
able plates in a variable condenser of the type shown in Fig. 8-3. 

Returning to the special case of a parallel-plate condenser, we can use the 
formula we derived in Chapter 6 for the capacity: 


1 = d 
C toA 


(8.16) 



where A is the area of each plate. If we increase the separation by Az, 



From Eq. (8.14) we get that the force between the plates is 



Fig. 8-3. What is the torque on a 
variable capacitor? 


(8.17) 
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Let’s look at Eq. (8.17) a little more closely and see if we can tell how the force 
arises. If for the charge on one plate we write 


Eq. (8.17) can be rewritten as 


Q = <rA, 


f = \ ef 

4 €o 


Or, since the electric field between the plates is 


then 


E 0 = f - 
*0 


F = $QE 0 . 


(8.18) 



One would immediately guess that the force acting on one plate is. the charge 
Q on the plate times the field acting on the charge. But we have a surprising factor 
of one-half. The reason is that E 0 is not the field at the charges. If we imagine that 
the charge at the surface of the plate occupies a thin layer, as indicated in Fig. 8-4, 
the field will vary from zero at the inner boundary of the layer to E 0 in the space 
outside of the plate. The average field acting on the surface charges is E 0 /2. That 
is why the factor one-half is in Eq. (8.18). 

You should notice that in computing the virtual work we have assumed that 
the charge on the condenser was constant — that it was not electrically connected 
to other objects, and so the total charge could not change. 

Suppose we had imagined that the condenser was held at a constant potential 
difference as we made the virtual displacement. Then we should have taken 

U = \CV 2 

and in place of Eq. (8.15) we would have had 

FAz - iV 2 AC, 


Fig. 8-4. The field at the surface of 
a conductor varies from zero to E 0 = 
a/eo, as one passes through the layer of 
surface charge. 


which gives a force equal in magnitude to the one in Eq. (8.15) (because V — Q/C ), 
but with the opposite sign! Surely the force between the condenser plates doesn’t 
reverse in sign as we disconnect it from its charging source. Also, we know that 
two plates with opposite electrical charges must attract. The principle of virtual 
work has been incorrectly applied in the second case — we have not taken into 
account the virtual work done on the charging source. That is, to keep the po- 
tential constant at V as the capacity changes, a charge V AC must be supplied by 
a source of charge. But this charge is supplied at a potential V, so the work done 
by the electrical system which keeps the potential constant is V 2 AC . The mechan- 
ical work F A z plus this electrical work V 2 AC together make up the change in the 
total energy \V 2 AC of the condenser. Therefore FAz is —\V 2 AC , as before. 


8-3 The electrostatic energy of an ionic crystal 

We now consider an application of the concept of electrostatic energy in atomic 
physics. We cannot easily measure the forces between atoms, but we are often 
interested in the energy differences between one atomic arrangement and another, 
as, for example, the energy of a chemical change. Since atomic forces are basically 
electrical, chemical energies are in large part just electrostatic energies. 

Let’s consider, for example, the electrostatic energy of an ionic lattice. An 
ionic crystal like NaCl consists of positive and negative ions which can be thought 
of as rigid spheres. They attract electrically until they begin to touch; then there is 
a repulsive force which goes up very rapidly if we try to push them closer together. 

For our first approximation, therefore, we imagine a set of rigid spheres 
that represent the atoms in a salt crystal. The structure of the lattice has been 
determined by x-ray diffraction. It is a cubic lattice— like a three-dimensional 
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checkerboard. Figure 8-5 shows a cross-sectional view. The spacing of the ions is 
2.81 A (=2.81 X 10“ 8 cm). 

If our picture of this system is correct, we should be able to check it by asking 
the following question: How much energy will it take to pull all these ions apart — 
that is, to separate the crystal completely into ions? This energy should be equal 
to the heat of vaporization of NaCl plus the energy required to dissociate the 
molecules into ions. This total energy to separate NaCl to ions is determined experi- 
mentally to be 7.92 electron volts per molecule. Using the conversion 

1 ev = 1.602 X 10~ 1& joule, 

and Avogadro’s number for the number of molecules in a mole, 

N 0 = 6.02 X 10 23 , 

the energy of vaporization can also be given as 

W = 7.64 X 10 5 joules/mole. 

Physical chemists prefer for an energy unit the kilocalorie, which is 4190 joules; 
so that 1 ev per molecule is 23 kilocalories per mole. A chemist would then say 
that the dissociation energy of NaCl is 

W = 183 kcal/mole. 

Can we obtain this chemical energy theoretically by computing how much 
work it would take to pull apart the crystal? According to our theory, this work is 
the sum of the potential energies of all the pairs of ions. The easiest way to figure 
out this sum is to pick out a particular ion and compute its potential energy with 
each of the other ions. That will give us twice the energy per ion, because the energy 
belongs to the pairs of charges. If we want the energy to be associated with one 
particular ion, we should take half the sum. But we really want the energy per 
molecule , which contains two ions, so that the sum we compute will give directly 
the energy per molecule. 

The energy of an ion with one of its nearest neighbors is e 2 /a, where e 2 = 
q 2 / 47T€ 0 and a is the center-to-center spacing between ions. (We are considering 
monovalent ions.) This energy is 5.12 ev, which we already see is going to give us 
a result of the correct order of magnitude. But it is still a long way from the infinite 
sum of terms we need. 

Let’s begin by summing all the terms from the ions along a straight line. 
Considering that the ion marked Na in Fig. 8-5 is our special ion, we shall consider 
first those ions on a horizontal line with it. There are two nearest Cl ions with 
negative charges, each at the distance a. Then there are two positive ions at the 
distance 2a, etc. Calling the energy of this sum U x , we write 



2.61 X 

Fig. 8-5. Cross section of a salt 
crystal on an atomic scale. The checker- 
board arrangement of Na and Cl ions is 
the same in the two cross sections per- 
pendicular to the one shown. (See Vol. I, 
Fig. 1-7.) 
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The series converges slowly, so it is difficult to evaluate numerically, but it is known 
to be equal to In 2. So 

j 2 2 

C/i = -—In 2 = -1.386 — . (8.20) 

a a 


Now consider the next adjacent line of ions above. The nearest is negative 
and at the distance a. Then there are two positives at the distance y/2 a. The next 
pair are at the distance y/5 a, the next at yTO a, and so on. So for the whole line 
we get the series 
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There are four such lines: above, below, in front, and in back. Then there are the 
four lines which are the nearest lines on diagonals, and on and on. 

If you work patiently through for all the lines, and then take the sum, you 
find that the grand total is 

e 2 

U = 1.747 — j 
a 

which is just somewhat more than what we obtained in (8.20) for the first line. 
Using e 2 /a = 5.12 ev, we get 


U — 8.94 ev. 

Our answer is about 10% above the experimentally observed energy. It shows that 
our idea that the whole lattice is held together by electrical Coulomb forces is 
fundamentally correct. This is the first time that we have obtained a specific 
property of a macroscopic substance from a knowledge of atomic physics. We 
will do much more later. The subject that tries to understand the behavior of 
bulk matter in terms of the laws of atomic behavior is called solid-state physics . 

Now what about the error in our calculation? Why is it not exactly right? 
It is because of the repulsion between the ions at close distances. They are not 
perfectly rigid spheres, so when they are close together they are partly squashed. 
They are not very soft, so they squash only a little bit. Some energy, however, is 
used in deforming them, and when the ions are pulled apart this energy is released. 
The actual energy needed to pull the ions apart is a little less than the energy that 
we calculated ; the repulsion helps in overcoming the electrostatic attraction. 

Is there any way we can make an allowance for this contribution? We could 
if we knew the law of the repulsive force. We are not ready to analyze the details 
of this repulsive mechanism, but we can get some idea of its characteristics from 
some large-scale measurements. From a measurement of the compressibility of 
the whole crystal, it is possible to obtain a quantitative idea of the law of repulsion 
between the ions and therefore of its contribution to the energy. In this Way it 
has been found that this contribution must be 1/9.4 of the contribution from the 
electrostatic attraction and, of course, of opposite sign. If we subtract this contribu- 
tion from the pure electrostatic energy, we obtain 7.99 ev for the dissociation energy 
per molecule. It is much closer to the observed result of 7.92 ev, but still not in 
perfect agreement. There is one more thing we haven’t taken into account: we 
have made no allowance for the kinetic energy of the crystal vibrations. If a cor- 
rection is made for this effect, very good agreement with the experimental number 
is obtained. The ideas are then correct; the major contribution to the energy of a 
crystal like NaCl is electrostatic. 


8-4 Electrostatic energy in nuclei 

We will now take up another example of electrostatic energy in atomic 
physics, the electrical energy of atomic nuclei. Before we do this we will have to 
discuss some .properties of the main forces (called nuclear forces) that hold the 
protons and neutrons together in a nucleus. In the early days of the discovery of 
nuclei — and of the neutrons and protons that make them up — it was hoped that 
the law of the strong, nonelectrical part of the force between, say, a proton and 
another proton would have some simple law, like the inverse square law of elec- 
tricity. For once one had determined this law of force, and the corresponding ones 
between a proton and a neutron, and a neutron and a neutron, it would be possible 
to describe theoretically the complete behavior of these particles in nuclei. There- 
fore a big program was started for the study of the scattering of protons, in the 
hope of finding the law of force between them; but after thirty years of effort, 
nothing simple has emerged. A considerable knowledge of the force between proton 
and proton has been accumulated, but we find that the force is as complicated as 
it can possibly be. 

What we mean by “as complicated as it can be” is that the force depends on 
as many things as it possibly can. 
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First, the force is not a simple function of the distance between the two protons. 
At large distances there is an attraction, but at closer distances there is a repulsion. 
The distance dependence is a complicated function, still imperfectly known. 

Second, the force depends on the orientation of the protons’ spin. The protons 
have a spin, and any two interacting protons may be spinning with their angular 
momenta in the same direction or in opposite directions. And the force is different 
when the spins are parallel from what it is when they are antiparallel, as in (a) 
and (b) of Fig. 8-6. The difference is quite large; it is not a small effect. 

Third, the force is considerably different when the separation of the two 
protons is in the direction parallel to their spins, as in (c) and (d) of Fig. 8-6, than 
it is when the separation is in a direction perpendicular to the spins, as in (a) and (b). 

Fourth, the force depends, as it does in magnetism, on the velocity of the 
protons, only much more strongly than in magnetism. And this velocity-dependent 
force is not a relativistic effect; it is strong even at speeds much less than the speed 
of light. Furthermore, this part of the force depends on other things besides the 
magnitude of the velocity. For instance, when a proton is moving near another 
proton, the force is different when the orbital motion has the same direction of 
rotation as the spin, as in (e) of Fig. 8-6, than when it has the opposite direction 
of rotation, as in (f). This is called the “spin orbit” part of the force. 

The force between a proton and a neutron and between a neutron and a 
neutron are also equally complicated. To this day we do not know the machinery 
behind these forces — that is to say, any simple way of understanding them. 

There is, however, one important way in which the nucleon forces are simpler 
than they could be. That is that the nuclear force between two neutrons is the same 
as the force between a proton and a neutron, which is the same as the force between 
two protons! If, in any nuclear situation, we replace a proton by a neutron (or vice 
versa),- the nuclear interactions are not changed. The “fundamental reason” for 
this equality is not known, but it is an example of an important principle that can 
be extended also to the interaction laws of other strongly interacting particles— 
such as the 7r-mesons and the “strange” particles. 

This fact is nicely illustrated by the locations of the energy levels in similar 
nuclei. Consider a nucleus like B 11 (boron-eleven), which is composed of five 
protons and six neutrons. In the nucleus the eleven particles interact with one 
another in a most complicated dance. Now, there is one configuration of all the 
possible interactions which has the lowest possible energy; this is the normal state 
of the nucleus, and is called the ground state . If the nucleus is disturbed (for exam- 
ple, by being struck by a high-energy proton or other particle) it can be put into 
any number of other configurations, called excited states , each of which will have 
a characteristic energy that is higher than that of the ground state. In nuclear 
physics research, such as is carried on with Van de Graaff generator (for example, 
in Caltech’s Kellogg and Sloan Laboratories), the energies and other properties 
of these excited states are determined by experiment. The energies of the fifteen 
lowest known excited states of B 1 1 are shown in a one-dimensional graph on the 
left half of Fig. 8-7. The lowest horizontal line represents the ground state. 
The first excited state has an energy 2.14 Mev higher than the ground state, 
the next an energy 4.46 Mev higher than the ground state, and so on. The study 
of nuclear physics attempts to find an explanation for this rather complicated 
pattern of energies; there is as yet, however, no complete general theory of 
such nuclear energy levels. 

If we replace one of the neutrons in B 1 1 with a proton, we have the nucleus 
of an isotope of carbon, C 11 . The energies of the lowest sixteen excited states of 
C 11 have also been measured; they are shown in the right half of Fig. 8-7. 
(The broken lines indicate levels for which the experimental information is 
questionable.) 

Looking at Fig. 8-7, we see a striking similarity between the pattern of the 
energy levels in the two nuclei. The first excited states are about 2 Mev above the 
ground states. There is a large gap of about 2.3 Mev to the second excited state, 
then a small jump of only 0.5 Mev to the third level. Again, between the fourth 
and fifth levels, a big jump; but between the fifth and sixth a tiny separation of the 
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Fig. 8-6. The force between two 
protons depends on every possible 
parameter. 
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Fig. 8-7. The energy levels of B 11 
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state of C 11 is 1.982 Mev higher than 
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order of 0.1 Mev. And so on. After about the tenth level, the correspondence 
seems to become lost, but can still be seen if the levels are labeled with their other 
defining characteristics — for instance, their angular momentum and what they do 
to lose their extra energy. 

The striking similarity of the pattern of the energy levels of B 11 and C 11 is 
surely not just a coincidence. It must reveal some physical law. It shows, in fact, 
that even in the complicated situation in a nucleus, replacing a neutron by a proton 
makes very little change. This can mean only that the neutron-neutron and proton- 
proton forces must be nearly identical. Only then would we expect the nuclear 
configurations with five protons and six neutrons to be the same as with six protons 
and five neutrons. 

Notice that the properties of these two nuclei tell us nothing about the neutron- 
proton force; there are the same number of neutron-proton combinations in both 
nuclei. But if we compare two other nuclei, such as C 14 , which has six protons and 
eight neutrons, with N 14 , which has seven of each, we find a similar correspondence 
of energy levels. So we can conclude that the p-p, n-n, and p-n forces are identical 
in all their complexities. There is an unexpected principle in the laws of nuclear 
forces. Even though the force between each pair of nuclear particles is very compli- 
cated, the force between the three possible different pairs is the same. 

But there are some small differences. The levels do not correspond exactly; 
also, the ground state of C 1 1 has an absolute energy (its mass) which is higher than 
the ground state of B 11 by 1.982 Mev. All the other levels are also higher in 
absolute energy by this same amount. So the forces are not exactly equal. But 
we know very well that the complete forces are not exactly equal; there is an elec- 
trical force between two protons because each has a positive charge, while between 
two neutrons there is no such electrical force. Can we perhaps explain the differ- 
ences between B 1 1 and C 1 1 by the fact that the electrical interaction of the protons 
is different in the two cases? Perhaps even the remaining minor differences in the 
levels are caused by electrical effects? Since the nuclear forces are so much stronger 
than the electrical force, electrical effects would have only a small perturbing effect 
on the energies of the levels. 

In order to check this idea, or rather to find out what the consequences of this 
idea are, we first consider the difference in the ground-state energies of the two 
nuclei. To take a very simple model, we suppose that the nuclei are spheres of 
radius r (to be determined), containing Z protons. If we consider that a nucleus 
is like a sphere with uniform charge density, we would expect the electrostatic 
energy (from Eq. 8.7) to be 

U ~54 w’ (822) 


where q e is the elementary charge of the proton. Since Z is five for B 11 and six for 
C 11 , their electrostatic energies would be different. 

With such a small number of protons, however, Eq. (8.22) is not quite correct. 
If we compute the electrical energy between all pairs of protons, considered as points 
which we assume to be nearly uniformly distributed throughout the sphere, we 
find that in Eq. (8.22) the quantity Z 2 should be replaced by Z(Z — 1), so the 
energy is 


3 Z(Z - 1 )q 2 = 3 Z(Z - 1> 2 
5 47T€o r 5 r 


(8.23) 


If we knew the nuclear radius /*, we could use (8.23) to find the electrostatic energy 
difference between B ! 1 and C 11 . But let’s do the opposite; let’s instead use the 
observed energy difference to compute the radius, assuming that the energy differ- 
ence is all electrostatic in origin. 

That is, however, not quite right. The energy difference of 1.982 Mev between 
the ground states of B 11 and C 11 includes the rest energies — that is, the energy 
me 2 — of all the particles. In going from B 11 to C 11 , we replace a neutron by a 
proton, which has less mass. So part of the energy difference is the difference in 
the rest energies of a neutron and a proton, which is 0.784 Mev. The difference, 
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to be accounted for by electrostatic energy, is thus more than 1.982 Mev; it is 

1.982 + 0.784 = 2.786 Mev. 

Using this energy in Eq. (8.23), for the radius of either B 11 or C 11 we find 
r = 3.12 X 10" 13 cm. (8.24) 

Does this number have any meaning? To see whether it does, we should 
compare it with some other determination of the radius of these nuclei. For 
example, we can make another measurement of the radius of a nucleus by seeing 
how it scatters fast particles. From such measurements it has been found, in fact, 
that the density of matter in all nuclei is nearly the same, i.e., their volumes are 
proportional to the number of particles they contain. If we let A be the number of 
protons and neutrons in a nucleus (a number very nearly proportional to its mass), 
it is found that its radius is given by 

t = A lf3 r 0i (8.25) 

where 

7*0 = 1.2 X 10" 13 cm. (8.26) 

From these measurements we find that the radius of a B 1 1 (or a C 11 ) nucleus 
is expected to be 


r = (1.2 X 10" 13 )(11) 1/3 = 2.7 X 10~ 13 cm. 

Comparing this result with (8.24), we see that our assumptions that the 
energy difference between B 11 and C 11 is electrostatic is fairly good; the dis- 
crepancy is only about 15% (not bad for our first nuclear computation!). 

The reason for the discrepancy is probably the following. According to the 
current understanding of nuclei, an even number of nuclear particles — in the case 
of B 11 , five neutrons together with five protons — makes a kind of core; when one 
more particle is added to this core, it revolves around on the outside to make a new 
spherical nucleus, rather than being absorbed. If this is so, we should have taken 
a different electrostatic energy for the additional proton. We should have taken 
the excess energy of C 11 over B 11 to be just 

Z B ql 

4ir€oa ’ 

which is the energy needed to add one more proton to the outside of the core. 
This number is just 5/6 of what Eq. (8.23) predicts, so the new prediction for the 
radius is 5/6 of (8.24), which is in much closer agreement with what is directly 
measured. 

We can draw two conclusions from this agreement. One is that the electrical 
laws appear to be working at dimensions as small as 10" 13 cm. The other is that 
we have verified the remarkable coincidence that the nonelectrical part of the forces 
between proton and proton, neutron and neutron, and proton and neutron are 
all equal. 


8-5 Energy in the electrostatic field 

We now consider other methods of calculating electrostatic energy. They can 
all be derived from the basic relation Eq. (8.3), the sum, over all pairs of charges, 
of the mutual energies of each charge-pair. First we wish to write an expression 
for the energy of a charge distribution. As usual, we consider that each volume 
element dV contains the element of charge p dV. Then Eq. (8.3) should be written 

U = i [ dVi dV 2 . (8.27) 

2 J 4xeori2 

all 

space 
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Notice the factor which is introduced because in the double integral over dV \ 
and dV 2 we have counted all pairs of charge elements twice. (There is no convenient 
way of writing an integral that keeps track of the pairs so that each pair is counted 
only once.) Next we notice that the integral over dV 2 in (8.27) is just the potential 
at (1). That is, 

so that (8.27) can be written as 

u = \jp(\yK\)dv l . 

Or, since the point (2) no longer appears, we can simply write 


U=\jp*dV. ( 8 . 28 ) 

This equation can be interpreted as follows. The potential energy of the charge 
p dV is the product of this charge and the potential at the same point. The total 
energy is therefore the integral over <pp dV. But there is again the factor It is 
still required because we are counting energies twice. The mutual energies of two 
charges is the charge of one times the potential at it due to the other. Or, it can be 
taken as the second charge times the potential at it from the first. Thus for two 
point charges we would write 

a - n.Ki) - ?i 

or 

Notice that we could also write 

U = *[?i*(l) + ?2<K2)]. (8.29) 

The integral in (8.28) corresponds to the sum of both terms in the brackets of 
(8.29). That is why we need the factor 

An interesting question is: Where is the electrostatic energy located? One 
might also ask: Who cares? What is the meaning of such a question? If there is 
a pair of interacting charges, the combination has a certain energy. Do we need 
to say that the energy is located at one of the charges or the other, or at both, or in 
between? These questions may not make sense because we really know only that 
the total energy is conserved. The idea that the energy is located somewhere is 
not necessary. 

Yet suppose that it did make sense to say, in general, that energy is located 
at a certain place, as it does for heat energy. We might then extend our principle 
of the conservation of energy with the idea that if the energy in a given volume 
changes, we should be able to account for the change by the flow of energy into 
or out of that volume. You realize that our early statement of the principle of the 
conservation of energy is still perfectly all right if some energy disappears at one 
place and appears somewhere else far away without anything passing (that is, with- 
out any special phenomena occurring) in the space between. We are, therefore, 
now discussing an extension of the idea of the conservation of energy. We might 
call it a principle of the local conservation of energy. Such a principle would say 
that the energy in any given volume changes only by the amount that flows into or 
out of the volume. It is indeed possible that energy is conserved locally in such a 
way. If it is, we would have a much more detailed law than the simple statement 
of the conservation of total energy. It does turn out that in nature energy is 
conserved locally . We can find formulas for where the energy is located and how it 
travels from place to place. 

There is also a physical reason why it is imperative that we be able to say 
where energy is located. According to the theory of gravitation, all mass is a source 
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of gravitational attraction. We also know, by E — me 2 , that mass and energy are 
equivalent. All energy is, therefore, a source of gravitational force. If we could not 
locate the energy, we could not locate all the mass. We would not be able to say 
where the sources of the gravitational field are located. The theory of gravitation 
would be incomplete. 

If we restrict ourselves to electrostatics there is really no way to tell where the 
energy is located. The complete Maxwell equations of electrodynamics give us 
much more information (although even then the answer is, strictly speaking, not 
unique.) We will therefore discuss this question in detail again in a later chapter. 
We will give you now only the result for the particular case of electrostatics. 
The energy is located in space, where the electric field is. This seems reasonable 
because we know that when charges are accelerated they radiate electric fields. 
We would like to say that when light or radiowaves travel from one point to another, 
they carry their energy with them. But there are no charges in the waves. So we 
would like to locate the energy where the electromagnetic field is and not at the 
charges from which it came. We thus describe the energy, not in terms of the 
charges, but in terms of the fields they produce. We can, in fact, show that Eq. 
(8.28) is numerically equal to 

U = -J fE-EdV. (8.30) 

We can then interpret this formula as saying that when an electric field is present, 
there is located in space an energy whose density (energy per unit volume) is 

u = ^ E E = (8.31) 

This idea is illustrated in Fig. 8-8. 

To show that Eq. (8.30) is consistent with our laws of electrostatics, we begin 
by introducing into Eq. (8.28) the relation between p and <p that we obtained in 
Chapter 6: 

p = — e 0 VV 

We get 

U = J<t>V 2 <l>dV. (8.32) 

Writing out the components of the integrand, we see that 

-h (♦$ - + £(♦£)■ - (£)* + 1 (*t) - (S)’ 

= V • (<*> V<t>) - (V<#>) • ( V<t> ). (8.33) 

Our energy integral is then 

V = y / (V*) • (V*) dV-^Jv-(4> V*) dv. 

We can use Gauss’ theorem to change the second integral into a surface integral: 

J V • (<t> V<t>) dV = J ( <t> v<t>) ■ n da. (8.34) 

vol. surface 

We evaluate the surface integral in the case that the surface goes to infinity 
(so the volume integrals become integrals over all space), supposing that all the 
charges are located within some finite distance. The simple way to proceed is to 
take a spherical surface of enormous radius R whose center is at the origin of 
coordinates. We know that when we are very far away from all charges, varies 
as 1 /R and V<f> as 1 /R 2 . (Both will decrease even faster with R if there the net 
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Fig. 8-8. Each volume element dV = 
dx dy dz in an electric field contains the 
energy Uo/2 )E 2 dV. 



charge in the distribution is zero.) Since the surface area of the large sphere in- 
creases as R 2 , we see that the surface integral falls off as (l/R)(l/R 2 )R 2 = (1 /R) 
as the radius of the sphere increases. So if we include all space in our integration 
(R — ► oo ), the surface integral goes to zero and we have that 

= J (V0) (V«)rfF = ^ f E • E dV. (8.35) 

all all 

space space 

We see that it is possible for us to represent the energy of any charge distribution 
as being the integral over an energy density located in the field. 


8-6 The energy of a point charge 

Our new relation, Eq. (8.35), says that even a single point charge q will have 
some electrostatic energy. In this case, the electric field is given by 

£ = $ . 

47 T€ 0 r2 

So the energy density at the distance r from the charge is 


e 0 E 2 = q 2 
2 327r 2 e 0 r 4 


We can take for an element of volume a spherical shell of thickness dr and area 
Airr 2 . The total energy is 



r= 0 


8 7re 0 r 2 


dr = 


q 2 1 

87T€o r 


**=00 


lr=0 


(8.36) 


Now the limit at r = oo gives no difficulty. But for a point charge we are 
supposed to integrate down to r = 0, which gives an infinite integral. Equation 
(8.35) says that there is an infinite amount of energy in the field of a point charge, 
although we began with the idea that there was energy only between point charges. 
In our original energy formula for a collection of point charges (Eq. 8.3), we did 
not include any interaction energy of a charge with itself. What has happened is 
that when we went over to a continuous distribution of charge in Eq. (8.27), we 
counted the energy of interaction of every infinitesimal charge with all other 
infinitesimal charges. The same account is included in Eq. (8.35), so when we 
apply it to a finite point charge, we are including the energy it would take to 
assemble that charge from infinitesimal parts. You will notice, in fact, that we 
would also get the result in Eq. (8.36) if we used our expression (8.1 1) for the energy 
of a charged sphere and let the radius tend toward zero. 

We must conclude that the idea of locating the energy in the field is incon- 
sistent with the assumption of the existence of point charges. One way out of the 
difficulty would be to say that elementary charges, such as an electron, are not 
points but are really small distributions of charge. Alternatively, we could say 
that there is something wrong in our theory of electricity at very small distances, 
or with the idea of the local conservation of energy. There are difficulties with 
either point of view. These difficulties have never been overcome ; they exist to this 
day. Sometime later, when we have discussed some additional ideas, such as the 
momentum in an electromagnetic field, we will give a more complete account of 
these fundamental difficulties in our understanding of nature. 
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Electricity in the Atmncph&re 


9-1 The electric potential gradient of the amosphere 


On an ordinary day over flat desert country, or over the sea, as one goes up- 
ward from the surface of the ground the electric potential increases by about 100 
volts per meter. Thus there is a vertical electric field E of 100 volts/m in the air. The 
sign of the field corresponds to a negative charge on the earth’s surface. This 
means that outdoors the potential at the height of your nose is 200 volts higher 
than the potential at your feet! You might ask: ‘‘Why don’t we just stick a pair of 
electrodes out in the air one meter apart and use the 100 volts to power our electric 
lights?” Or you might wonder: “If there is really a potential difference of 200 
volts between my nose and my feet, why is it I don’t get a shock when I go out into 
the street?” 

We will answer the second question first. Your body is a relatively good 
conductor. If you are in contact with the ground, you and the ground will tend to 
make one equipotential surface. Ordinarily, the equipotentials are parallel to the 
surface, as shown in Fig. 9-1 (a), but when you are there, the equipotentials are 
distorted, and the field looks somewhat as shown in Fig. 9-1 (b). So you still have 
very nearly zero potential difference between your head and your feet. There are 
charges that come from the earth to your head, changing the field. Some of them 
may be discharged by ions collected from the air, but the current of these is very 
small because air is a poor conductor. 
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Fig. 9-1. (a) The potential distribution above the earth, (b) The potential 

distribution near a man in an open flat place. 

How can we measure such a field if the field is changed by putting something 
there? There are several ways. One way is to place an insulated conductor at some 
distance above the ground and leave it there until it is at the same potential as the 
air. If we leave it long enough, the very small conductivity in the air will let the 
charges leak off (or onto) the conductor until it comes to the potential at its level. 

Then we can bring it back to the ground, and measure the shift of its potential as 
we do so. A faster way is to let the conductor be a bucket of water with a small 
leak. As the water drops out, it carries away any excess charges and the bucket 
will approach the same potential as the air. (The charges, as you know, reside on 
the surface, and as the drops come off “pieces of surface” break off.) We can meas- 
ure the potential of the bucket with an electrometer. 
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There is another way to directly measure the potential gradient . Since there 
is an electric field, there is a surface charge on the earth (<r = If we place 

a flat metal plate at the earth’s surface and ground it, negative charges appear on 
it (Fig. 9-2a). If this plate is now covered by another grounded conducting cover B, 
the charges will appear on the cover, and there will be no charges on the original 
plate A . If we measure the charge that flows from plate A to the ground (by, say, 
a galvanometer in the grounding wire) as we cover it, we can find the surface 
charge density that was there, and therefore also find the electric field. 

Having suggested how we can measure the electric field in the atmosphere, 
we now continue our description of it. Measurements show, first of all, that the 
field continues to exist, but gets weaker, as one goes up to high altitudes. By about 
50 kilometers, the field is very small, so most of the potential change (the integral 
of E) is at lower altitudes. The total potential difference from the surface of the 
earth to the top of the atmosphere is about 400,000 volts. 

9-2 Electric currents in the atmosphere 


Fig. 9-2. (a) A grounded metal plate 

will have the same surface charge as the 
earth, (b) If the plate is covered with a 
grounded conductor it will have no 
surface charge. 



ELECTROMETER 


Fig. 9-3. Measuring the conductivity 
of air due to the motion of ions. 


Another thing that can be measured, in addition to the potential gradient, is 
the current in the atmosphere. The current density is small — about 10 micromicro- 
amperes crosses each square meter parallel to the earth. The air is evidently not a 
perfect insulator, and because of this conductivity, a small current — caused by the 
electric field we have just been describing — passes from the sky down to the earth. 

Why does the atmosphere have conductivity? Here and there among the air 
molecules there is an ion — a molecule of oxygen, say, which has acquired an 
extra electron, or perhaps lost one. These ions do not stay as single molecules; 
because of their electric field they usually accumulate a few other molecules around 
them. Each ion then becomes a little lump which, along with other lumps, drifts 
in the field — moving slowly upward or downward — making the observed current. 
Where do the ions come from? It was first guessed that the ions were produced by 
the radioactivity of the earth. (It was known that the radiation from radioactive 
materials would make air conducting by ionizing the air molecules.) Particles 
like £-rays coming out of the atomic nuclei are moving so fast that they tear elec- 
trons from the atoms, leaving ions behind. This would imply, of course, that if 
we were to go to higher altitudes, we should find less ionization, because the radio- 
activity is all in the dirt on the ground — in the traces of radium, uranium, po- 
tassium, etc. 

To test this theory, some physicists carried an experiment up in balloons to 
measure the ionization of the air (Hess, in 1912) and discovered that the opposite 
was true — the ionization per unit volume increased with altitude! (The apparatus 
was like that of Fig. 9-3. The two plates were charged periodically to the potential 
V. Due to the conductivity of the air, the plates slowly discharged; the rate of 
discharge was measured with the electrometer.) This was a most mysterious 
result — the most dramatic finding in the entire history of atmospheric electricity. 
It was so dramatic, in fact, that it required a branching off of an entirely new 
subject — cosmic rays. Atmospheric electricity itself remained less dramatic. 
Ionization was evidently being produced by something from outside the earth; 
the investigation of this source led to the discovery of the cosmic rays. We will 
not discuss the subject of cosmic rays now, except to say that they maintain the 
supply of ions. Although the ions are being swept away all the time, new ones are 
being created by the cosmic-ray particles coming from the outside. 

To be precise, we must say that besides the ions made of molecules, there are 
also other kinds of ions. Tiny pieces of dirt, like extremely fine bits of dust, float 
in the air and become charged. They are sometimes called '‘nuclei.” For example, 
when a wave breaks in the sea, little bits of spray are thrown into the air. When 
one of these drops evaporates, it leaves an infinitesimal crystal of NaCl floating in 
the air. These tiny crystals can then pick up charges and become ions; they 
are called “large ions.” 

The small ions — those formed by cosmic rays — are the most mobile. Because 
they are so small, they move rapidly through the air — with a speed of about 1 
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cm/sec in a field of 100 volts/meter, or 1 volt/cm. The much bigger and heavier 
ions move much more slowly. It turns out that if there are many “nuclei,” they will 
pick up the charges from the small ions. Then, since the “large ions” move so 
slowly in a field, the total conductivity is reduced. The conductivity of air, there- 
fore, is quite variable, since it is very sensitive to the amount of “dirt” there is in it. 
There is much more of such dirt over land — where the winds can blow up dust 
or where man throws all kinds of pollution into the air — than there is over water. 
It is not surprising that from day to day, from moment to moment, from place 
to place, the conductivity near the earth’s surface varies enormously. The voltage 
gradient observed at any particular place on the earth’s surface also varies greatly 
because roughly the same current flows down from high altitudes in different places, 
and the varying conductivity near the earth results in a varying voltage gradient. 

The conductivity of the air due to the drifting of ions also increases rapidly 
with altitude — for two reasons. First of all, the ionization from cosmic rays in- 
creases with altitude. Secondly, as the density of air goes down, the mean free path 
of the ions increases, so that they can travel farther in the electric field before they 
have a collision — resulting in a rapid increase of conductivity as one goes up. 

Although the electric current-density in the air is only a few micromicro- 
amperes per square meter, there are very many square meters on the earth’s surface. 
The total electric current reaching the earth’s surface at any time is very nearly 
constant at 1800 amperes. This current, of course, is “positive” — it carries plus 
charges to the earth. So we have a voltage supply of 400,000 volts with a current 
of 1800 amperes — a power of 700 megawatts! 

With such a large current coming down, the negative charge on the earth 
should soon be discharged. In fact, it should take only about half an hour to dis- 
charge the entire earth. But the atmospheric electric field has already lasted more 
than a half-hour since its discovery. How is it maintained? What maintains the 
voltage? And between what and the earth? There are many questions. 

The earth is negative, and the potential in the air is positive. If you go high 
enough, the conductivity is so great that horizontally there is no more chance for 
voltage variations. The air, for the scale of times that we are talking about, be- 
comes effectively a conductor. This occurs at a height in the neighborhood of 50 
kilometers. This is not as high as what is called the “ionosphere,” in which there 
are very large numbers of ions produced by photoelectricity from the sun. Never- 
theless, for our discussions of atmospheric electricity, the air becomes sufficiently 
conductive at about 50 kilometers that we can imagine that there is practically a 
perfect conducting surface at this height, from which the currents come down. 
Our picture of the situation is shown in Fig. 9-4. The problem is: How is the 
positive charge maintained there? How is it pumped back? Because if it comes 
down to the earth, it has to be pumped back somehow. That was one of the 
greatest puzzles of atmospheric electricity for quite a while. 

Each piece of information we can get should give a clue or, at least, tell you 
something about it. Here is an interesting phenomenon: If we measure the current 
(which is more stable than the potential gradient) over the sea, for instance, or in 
careful conditions, and average very carefully so that we get rid of the irregularities, 
we discover that there is still a daily variation. The average of many measurements 
over the oceans has a variation with time roughly as shown in Fig. 9-5. The 
current varies by about ±15 percent, and it is largest at 7 :00 p.m. in London. The 
strange part of the thing is that no matter where you measure the current — in the 
Atlantic Ocean, the Pacific Ocean, or the Arctic Ocean — it is at its peak value 
when the clocks in London say 7:00 p.m.! All over the world the current is at its 
maximum at 7:00 p.m. London time and it is at a minimum at 4:00 a.m. London 
time. In other words, it depends upon the absolute time on the earth, not upon 
the local time at the place of observation. In one respect this is not mysterious; 
it checks with our idea that there is a very high conductivity laterally at the top, 
because that makes it impossible for the voltage difference from the ground to 
the top to vary locally. Any potential variations should be worldwide, as indeed 
they are. What we now know, therefore, is that the voltage at the “top” surface 
is dropping and rising by 15 percent with the absolute time on the earth. 
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Fig. 9-4. Typical electrical condi- 
tions in a clear atmosphere. 



Fig. 9-5. The average daily varia- 
tion of the atmospheric potential gradient 
on a clear day over the oceans; referred 
to Greenwich time. 
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9-3 Origin of the atmospheric currents 

We must next talk about the source of the large negative currents which 
must be flowing from the “top 1 ' to the surface of the earth to keep charging it up 
negatively. Where are the batteries that do this? The “battery 11 is shown in Fig. 
9-6. It is the thunderstorm and its lightning. It turns out that the bolts of lightning 
do not “discharge 11 the potential we have been talking about (as you might at 
first guess). Lightning storms carry negative charges to the earth. When a lightning 
bolt strikes, ten-to-one it brings down negative charges to the earth in large amounts. 
It is the thunderstorms throughout the world that are charging the earth with an 
average of 1800 amperes, which is then being discharged through regions of 
fair weather. 

There are about 300 thunderstorms per day all over the earth, and we can 
think of them as batteries pumping the electricity to the upper layer and maintain- 
ing the voltage difference. Then take into account the geography of the earth — 
there are thunderstorms in the afternoon in Brazil, tropical thunderstorms in 
Africa, and so forth. People have made estimates of how much lightning is striking 
world-wide at any time, and perhaps needless to say, their estimates more or less 
agree with the voltage difference measurements: the total amount of thunderstorm 
activity is highest on the whole earth at about 7:00 p.m. in London. However, 
the thunderstorm estimates are very difficult to make and were made only after 
it was known that the variation should have occurred. These things are very 
difficult because we don’t have enough observations on the seas and over all parts 
of the world to know the number of thunderstorms accurately. But those people 
who think they “do it right 11 obtain the result that there is a peak in the activity 
at 7:00 p.m. Greenwich Mean Time. 
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Pi g The mechanism that generates the atmospheric electric field. [Photo by William L. Widmayer.] 
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In order to understand how these batteries work, we will look at a thunder- 
storm in detail. What is going on inside a thunderstorm? We will describe this 
insofar as it is known. As we get into this marvelous phenomenon of real nature — 
instead of the idealized spheres of perfect conductors inside of other spheres that 
we can solve so neatly — we discover that we don't know very much. Yet it is really 
quite exciting. Anyone who has been in a thunderstorm has enjoyed it, or has been 
frightened, or at least has had some emotion. And in those places in nature where 
we get an emotion, we find that there is generally a corresponding complexity and 
mystery about it. It is not going to be possible to describe exactly how a thunder- 
storm works, because we do not yet know very much. But we will try to describe 
a little bit about what happens. 


9-4 Thunderstorms 

In the first place, an ordinary thunderstorm is made up of a number of “cells” 
fairly close together, but almost independent of each other. So it is best to analyze 
one cell at a time. By a “cell” we mean a region with a limit area in the horizontal 
direction in which all of the basic processes occur. Usually there are several cells 
side by side, and in each one about the same thing is happening, although perhaps 
with a different timing. Figure 9-7 indicates in an idealized fashion what such a 
cell looks like in the early stage of the thunderstorm. It turns out that in a certain 
place in the air, under certain conditions which we shall describe, there is a general 
rising of the air, with higher and higher velocities near the top. As the warm, 
moist air at the bottom rises, it cools and condenses. In the figure the little crosses 
indicate snow and the dots indicate rain, but because the updraft currents are great 
enough and the drops are small enough, the snow and rain do not come down at 
this stage. This is the beginning stage, and not the real thunderstorm yet — in the 
sense that we don't have anything happening at the ground. At the same time that 
the warm air rises, there is an entrainment of air from the sides — an important 
point which was neglected for many years. Thus it is not just the air from below 
which is rising, but also a certain amount of other air from the sides. 

Why does the air rise like this? As you know, when you go up in altitude the 
air is colder. The ground is heated by the sun, and the re-radiation of heat to the 
sky comes from water vapor high in the atmosphere; so at high altitudes the air 
is cold — very cold — whereas lower down it is warm. You may say, “Then it’s 
very simple. Warm air is lighter than cold ; therefore the combination is mechan- 
ically unstable and the warm air rises.” Of course, if the temperature is different 
at different heights, the air is unstable thermodynamically . Left to itself infinitely 
long, the air would all come to the same temperature. But it is not left to itself; 
the sun is always shining (during the day). So the problem is indeed not one of 
thermodynamic equilibrium, but of mechanical equilibrium. Suppose we plot — as 
in Fig. 9-8 — the temperature of the air against height above the ground. In 
ordinary circumstances we would get a decrease along a curve like the one labeled 
(a); as the height goes up, the temperature goes down. How can the atmosphere 
be stable? Why doesn’t the hot air below simply rise up into the cold air? The 
answer is this: if the air were to go up, its pressure would go down, and if we 
consider a particular parcel of air going up, it would be expanding adiabatically. 
(There would be no heat coming in or out because in the large dimensions con- 
sidered here, there isn’t time for much heat flow.) Thus the parcel of air would 
cool as it rises. Such an adiabatic process would give a temperature-height relation- 
ship like curve (b) in Fig. 9-8. Any air which rose from below would be colder 
than the environment it goes into. Thus there is no reason for the hot air below 
to rise; if it were to rise, it would cool to a lower temperature than the air already 
there, would be heavier than the air there, and would just want to come down again. 
On a good, bright day with very little humidity there is a certain rate at which the 
temperature in the atmosphere falls, and this rate is, in general, lower than the 
“maximum stable gradient,” which is represented by curve (b). The air is in 
stable mechanical equilibrium. 

9-5 



Fig. 9-7. A thunderstorm cell in the 
early stages of development. [From U.S. 
Department of Commerce Weather Bureau 
Report, June 1 949.] 
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Fig. 9-8. Atmospheric temperature, 
(a) Static atmosphere; (b) adiabatic 
cooling of dry air; (c) adiabatic cooling 
of wet air; (d) wet air with some mixing 
of ambient air. 
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Fig. 9-9. A mature thunderstorm cell. 
[From U.S. Department of Commerce 
Weather Bureau Report, June 1949.] 


On the other hand, if we think of a parcel of air that contains a lot of water 
vapor being carried up into the air, its adiabatic cooling curve will be different. As 
it expands and cools, the water vapor in it will condense, and the condensing water 
will liberate heat. Moist air, therefore, does not cool nearly as much as dry air 
does. So if air that is wetter than the average starts to rise, its temperature will 
follow a curve like (c) in Fig. 9-8. It will cool off somewhat, but will still be warmer 
than the surrounding air at the same level. If we have a region of warm moist 
air and something starts it rising, it will always find itself lighter and warmer than 
the air around it and will continue to rise until it gets to enormous heights. This 
is the machinery that makes the air in the thunderstorm cell rise. 

For many years the thunderstorm cell was explained simply in this manner. 
But then measurements showed that the temperature of the cloud at different 
heights was not nearly as high as indicated by curve (c). The reason is that as the 
moist air “bubble” goes up, it entrains air from the environment and is cooled 
off by it. The temperature-versus-height curve looks more like curve (d), which 
is much closer to the original curve (a) than to curve (c). 

After the convection just described gets under way, the cross section of a 
thunderstorm cell looks like Fig. 9-9. We have what is called a “mature” thunder- 
storm. There is a very rapid updraft which, in this stage, goes up to about 10,000 
to 15,000 meters — sometimes even much higher. The thunderheads, with their 
condensation, climb way up out of the general cloud bank, carried by an updraft 
that is usually about 60 miles an hour. As the water vapor is carried up and 
condenses, it forms tiny drops which are rapidly cooled to temperatures below 
zero degrees. They should freeze, but do not freeze immediately — they are “super- 
cooled.” Water and other liquids will usually cool well below their freezing points 
before crystallizing if there are no “nuclei” present to start the crystallization 
process. Only if there is some small piece of material present, like a tiny crystal of 
NaCl, will the water drop freeze into a little piece of ice. Then the equilibrium is 
such that the water drops evaporate and the ice crystals grow. Thus at a certain 
point there is a rapid disappearance of the water and a rapid buildup of ice. Also, 
there may be direct collisions between the water drops and the ice — collisions in 
which the supercooled water becomes attached to the ice crystals, which causes it 
to suddenly crystallize. So at a certain point in the cloud expansion there is a rapid 
accumulation of large ice particles. 

When the ice particles are heavy enough, they begin to fall through the rising 
air — they get too heavy to be supported any longer in the updraft. As they come 
down, they draw a little air with them and start a downdraft. And surprisingly 
enough, it is easy to see that once the downdraft is started, it will maintain itself. 
The air now drives itself down! 

Notice that the curve (d) in Fig. 9-8 for the actual distribution of temperature 
in the cloud is not as steep as curve (c), which applies to wet air. So if we have wet 
air falling, its temperature will drop with the slope of curve (c) and will go below 
the temperature of the environment if it gets down far enough, as indicated by 
curve (e) in the figure. The moment it does that, it is denser than the environment 
and continues to fall rapidly. You say, “That is perpetual motion. First, you argue 
that the air should rise, and when you have it up there, you argue equally well that 
the air should fall.” But it isn’t perpetual motion. When the situation is unstable 
and the warm air should rise, then clearly something has to replace the warm air. 
It is equally true that cold air coming down would energetically replace the warm 
air, but you realize that what is coming down is not the original air. The early 
arguments, that had a particular cloud without entrainment going up and then 
coming down, had some kind of a puzzle. They needed the rain to maintain the 
downdraft — an argument which is hard to believe. As soon as you realize that there 
is a lot of original air mixed in with the rising air, the thermodynamic argument 
shows that there can be a descent of the cold air which was originally at some great 
height. This explains the picture of the active thunderstorm sketched in Fig. 9-9. 

As the air comes down, rain begins to come out of the bottom of the thunder- 
storm. In addition, the relatively cold air spreads out when it arrives at the earth’s 
surface. So just before the rain comes there is a certain little cold wind that gives 
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us a forewarning of the coming storm. In the storm itself there are rapid and ir- 
regular gusts of air, there is an enormous turbulence in the cloud, and so on. But 
basically we have an updraft, then a downdraft — in general, a very complicated 
process. 

The moment at which precipitation starts is the same moment that the large 
downdraft begins and is the same moment, in fact, when the electrical phenomena 
arise. Before we describe lightning, however, we can finish the story by looking 
at what happens to the thunderstorm cell after about one-half an hour to an hour. 
The cell looks as shown in Fig. 9-10. The updraft stops because there is no longer 
enough warm air to maintain it. The downward precipitation continues for a while, 
the last little bits of water come out, and things get quieter and quieter — although 
there are small ice crystals left way up in the air. Because the winds at very great 
altitude are in different directions, the top of the cloud usually spreads into an 
anvil shape. The cell comes to the end of its life. 




Fig. 9-10. The late phase of a thunderstorm 
cell. [From U.S. Department of Commerce Weather 
Bureau Report, June 1 949.] 


Fig. 9-11. The distribution of electrical charges in a 
mature thunderstorm cell. [From U.S. Department of Com- 
merce Weather Bureau Report, June 1949.] 


9-5 The mechanism of charge separation 

We want now to discuss the most important aspect for our purposes — the 
development of the electrical charges. Experiments of various kinds — including 
flying airplanes through thunderstorms (the pilots who do this are brave men!) — 
tell us that the charge distribution in a thunderstorm cell is something like that 
shown in Fig. 9-11. The top of the thunderstorm has a positive charge, and the 
bottom a negative one — except for a small local region of positive charge in the 
bottom of the cloud, which has caused everybody a lot of worry. No one seems to 
know why it is there, how important it is — whether it is a secondary effect of the 
positive rain coming down, or whether it is an essential part of the machinery. 
Things would be much simpler if it weren’t there. Anyway, the predominantly 
negative charge at the bottom and the positive charge at the top have the correct 
sign for the battery needed to drive the earth negative. The positive charges are 
6 or 7 kilometers up in the air, where the temperature is about — 20°C, whereas 
the negative charges are 3 or 4 kilometers high, where the temperature is between 
zero and — 10°C. 

The charge at the bottom of the cloud is large enough to produce potential 
differences of 20, or 30, or even 100 million volts between the cloud and the earth — 
much bigger than the 0.4 million volts from the “sky” to the ground in a clear 
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TO WATER 
SUPPLY 


Fig. 9-12. A jet of water with an 
electric field near the nozzle. 


atmosphere. These large voltages break down the air and create giant arc dis- 
charges. When the breakdown occurs the negative charges at the bottom of the 
thunderstorm are carried down to the earth in the lightning strokes. 

Now we will describe in some detail the character of the lightning. First of 
all, there are large voltage differences around, so that the air breaks down. There 
are lightning strokes between one piece of a cloud and another piece of a cloud, 
or between one cloud and another cloud, or between a cloud and the earth. In 
each of the independent discharge flashes — the kind of lightning strokes you see — 
there are approximately 20 or 30 coulombs of charge brought down. One question 
is: How long does it take for the cloud to regenerate the 20 or 30 coulombs which 
are taken away by the lightning bolt? This can be seen by measuring, far from a 
cloud, the electric field produced by the cloud’s dipole moment. In such measure- 
ments you see a sudden decrease in the field when the lightning strikes, and then 
an exponential return to the previous value with a time constant which is slightly 
different for different cases but which is in the neighborhood of 5 seconds. It takes 
a thunderstorm only 5 seconds after each lightning stroke to build its charge up 
again. That doesn’t necessarily mean that another stroke is going to occur in 
exactly 5 seconds every time, because, of course, the geometry is changed, and so on. 
The strokes occur more or less irregularly, but the important point is that it takes 
about 5 seconds to recreate the original condition. Thus there are approximately 
4 amperes of current in the generating machine of the thunderstorm. This means 
that any model made to explain how this storm generates its electricity must be one 
with plenty of juice — it must be a big, rapidly operating device. 

Before we go further we shall consider something which is almost certainly 
completely irrelevant, but nevertheless interesting, because it does show the effect 
of an electric field on water drops. We say that it may be irrelevant because it 
relates to an experiment one can do in the laboratory with a stream of water to 
show the rather strong effects of the electric field on drops of water. In a thunder- 
storm there is no stream of water; there is a cloud of condensing ice and drops of 
water. So the question of the mechanisms at work in a thunderstorm is probably 
not at all related to what you can see in the simple experiment we will describe. 
If you take a small nozzle connected to a water faucet and direct it upward at a 
steep angle, as in Fig. 9-12, the water will come out in a fine stream that eventually 
breaks up into a spray of fine drops. If you now put an electric field across the 
stream at the nozzle (by bringing up a charged rod, for example), the form of the 
stream will change. With a weak electric field you will find that the stream breaks 
up into a smaller number of large-sized drops. But if you apply a stronger field, 
the stream breaks up into many, many fine drops — smaller than before. 1 " With a 
weak electric field there is a tendency to inhibit the breakup of the stream into 
drops. With a stronger field, however, there is an increase in the tendency to sepa- 
rate into drops. 

The explanation of these effects is probably the following. If we have the 
stream of water coming out of the nozzle and we put a small electric field across it 
one side of the water gets slightly positive and the other side gets slightly negative. 
Then, when the stream breaks, the drops on one side may be positive, and those on 
the other side may be negative. They will attract each other and will have a tend- 
ency to stick together more than they would have before — the stream doesn’t 
break up as much. On the other hand, if the field is stronger, the charge in each 
one of the drops gets much larger, and there is a tendency for the charge itself to 
help break up the drops through their own repulsion. Each drop will break into 
many smaller ones, each carrying a charge, so that they are all repelled, and 
spread out so rapidly. So as we increase the field, the stream becomes more finely 
separated. The only point we wish to make is that in certain circumstances electric 
fields can have considerable influence on the drops. The exact machinery by 
which something happens in a thunderstorm is not at all known, and is not at all 
necessarily related to what we have just described. We have included it just so that 


* A handy way to observe the sizes of the drops is to let the stream fall on a large thin 
metal plate. The larger drops make a louder noise. 
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you will appreciate the complexities that could come into play. In fact, nobody 
has a theory applicable to clouds based on that idea. 

We would like to describe two theories which have been invented to account 
for the separation of the charges in a thunderstorm. All the theories involve the 
idea that there should be some charge on the precipitation particles and a different 
charge in the air. Then by the movement of the precipitation particles — the water 
or the ice — through the air there is a separation of electric charge. The only ques- 
tion is: How does the charging of the drops begin? One of the older theories is 
called the “breaking-drop” theory. Somebody discovered that if you have a drop 
of water that breaks into two pieces in a windstream, there is positive charge on the 
water and negative charge in the air. This breaking-drop theory has several 
disadvantages, among which the most serious is that the sign is wrong. Second, 
in the large number of temperate-zone thunderstorms which do exhibit lightning, 
the precipitation effects at high altitudes are in ice, not in water. 

From what we have just said, we note that if we could imagine some way for 
the charge to be different at the top and bottom of a drop and if we could also see 
some reason why drops in a high-speed airstream would break up into unequal 
pieces — a large one in the front and a smaller one in the back because of the motion 
through the air or something — we would have a theory. (Different from any known 
theory!) Then the small drops would not fall through the air as fast as the big 
ones, because of the air resistance, and we would get a charge separation. You 
see, it is possible to concoct all kinds of possibilities. 

One of the more ingenious theories, which is more satisfactory in many re- 
spects than the breaking-drop theory, is due to C. T. R. Wilson. We will describe 
it, as Wilson did, with reference to water drops, although the same phenomenon 
would also work with ice. Suppose we have a water drop that is falling in the elec- 
tric field of about 100 volts per meter toward the negatively charged earth. The 
drop will have an induced dipole moment — with the bottom of the drop positive 
and the top of the drop negative, as drawn in Fig. 9-13. Now there are in the air 
the “nuclei” that we mentioned earlier — the large slow-moving ions. (The fast 
ions do not have an important effect here.) Suppose that as a drop comes down, 
it approaches a large ion. If the ion is positive, it is repelled by the positive bottom 
of the drop and is pushed away. So it does not become attached to the drop. 
If the ion were to approach from the top, however, it might attach to the negative, 
top side. But since the drop is falling through the air, there is an air drift relative 
to it, going upwards, which carries the ions away if their motion through the air 
is slow enough. Thus the positive ions cannot attach at the top either. This 
would apply, you see, only to the large, slow-moving ions. The positive ions of 
this type will not attach themselves either to the front or the back of a falling drop. 
On the other hand, as the large, slow, negative ions are approached by a drop, 
they will be attracted and will be caught. The drop will acquire negative charge — 
the sign of the charge having been determined by the original potential difference 
on the entire earth — and we get the right sign. Negative charge will be brought 
down to the bottom part of the cloud by the drops, and the positively charged ions 
which are left behind will be blown to the top of the cloud by the various updraft 
currents. The theory looks pretty good, and it at least gives the right sign. Also it 
doesn’t depend on having liquid drops. We will see, when we learn about polariza- 
tion in a dielectric, that pieces of ice will do the same thing. They also will develop 
positive and negative charges on their extremities when they are in an electric field. 

There are, however, some problems even with this theory. First of all, the 
total charge involved in a thunderstorm is very high. After a short time, the supply 
of large ions would get used up. So Wilson and others have had to propose that 
there are additional sources of the large ions. Once the charge separation starts, 
very large electric fields are developed, and in these large fields there may be places 
where the air will become ionized. If there is a highly charged point, or any small 
object like a drop, it may concentrate the field enough to make a “brush discharge.” 
When there is a strong enough electric field — let us say it is positive — electrons 
will fall into the field and will pick up a lot of speed between collisions. Their 
speed will be such that in hitting another atom they will tear electrons off at that 
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Fig. 9-1 3. C. T. R. Wilson's theory of 
charge separation in a thundercloud. 




Fig. 9-14. Photograph of a lightning 
flash taken with a “Boys” camera. [From 
Schonland, Mai an, and Col lens, Proc, Roy. 
Soc. London, Vol. 152 (1935).] 
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Fig. 9-1 5. The formation of the "step 
leader." 


atom, leaving positive charges behind. These new electrons also pick up speed 
and collide with more electrons. So a kind of chain reaction or avalanche occurs, 
and there is a rapid accumulation of ions. The positive charges are left near their 
original positions, so the net effect is to distribute the positive charge on the point 
into a region around the point. Then, of course, there is no longer a strong field, 
and the process stops. This is the character of a brush discharge. It is possible that 
the fields may become strong enough in the cloud to produce a little bit of brush 
discharge; there may also be other mechanisms, once the thing is started, to pro- 
duce a large amount of ionization. But nobody knows exactly how it works. So 
the fundamental origin of lightning is really not thoroughly understood. We know 
it comes from the thunderstorms. (And we know, of course, that thunder comes 
from the lightning — from the thermal energy released by the bolt.) 

At least we can understand, in part, the origin of atmospheric electricity. Due 
to the air currents, ions, and water drops on ice particles in a thunderstorm, positive 
and negative charges are separated. The positive charges are carried upward to 
the top of the cloud (see Fig. 9-11), and the negative charges are dumped into the 
ground in lightning strokes. The positive charges leave the top of the cloud, enter 
the high-altitude layers of more highly conducting air, and spread throughout the 
earth. In regions of clear weather, the positive charges in this layer are slowly 
conducted to the earth by the ions in the air — ions formed by cosmic rays, by the 
sea, and by man’s activities. The atmosphere is a busy electrical machine! 

9-6 Lightning 

The first evidence of what happens in a lightning stroke was obtained in 
photographs taken with a camera held by hand and moved back and forth with 
the shutter open — while pointed toward a place where lightning was expected. 
The first photographs obtained this way showed clearly that lightning strokes are 
usually multiple discharges along the same path. Later, the “Boys” camera, 
which has two lenses mounted 180° apart on a rapidly rotating disc, was developed. 
The image made by each lens moves across the film — the picture is spread out in 
time. If, for instance, the stroke repeats, there will be two images side by side. 
By comparing the images of the two lenses, it is possible to work out the details 
of the time sequence of the flashes. Figure 9-14 shows a photograph taken with a 
“Boys” camera. 

We will now describe the lightning. Again, we don’t understand exactly how 
it works. We will give a qualitative description of what it looks like, but we won’t 
go into any details of why it does what it appears to do. We will describe only the 
ordinary case of the cloud with a negative bottom over flat country. Its potential 
is much more negative than the earth underneath, so negative electrons will be 
accelerated toward the earth. What happens is the following. It all starts with a 
thing called a “step leader,” which is not as bright as the stroke of lightning. On 
the photographs one can see a little bright spot at the beginning that starts from the 
cloud and moves downward very rapidly — at a sixth of the speed of light! It goes 
only about 50 meters and stops. It pauses for about 50 microseconds, and then 
takes another step. It pauses again and then goes another step, and so on. It 
moves in a series of steps toward the ground, along a path like that shown in Fig. 
9-15. In the leader there are negative charges from the cloud; the whole column 
is full of negative charge. Also, the air is becoming ionized by the rapidly moving 
charges that produce the leader, so the air becomes a conductor along the path 
traced out. The moment the leader touches the ground, we have a conducting 
“wire” that runs all the way up to the cloud and is full of negative charge. Now, 
at last, the negative charge of the cloud can simply escape and run out. The 
electrons at the bottom of the leader are the first ones to realize this; they dump 
out, leaving positive charge behind that attracts more negative charge from higher 
up in the leader, which in its turn pours out, etc. So finally all the negative charge 
in a part of the cloud runs out along the column in a rapid and energetic way. 
So the lightning stroke you see runs upwards from the ground, as indicated in Fig. 
9-16. In fact, this main stroke — by far the brightest part — is called the return 
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stroke . It is what produces the very bright light, and the heat, which by causing 
a rapid expansion of the air makes the thunder clap. 

The current in a lightning stroke is about 10,000 amperes at its peak, and it 
carries down about 20 coulombs. 

But we are still not finished. After a time of, perhaps, a few hundredths of a 
second, when the return stroke has disappeared, another leader comes down. 
But this time there are no pauses. It is called a “dark leader” this time, and it 
goes all the way down — from top to bottom in one swoop. It goes full steam on 
exactly the old track, because there is enough debris there to make it the easiest 
route. The new leader is again full of negative charge. The moment it touches the 
ground — zing! — there is a return stroke going straight up along the path. So you 
see the lightning strike again, and again, and again. Sometimes it strikes only 
once or twice, sometimes five or ten times — once as many as 42 times on the same 
track was seen — but always in rapid succession. 

Sometimes things get even more complicated. For instance, after one of its 
pauses the leader may develop a branch by sending out two steps — both toward the 
ground but in somewhat different directions, as shown in Fig. 9-15. What happens 
then depends on whether one branch reaches the ground definitely before the other. 
If that does happen, the bright return stroke (of negative charge dumping into the 
ground) works its way up along the branch that touches the ground, and when it 
reaches and passes the branching point on its way up to the cloud, a bright stroke 
appears to go down the other branch. Why? Because negative charge is dumping 
out and that is what lights up the bolt. This charge begins to move at the top of 
the secondary branch, emptying successive, longer pieces of the branch, so the 
bright lightning bolt appears to work its way down that branch, at the same time 
as it works up toward the cloud. If, however, one of these extra leader branches 
happens to have reached the ground almost simultaneously with the original leader, 
it can sometimes happen that the dark leader of the second stroke will take the 
second branch. Then you will see the first main flash in one place and the second 
flash in another place. It is a variant of the original idea. 

Also, our description is oversimplified for the region very near the ground. 
When the step leader gets to within a hundred meters or so from the ground, there 
is evidence that a discharge rises from the ground to meet it. Presumably, the 
field gets big enough for a brush-type discharge to occur. If, for instance, there is 
a sharp object, like a building with a point at the top, then as the leader comes 
down nearby the fields are so large that a discharge starts from the sharp point 
and reaches up to the leader. The lightning tends to strike such a point. 

It has apparently been known for a long time that high objects are struck by 
lightning. There is a quotation of Artabanis, the advisor to Xerxes, giving his 
master advice on a contemplated attack on the Greeks — during Xerxes’ campaign 
to bring the entire known world under the control of the Persians. Artabanis said, 
“See how God with his lightning always smites the bigger animals and will not 
suffer them to wax insolent, while these of a lesser bulk chafe him not. How like- 
wise his bolts fall ever on the highest houses and tallest trees.” And then he explains 
the reason: “So, plainly, doth he love to bring down everything that exalts itself.” 

Do you think — now that you know a true account of lightning striking tall 
trees — that you have a greater wisdom in advising kings on military matters than 
did Artabanis 2300 years ago? Do not exalt yourself. You could only do it less 
poetically. 



Fig. 9-1 6. The return lightning stroke 
runs back up the path made by the leader. 
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10-1 The dielectric constant 

Here we begin to discuss another of the peculiar properties of matter under 
the influence of the electric field. In an earlier chapter we considered the behavior 
of conductors , in which the charges move freely in response to an electric field to 
such points that there is no field left inside a conductor. Now we will discuss 
insulators, materials which do not conduct electricity. One might at first believe 
that there should be no effect whatsoever. However, using a simple electroscope 
and a parallel-plate capacitor, Faraday discovered that this was not so. His experi- 
ments showed that the capacitance of such a capacitor is increased when an in- 
sulator is put between the plates. If the insulator completely fills the space between 
the plates, the capacitance is increased by a factor k which depends only on the 
nature of the insulating material. Insulating materials are also called dielectrics; 
the factor k is then a property of the dielectric, and is called the dielectric constant. 
The dielectric constant of a vacuum is, of course, unity. 

Our problem now is to explain why there is any electrical effect if the insulators 
are indeed insulators and do not conduct electricity. We begin with the experi- 
mental fact that the capacitance is increased and try to reason out what might 
be going on. Consider a parallel-plate capacitor with some charges on the surfaces 
of the conductors, let us say negative charge on the top plate and positive charge on 
the bottom plate. Suppose that the spacing between the plates is d and the area of 
each plate is A. As we have proved earlier, the capacitance is 



and the charge and voltage on the capacitor are related by 

Q « CV. (10.2) 

Now the experimental fact is that if we put a piece of insulating material like 
lucite or glass between the plates, we find that the capacitance is larger. That means, 
of course, that the voltage is lower for the same charge. But the voltage difference 
is the integral of the electric field across the capacitor; so we must conclude that 
inside the capacitor, the electric field is reduced even though the charges on the 
plates remain unchanged. 


10-1 The dielectric constant 

10-2 The polarization vector P 

10-3 Polarization charges 

10-4 The electrostatic equations 
with dielectrics 

10-5 Fields and forces with 
dielectrics 



Fig. 10-1. A parallel-plate capaci- 
tor with a dielectric. The lines of £ are 
shown. 


Now how can that be? We have a law due to Gauss that tells us that the flux 
of the electric field is directly related to the enclosed charge. Consider the gaussian 
surface S shown by broken lines in Fig. 10-1. Since the electric field is reduced 
with the dielectric present, we conclude that the net charge inside the surface must 
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be lower than it would be without the material. There is only one possible conclu- 
sion, and that is that there must be positive charges on the surface of the dielectric. 
Since the field is reduced but is not zero, we would expect this positive charge to 
be smaller than the negative charge on the conductor. So the phenomena can be 
explained if we could understand in some way that when a dielectric material is 
placed in an electric field there is positive charge induced on one surface and nega- 
tive charge induced on the other. 


Fig. 10-2. If we put a conducting 
plate in the gap of a parallel-plate con- 
denser, the induced charges reduce the 
field in the conductor to zero. 
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We would expect that to happen for a conductor. For example, suppose that 
we had a capacitor with a plate spacing d, and we put between the plates a neutral 
conductor whose thickness is b> as in Fig. 10-2. The electric field induces a positive 
charge on the upper surface and a negative charge on the lower surface, so there is 
no field inside the conductor. The field in the rest of the space is the same as it 
was without the conductor, because it is the surface density of charge divided by 
e 0 ; but the distance over which we have to integrate to get the voltage (the potential 
difference) is reduced. The voltage is 

V = f (d - b). 

€0 


The resulting equation for the capacitance is like Eq. (10.1), with ( d — b ) sub- 
stituted for d: 


_ e 0 A 
d[ 1 - (b/d)] 


(10.3) 



Fig. 10-3. A model of a dielectric: 
small conducting spheres embedded in 
an idealized insulator. 


The capacitance is increased by a factor which depends upon (b/d), the proportion 
of the volume which is occupied by the conductor. 

This gives us an obvious model for what happens with dielectrics — that inside 
the material there are many little sheets of conducting material. The trouble with 
such a model is that it has a specific axis, the normal to the sheets, whereas most 
dielectrics have no such axis. However, this difficulty can be eliminated if we 
assume that all insulating materials contain small conducting spheres separated 
from each other by insulation, as shown in Fig. 10-3. The phenomenon of the 
dielectric constant is explained by the effect of the charges which would be induced 
on each sphere. This is one of the earliest physical models of dielectrics used to 
explain the phenomenon that Faraday observed. More specifically, it was assumed 
that each of the atoms of a material was a perfect conductor, but insulated from 
the others. The dielectric constant k would depend on the proportion of space 
which was occupied by the conducting spheres. This is not, however, the model 
that is used today. 


10-2 The polarization vector P 

If we follow the above analysis further, we discover that the idea of regions 
of perfect conductivity and insulation is not essential. Each of the small spheres 
acts like a dipole, the moment of which is induced by the external field. The only 
thing that is essential to the understanding of dielectrics is that there are many 
little dipoles induced in the material. Whether the dipoles are induced because 
there are tiny conducting spheres or for any other reason is irrelevant. 
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Why should a field induce a dipole moment in an atom if the atom is not a 
conducting sphere? This subject will be discussed in much greater detail in the 
next chapter, which will be about the inner workings of dielectric materials. 
However, we give here one example to illustrate a possible mechanism. An atom 
has a positive charge on the nucleus, which is surrounded by negative electrons. 
In an electric field, the nucleus will be attracted in one direction and the electrons in 
the other. The orbits or wave patterns of the electrons (or whatever picture is 
used in quantum mechanics) will be distorted to some extent, as shown in Fig. 10-4; 
the center of gravity of the negative charge will be displaced and will no longer 
coincide with the positive charge of the nucleus. We have already discussed such 
distributions of charge. If we look from a distance, such a neutral configuration 
is equivalent, to a first approximation, to a little dipole. 

It seems reasonable that if the field is not too enormous, the amount of induced 
dipole moment will be proportional to the field. That is, a small field will displace 
the charges a little bit and a larger field will displace them further — and in propor- 
tion to the field — unless the displacement gets too large. For the remainder of this 
chapter, it will be supposed that the dipole moment is exactly proportional to the 
field. 

We will now assume that in each atom there are charges q separated by a 
distance 5, so that qh is the dipole moment per atom. (We use h because we are 
already using d for the plate separation.) If there are N atoms per unit volume, 
there will be a dipole moment per unit volume equal to Nqh. This dipole moment 
per unit volume will be represented by a vector, P. Needless to say, it is in the 
direction of the individual dipole moments, i.e., in the direction of the charge 
separation 5: 

P = Nqh. (10.4) 

In general, P will vary from place to place in the dielectric. However, at any 
point in the material, P is proportional to the electric field E. The constant of 
proportionality, which depends on the ease with which the electron are displaced, 
will depend on the kinds of atoms in the material. 

What actually determines how this constant of proportionality behaves, how 
accurately it is constant for very large fields, and what is going on inside different 
materials, we will discuss at a later time. For the present, we will simply suppose 
that there exists a mechanism by which a dipole moment is induced which is 
proportional to the electric field. 

10-3 Polarization charges 

Now let us see what this model gives for the theory of a condenser with a di- 
electric. First consider a sheet of material in which there is a certain dipole moment 
per unit volume. Will there be on the average any charge density produced by this? 
Not if P is uniform. If the positive and negative charges being displaced relative 
to each other have the same average density, the fact that they are displaced does 
not produce any net charge inside the volume. On the other hand, if P were larger 
at one place and smaller at another, that would mean that more charge would be 
moved into some region than away from it; we would then expect to get a volume 
density of charge. For the parallel-plate condenser, we suppose that P is uniform, 
so we need to look only at what happens at the surfaces. At one surface the nega- 
tive charges, the electrons, have effectively moved out a distance 8; at the other 
surface they have moved in, leaving some positive charge effectively out a distance 
8 . As shown in Fig. 10-5, we will have a surface density of charge, which will be 
called the surface polarization charge . 


Fig. 10-5. A dielectric slab in a 
uniform field. The positive charges dis- 
placed the distance 5 with respect to 
the negatives. 
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Fig. 10-4. An atom in an electric 
field has its distribution of electrons dis- 
placed with respect to the nucleus. 



This charge can be calculated as follows. If A is the area of the plate, the 
number of electrons that appear at the surface is the product of A and N, the 
number per unit volume, and the displacement 5, which we assume here is per- 
pendicular to the surface. The total charge is obtained by multiplying by the 
electronic charge q e . To get the surface density of the polarization charge induced 
on the surface, we divide by A . The magnitude of the surface charge density is 

<r P oi ~ Nq e 5. 

But this is just equal to the magnitude P of the polarization vector P, Eq. (10.4): 

& poi = p. (10.5) 


The surface density of charge is equal to the polarization inside the material. The 
surface charge is, of course, positive on one surface and negative on the other. 

Now let us assume that our slab is the dielectric of a parallel-plate capacitor. 
The plates of the capacitor also have a surface charge, which we will call Cf Tee9 
because they can move “freely” anywhere on the conductor. This is, of course, 
the charge that we put on when we charged the capacitor. It should be emphasized 
that (Tpoi exists only because of af ree . If Ofree is removed by discharging the capacitor, 
then (Tpoi will disappear, not by going out on the discharging wire, but by moving 
back into the material — by the relaxation of the polarization inside the material. 

We can now apply Gauss’ law to the gaussian surface S in Fig. 10-1. The 
electric field E in the dielectric is equal to the total surface charge density divided 
by e 0 . It is clear that (Tpoi and <r free have opposite signs, so 

£ = °' free ~ fo 01 • (10.6) 

Note that the field E 0 between the metal plate and the surface of the dielectric 
is higher than the field E; it corresponds to <Tf ree alone. But here we are concerned 
with the field inside the dielectric which, if the dielectric nearly fills the gap, is the 
field over nearly the whole volume. Using Eq. (10.5), we can write 


E = 


flfree P 

Co 


(10.7) 


This equation doesn’t tell us what the electric field is unless we know what P is. 
Here, however, we are assuming that P depends on E — in fact, that it is proportional 
to E. This proportionality is usually written as 


P = xe 0 E. 


( 10 . 8 ) 


The constant x (Greek “khi”) is called the electric susceptibility of the dielectric. 
Then Eq. (10.7) becomes 


Afree 1 

€o (1 + X) 5 


(10.9) 


which gives us the factor 1/(1 + x) by which the field is reduced. 

The voltage between the plates is the integral of the electric field. Since the 
field is uniform, the integral is just the product of E and the plate separation d. 
We have that 


V = 


Ed = 


free^ . 

€q(1 + X) 


The total charge on the capacitor is <Tf ree A, so that the capacitance defined 
by (10.2) becomes 


^ _ €(^4(1 + x) _ K€qA 
d ' d 


( 10 . 10 ) 


We have explained the observed facts. When a parallel-plate capacitor is 
filled with a dielectric, the capacitance is increased by the factor 


1(M 


K = 1 + X, 


( 10 . 11 ) 



which is a property of the material. Our explanation, of course, is not complete 
until we have explained — as we will do later — how the atomic polarization comes 
about. 

Let’s now consider something a little bit more complicated — the situation in 
which the polarization P is not everywhere the same. As mentioned earlier, if the 
polarization is not constant, we would expect in general to find a charge density 
in the volume, because more charge might come into one side of a small volume 
element than leaves it on the other. How can we find out how much charge is gained 
or lost from a small volume? 

First let’s compute how much charge moves across any imaginary surface 
when the material is polarized. The amount of charge that goes across a surface 
is just P times the surface area if the polarization is normal to the surface. 
Of course, if the polarization is tangential to the surface, no charge moves 
across it. 

Following the same arguments we have already used, it is easy to see that the 
charge moved across any surface element is proportional to the component of P 
perpendicular to the surface. Compare Fig. 10-6 with Fig. 10-5. We see that 
Eq. (10.5) should, in the general case, be written 

<Tpoi = />•». (10.12) 


If we are thinking of an imagined surface element inside the dielectric, Eq. 
(10.12) gives the charge moved across the surface but doesn’t result in a net 
surface charge, because there are equal and opposite contributions from the di- 
electric on the two sides of the surface. 

The displacements of the charges can, however, result in a volume charge 
density. The total charge displaced out of any volume V by the polarization is the 
integral of the outward normal component of P over the surface S that bounds the 
volume (see Fig. 10-7). An equal excess charge of the opposite sign is left behind. 
Denoting the net charge inside V by A£?poi we write 

A0poi =» -f P-nda . (10.13) 

J s 

We can attribute AQpoi to a volume distribution of charge with the density p^i, 
and so 

A0poi = f v Ppol dV. (10.14) 

Combining the two equations yields 

j Ppol dV = ~f P-" da. (10.15) 


We have a kind of Gauss’ theorem that relates the charge density from polarized 
materials to the polarization vector P . We can see that it agrees with the result 
we got for the surface polarization charge or the dielectric in a parallel-plate capaci- 
tor. Using Eq. (10.15) with the gaussian surface of Fig. 10-1, the surface integral 
gives P A A, and the charge inside is 0^1 A A, so we get again that c r = P. 

Just as we did for Gauss’ law of electrostatics, we can convert Eq. (10.15) to 
a differential form — using Gauss’ mathematical theorem: 


We get 



- n da 



VPdV. 


Ppol = -V P. 


(10.16) 


If there is a nonuniform polarization, its divergence gives the net density of charge 
appearing in the material. We emphasize that this is a perfectly real charge density ; 
we call it “polarization charge” only to remind ourselves how it got there. 



Fig. 1 0-6. The charge moved across 
an element of an imaginary surface in a 
dielectric is proportional to the com- 
ponent of P normal to the surface. 



Fig. 10-7. A nonuniform polariza- 
tion P can result in a net charge in the 
body of a dielectric. 
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10-4 The electrostatic equations with dielectrics 

Now let’s combine the above result with our theory of electrostatics. The 
fundamental equation is 

V ■ E = — ■ (10.17) 

The p here is the density of all electric charges. Since it is not easy to keep track of 
the polarization charges, it is convenient to separate p into two parts. Again we 
call ppoi the charges due to nonuniform polarizations, and call pf ree all the rest. 
Usually pf r ee is the charge we put on conductors, or at known places in space. 
Equation (10.17) then becomes 


or 


V E 


Pfree ~H Ppoi Pfree ^ ^ 

€q «0 



Pfree . 
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Of course, the equation for the curl of E is unchanged : 

V X E = 0. 


Taking P from Eq. (10.8), we get the simpler equation 
V • [(1 + X)£] = V • ( kE ) = ^ • 


(10.18) 


(10.19) 

( 10 . 20 ) 


These are the equations of electrostatics when there are dielectrics. They don’t, 
of course, say anything new, but they are in a form which is more convenient for 
computation in cases where Pf ree is known and the polarization P is proportional 
to E . 

Notice that we have not taken the dielectric “constant,” k } out of the diver- 
gence. That is because it may not be the same everywhere. If it has everywhere the 
same value, it can be factored out and the equations are just those of electrostatics 
with the charge density pf ree divided by k. In the form we have given, the equations 
apply to the general case where different dielectrics may be in different places in 
the field. Then the equations may be quite difficult to solve. 

There is a matter of some historical importance which should be mentioned 
here. In the early days of electricity, the atomic mechanism of polarization was 
not known and the existence of Ppoi was not appreciated. The charge pf ree was 
considered to be the entire charge density. In order to write Maxwell’s equations 
in a simple form, a new vector D was defined to be equal to a linear combination 
of E and P: 

D = e 0 E + P. (10.21) 


As a result, Eqs. (10.18) and (10.19) were written in an apparently very simple form: 

V * D = Pfree, V X E = 0. (10.22) 


Can one solve these? Only if a third equation is given for the relationship between 
D and E. When Eq. (10.8) holds, this relationship is 

D = € 0 (1 + X)E = k€qE. (10.23) 

This equation was usually written 

D = eE, (10.24) 

where € is still another constant for describing the dielectric property of materials. 
It is called the “permittivity.” (Now you see why we have € 0 in our equations, it is 
the “permittivity of empty space.”) Evidently, 
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€ = K€ 0 — (1 + x)e 0 . 


(10.25) 



Today we look upon these matters from another point of view, namely, that 
we have simpler equations in a vacuum, and if we exhibit in every case all the 
charges, whatever their origin, the equations are always correct. If we separate 
some of the charges away for convenience, or because we do not want to discuss 
what is going on in detail, then we can, if we wish, write our equations in any other 
form that may be convenient. 

One more point should be emphasized. An equation like D = eE is an attempt 
to describe a property of matter. But matter is extremely complicated, and such 
an equation is in fact not correct. For instance, if E gets too large, then D is no 
longer proportional to E. For some substances, the proportionality breaks down 
even with relatively small fields. Also, the “constant” of proportionality may de- 
pend on how fast E changes with time. Therefore this kind of equation is a kind 
of approximation, like Hooke’s law. It cannot be a deep and fundamental equation. 
On the other hand, our fundamental equations for E, (10.17) and (10.19), represent 
our deepest and most complete understanding of electrostatics. 


10-5 Fields and forces with dielectrics 

We will now prove some rather general theorems for electrostatics in situations 
where dielectrics are present. We have seen that the capacitance of a parallel-plate 
capacitor is increased by a definite factor if it is filled with a dielectric. We can 
show that this is true for a capacitor of any shape, provided the entire region in 
the neighborhood of the two conductors is filled with a uniform linear dielectric. 

Without the dielectric, the equations to be solved are 

V • E 0 = ^2£ and V X E 0 = 0. 

Co 

With the dielectric present, the first of these equations is modified; we have instead 
the equations 

V • 0 <E ) = and v X E = 0. (10.26) 

€0 

Now since we are taking k to be everywhere the same, the last two equations can 
be written as 

V • ( kE ) = £5=2 and V X (icE) = 0. (10.27) 

«o 

We therefore have the same equations for kE as for E 0i so they have the solu- 
tion kE = Fo. In other words, the field is everywhere smaller, by the factor 1 /k, 
than in the case without the dielectric. Since the voltage difference is a line integral 
of the field, the voltage is reduced by this same factor. Since the charge on the 
electrodes of the capacitor has been taken the same in both cases, Eq. (10.2) tells 
us that the capacitance, in the case of an everywhere uniform dielectric, is in- 
creased by the factor k. 

Let us now ask what the force would be between two charged conductors in a 
dielectric. We consider a liquid dielectric that is homogeneous everywhere. We 
have seen earlier that one way to obtain the force is to differentiate the energy with 
respect to the appropriate distance. If the conductors have equal and opposite 
charges, the energy V = Q 2 /2C, where C is their capacitance. Using the principle 
of virtual work, any component is given by a differentiation; for example, 



Since the dielectric increases the capacity by a factor #c, all forces will be reduced 
by this same factor. 

One point should be emphasized. What we have said is true only if the di- 
electric is a liquid. Any motion of conductors that are embedded in solid dielectric 
changes the mechanical stress conditions of the dielectric and alters its electrical 
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properties, as well as causing some mechanical energy change in the dielectric. 
Moving the conductors in a liquid does not change the liquid. The liquid moves 
to a new place but its electrical characteristics are not changed. 

Many older books on electricity start with the “fundamental” law that the 
force between two charges is 


f _ gig2 } 
AirtQKr 2 


(10.29) 



Fig. 10-8. A dielectric object in a 
nonuniform field feels a force toward 
regions of higher field strength. 


a point of view which is thoroughly unsatisfactory. For one thing, it is not true 
in general; it is true only for a world filled with a liquid. Secondly, it depends on 
the fact that k is a constant, which is only approximately true for most real materials. 
It is much better to start with Coulomb’s law for charges in a vacuum , which is 
always right (for stationary charges). 

What does happen in a solid? This is a very difficult problem which has not 
been solved, because it is, in a sense, indeterminate. If you put charges inside a 
dielectric solid, there are many kinds of pressures and ^trains. You cannot deal 
with virtual work without including also the mechanical energy required to com- 
press the solid, and it is a difficult matter, generally speaking, to make a unique 
distinction between the electrical forces and the mechanical forces due to the solid 
material itself. Fortunately, no one ever really needs to know the answer to the 
question proposed. He may sometimes want to know how much strain there is 
going to be in a solid, and that can be worked out. But it is much more complicated 
than the simple result we got for liquids. 

A surprisingly complicated problem in the theory of dielectrics is the follow- 
ing: Why does a charged object pick up little pieces of dielectric? If you comb your 
hair on a dry day, the comb readily picks up small scraps of paper. If you thought 
casually about it, you probably assumed the comb had one charge on it and the 
paper had the opposite charge on it. But the paper is initially electrically neutral. 
It hasn’t any net charge, but it is attracted anyway. It is true that sometimes the 
paper will come up to the comb and then fly away, repelled immediately after it 
touches the comb. The reason is, of course, that when the paper touches the comb, 
it picks up some negative charges and then the like charges repel. But that doesn’t 
answer the original question. Why did the paper come toward the comb in the 
first place? 

The answer has to do with the polarization of a dielectric when it is placed in 
an electric field. There are polarization charges of both signs, which are attracted 
and repelled by the comb. There is a net attraction, however, because the field 
nearer the comb is stronger than the field farther away — the comb is not an infinite 
sheet. Its charge is localized. A neutral piece of paper will not be attracted to 
either plate inside the parallel plates of a capacitor. The variation of the field is 
an essential part of the attraction mechanism. 

As illustrated in Fig. 10-8, a dielectric is always drawn from a region of weak 
field toward a region of stronger field. In fact, one can prove that for small objects 
the force is proportional to the gradient of the square of the electric field. Why 
does it depend on the square of the field? Because the induced polarization charges 
are proportional to the fields, and for given charges the forces are proportional to 
the field. However, as we have just indicated, there will be a net force only if the 
square of the field is changing from point to point. So the force is proportional to 
the gradient of the square of the field. The constant of proportionality involves, 
among other things, the dielectric constant of the object, and it also depends upon 
the size and shape of the object. 

There is a related problem in which the force on a dielectric can be worked out 
quite accurately. If we have a parallel-plate capacitor with a dielectric slab only 
partially inserted, as shown in Fig. 10-9, there will be a force driving the sheet in. 
A detailed examination of the force is quite complicated; it is related to nonuni- 
formities in the field near the edges of the dielectric and the plates. However, if 
we do not look at the details, but merely use the principle of conservation of energy, 
we can easily calculate the force. We can find the force from the formula we de- 
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Fig. 10-9. The force on a dielectric 
sheet in a parallel-plate capacitor can be 
computed by applying the principle of 
energy conservation. 


rived earlier. Equation (10.28) is equivalent to 


du = dC 

dx 2 dx 


(10.30) 


We need only find out how the capacitance varies with the position of the dielectric 
slab. 

Let’s suppose that the total length of the plates is L, that the width of the plates 
is W, that the plate separation and dielectric thickness are d, and that the distance 
to which the dielectric has been inserted is x . The capacitance is the ratio of the 
total free charge on the plates to the voltage between the plates. We have seen 
above that for a given voltage V the surface charge density of free charge is K€ 0 V/d . 
So the total charge on the plates is 

Q = ^xW+^f(L- x)W, 

from which we get the capacitance: 

c = ^ (kx + L — x). (10.31) 

Using (10.30), we have 

= t i ? (K_I) - (la32) 


Now this equation is not particularly useful for anything unless you happen to 
need to know the force in such circumstances. We only wished to show that the 
theory of energy can often be used to avoid enormous complications in determining 
the forces on dielectric materials — as there would be in the present case. 

Our discussion of the theory of dielectrics has dealt only with electrical phe- 
nomena, accepting the fact that the material has a polarization which is proportional 
to the electric field. Why there is such a proportionality is perhaps of greater interest 
to physics. Once we understand the origin of the dielectric constants from an atomic 
point of view, we can use electrical measurements of the dielectric constants in 
varying circumstances to obtain detailed information about atomic or molecular 
structure. This aspect will be treated in part in the next chapter. 
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11 


Inside Ilieleetries 


11-1 Molecular dipoles 

In this chapter we are going to discuss why it is that materials are dielectric. 
We said in the last chapter that we could understand the properties of electrical 
systems with dielectrics once we appreciated that when an electric field is applied 
to a dielectric it induces a dipole moment in the atoms. Specifically, if the electric 
field E induces an average dipole moment per unit volume P, then k, the dielectric 
constant, is given by 



We have already discussed how this equation is applied; now we have to dis- 
cuss the mechanism by which polarization arises when there is an electric field 
inside a material. We begin with the simplest possible example — the polarization 
of gases. But even gases already have complications: there are two types. The 
molecules of some gases, like oxygen, which has a symmetric pair of atoms in each 
molecule, have no inherent dipole moment. But the molecules of others, like water 
vapor (which has a nonsymmetric arrangement of hydrogen and oxygen atoms) 
carry a permanent electric dipole moment. As we pointed out in Chapters 6 and 7, 
there is in the water vapor molecule an average plus charge on the hydrogen 
atoms and a negative charge on the oxygen. Since the center of gravity of the nega- 
tive charge and the center of gravity of the positive charge do not coincide, the 
total charge distribution of the molecule has a dipole moment. Such a molecule is 
called a polar molecule. In oxygen, because of the symmetry of the molecule, the 
centers of gravity of the positive and negative charges are the same, so it is a 
nonpolar molecule. It does, however, become a dipole when placed in an electric 
field. The forms of the two types of molecules are sketched in Fig. 11-1. 


11-2 Electronic polarization 

We will first discuss the polarization of nonpolar molecules. We can start with 
the simplest case of a monatomic gas (for instance, helium). When an atom of 
such a gas is in an electric field, the electrons are pulled one way by the field while 
the nucleus is pulled the other way, as shown in Fig. 10-4. Although the atoms are 
very stiff with respect to the electrical forces we can apply experimentally, there is a 
slight net displacement of the centers of charge, and a dipole moment is induced. 
For small fields, the amount of displacement, and so also the dipole moment, is 
proportional to the electric field. The displacement of the electron distribution 
which produces this kind of induced dipole moment is called electronic polarization . 

We have already discussed the influence of an electric field on an atom in 
Chapter 31 of Vol. I, when we were dealing with the theory of the index of refrac- 
tion. If you think about it for a moment, you will see that what we must do now is 
exactly the same as we did then. But now we need worry only about fields that do 
not vary with time, while the index of refraction depended on time-varying fields. 

In Chapter 31 of Vol. I we supposed that when an atom is placed in an oscilla- 
ting electric field the center of charge of the electrons obeys the equation 

m ^ + mcolx = q e E . (11.2) 
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Fig. 11-1, (a) An oxygen molecule 

with zero dipole moment, (b) The water 
molecule has a permanent dipole moment 
Po- 
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The first term is the electron mass times its acceleration and the second is a restoring 
force, while the right-hand side is the force from the outside electric field. If the 
electric field varies with the frequency a>, Eq. (11.2) has the solution 


m(co 1 2 — co 2 ) 


(11.3) 


which has a resonance at co = co 0 . When we previously found this solution, we 
interpreted it as saying that co 0 was the frequency at which light (in the optical 
region or in the ultraviolet, depending on the atom) was absorbed. For our 
purposes, however, we are interested only in the case of constant fields, i.e., for 
co = 0, so we can disregard the acceleration term in (11.2), and we find that the 
displacement is 

* = (ii- 4 ) 


From this we see that the dipole moment p of a single atom is 


p = q e x = 


( 11 . 5 ) 


In this theory the dipole moment p is indeed proportional to the electric field. 
People usually write 

p = ae 0 E. (11 .6) 

(Again the e 0 is put in for historical reasons.) The constant a is called the polariz- 
ability of the atom, and has the dimensions L 3 . It is a measure of how easy it is to 
induce a moment in an atom with an electric field. Comparing (11.5) and (11.6), 
our simple theory says that 


e 0 mco£ mtog 


(11.7) 


If there are N atoms in a unit volume, the polarization P — the dipole moment 
per unit volume — is given by 

P = Np = Na^E. (11.8) 

Putting (11.1) and (11.8) together, we get 


/c — 1 = = Not. 

€qE 


or, using (11.7), 


k - 1 = 


Air Ne 2 


(11.9) 


( 11 . 10 ) 


From Eq. (11.9) we would predict that the dielectric constant k of different 
gases should depend on the density of the gas and on the frequency w 0 of its optical 
absorption. 

Our formula is, of course, only a very rough approximation, because in Eq. 
(11.2) we have taken a model which ignores the complications of quantum me- 
chanics. For example, we have assumed that an atom has only one resonant 
frequency, when it really has many. To calculate properly the polarizability a of 
atoms we must use the complete quantum-mechanical theory, but the classical 
ideas above give us a reasonable estimate. 

Let’s see if we can get the right order of magnitude for the dielectric constant 
of some substance. Suppose we try hydrogen. We have once estimated (Chapter 
38, Vol. I) that the energy needed to ionize the hydrogen atom should be approxi- 


mately 


1 me 4 

2 ~W 


( 11 . 11 ) 
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For an estimate of the natural frequency o>o, we can set this energy equal to hco Q - 
the energy of an atomic oscillator whose natural frequency is co 0 . We get 

1 me 4 

U ° ~ 2 ~W ' 


If we now use this value of o> 0 in Eq. (1 1.7), we find for the electronic polarizability 


a 



( 11 . 12 ) 


The quantity (h 2 /me 2 ) is the radius of the ground-state orbit of a Bohr atom (see 
Chapter 38, Vol. I) and equals 0.528 angstroms. In a gas at standard pressure and 
temperature (1 atmosphere, 0°C) there are 2.69 X 10 19 atoms/cm 3 , so Eq. (11.9) 
gives us 

k = 1 + (2.69 X 10 19 )16tt (0.528 X 10 ” 8 ) 3 = 1.00020. (11.13) 


The dielectric constant for hydrogen gas is measured to be 


Kexp = 1.00026. 


We see that our theory is about right. We should not expect any better, because 
the measurements were, of course, made with normal hydrogen gas, which has 
diatomic molecules, not single atoms. We should not be surprised if the polariza- 
tion of the atoms in a molecule is not quite the same as that of the separate atoms. 
The molecular effect, however, is not really that large. An exact quantum- 
mechanical calculation of a for hydrogen atoms gives a result about 12% higher 
than (11.12) (the 1 671- is changed to 187 t), and therefore predicts a dielectric constant 
somewhat closer to the observed one. In any case, it is clear that our model of a 
dielectric is fairly good. 

Another check on our theory is to try Eq. (11.12) on atoms which have a 
higher frequency of excitation. For instance, it takes about 24.5 volts to pull the 
electron off helium, compared with the 13.5 volts required to ionize hydrogen. 
We would, therefore, expect that the absorption frequency c*j 0 for helium would be 
about twice as big as for hydrogen and that a would be one-quarter as large. We 
expect that 

Khehum « 1.000050. 

Experimentally, 


Xh eh urn = 1 . 000068 , 


so you see that our rough estimates are coming out on the right track. So we have 
understood the dielectric constant of nonpolar gas, but only qualitatively, because 
we have not yet used a correct atomic theory of the motions of the atomic electrons. 


11-3 Polar molecules; orientation polarization 

Next we will consider a molecule which carries a permanent dipole moment 
p 0 — such as a water molecule. With no electric field, the individual dipoles point 
in random directions, so the net moment per unit volume is zero. But when an 
electric field is applied, two things happen: First, there is an extra dipole moment 
induced because of the forces on the electrons; this part gives just the same kind of 
electronic polarizability we found for a nonpolar molecule. For very accurate 
work, this effect should, of course, be included, but we will neglect it for the 
moment. (It can always be added in at the end.) Second, the electric field tends to 
line up the individual dipoles to produce a net moment per unit volume. If all the 
dipoles in a gas were to line up, there would be a very large polarization, but that 
does not happen. At ordinary temperatures and electric fields the collisions of the 
molecules in their thermal motion keep them from lining up very much. But there 
is some net alignment, and so some polarization (see Fig. 11-2). The polarization 
that does occur can be computed by the methods of statistical mechanics we 
described in Chapter 40 of Vol. I. 
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Fig. 1 1-2. (a) In a gas of polar 

molecules, the individual moments are 
oriented at random; the average moment 
in a small volume is zero, (b) When there 
is an electric field, there is some average 
alignment of the molecules. 
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To use this method we need to know the energy of a dipole in an electric field. 
Consider a dipole of moment p 0 in an electric field, as shown in Fig. 11-3. The 
energy of the positive charge is q<l>( 1), and the energy of the negative charge is 
—q<\>( 2). Thus the energy of the dipole is 


+q 


(i) 


E 


[ 2 ) 


Fig. 11-3. The energy of a dipole 
p 0 in the field E is — p 0 * E. 


U = - q<f>(2) = qd ■ V<f>, 

or 

U~ —po * E = — p 0 Ecosd, (11-14) 

where 8 is the angle between p 0 and E. As we would expect, the energy is lower 
when the dipoles are lined up with the field. 

We now find out how much lining up occurs by using the methods of statis- 
tical mechanics. We found in Chapter 40 of Vol. I that in a state of thermal equili- 
brium, the relative number of molecules with the potential energy U is proportional 
to 

e - ulkT , (11.15) 


where U(x, y, z ) is the potential energy as a function of position. The same argu- 
ments would say that using Eq. (11.14) for the potential energy as a function of 
angle , the number of molecules at 8 per unit solid angle is proportional to e~ ulkT . 

Letting n(8) be the number of molecules per unit solid angle at 6, we have 

n(8) = noe+*o Eco& 6t kT . (11.16) 

For normal temperatures and fields, the exponent is small, so we can approximate 
by expanding the exponential: 

n(0) = n 0 (l + E°*Spiy (11.17) 

We can find n 0 if we integrate (11.17) over all angles; the result should be just 
N, the total number of molecules per unit volume. The average value of cos 6 over 
all angles is zero, so the integral is just n 0 times the total solid angle 47 r. We get 

(11.18) 

We see from (11.17) that there will be more molecules oriented along the field 
(cos 6 = 1) than against the field (cos 6 = —1). So in any small volume contain- 
ing many molecules there will be a net dipole moment per unit volume — that is, 
a polarization P. To calculate P, we want the vector sum of all the molecular 
moments in a unit volume. Since we know that the result is going to be in the 
direction of E , we will just sum the components in that direction (the components 
at right angles to E will sum to zero): 

p = ^ p° cos *»• 

unit 

volume 

We can evaluate the sum by integrating over the angular distribution. The 
solid angle at 8 is 2i r sin 8 d8 , so 

P = / n(8)p 0 cos 82tt sin 8 dd. (11.19) 

Jo 

Substituting for n(8) from (11.17), we have 

P = — y J cos 0^ po cos 8 d( cos 8 ), 

which is easily integrated to give 

P = ^M- (11-20) 
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The polarization is proportional to the field E, so there will be normal dielectric 
behavior. Also, as we expect, the polarization depends inversely on the tempera- 
ture, because at higher temperatures there is more disalignment by collisions. This 
\fT dependence is called Curie’s law. The permanent moment po appears squared 
for the following reason: In a given electric field, the aligning force depends upon 
Po, and the mean moment that is produced by the lining up is again proportional 
to po . The average induced moment is proportional to 

We should now try to see how well Eq. (1 1.20) agrees with experiment. Let’s 
look at the case of steam. Since we don’t know what p 0 is, we cannot compute P 
directly, but Eq. (1 1.20) does predict that k — 1 should vary inversely as the tem- 
perature, and this we should check. 

From (11.20) we get 


= _P_ = Npl 
€q E 3€ 0 kF s 


( 11 . 21 ) 


so k — 1 should vary in direct proportion to the density N, and inversely as the 
absolute temperature. The dielectric constant has been measured at several 
different pressures and temperatures, chosen such that the number of molecules in 
a unit volume remained fixed.* [Notice that if the measurements had all been 
taken at constant pressure, the number of molecules per unit volume would 
decrease linearly with increasing temperature and k — 1 would vary as T~ 2 
instead of as F -1 .] In Fig. 11-4 we plot the experimental observations for k — l 
as a function of 1 /T. The dependence predicted by (11.21) is followed quite well. 

There is another characteristic of the dielectric constant of polar molecules — 
its variation with the frequency of the applied field. Due to the moment of inertia 
of the molecules, it takes a certain amount of time for the heavy molecules to turn 
toward the direction of the field. So if we apply frequencies in the high microwave 
region or above, the polar contribution to the dielectric constant begins to fall 
away because the molecules cannot follow. In contrast to this, the electronic 
polarizability still remains the same up to optical frequencies, because of the 
smaller inertia in the electrons. 



Fig. 11-4. Experimental measure- 
ments of the dielectric constant of water 
vapor at various temperatures. 


11-4 Electric fields in cavities of a dielectric 

We now turn to an interesting but complicated question — the problem of the 
dielectric constant in dense materials. Suppose that we take liquid helium or 
liquid argon or some other nonpolar material. We still expect electronic polari- 
zation. But in a dense material, P can be large, so the field on an individual atom 
will be influenced by the polarization of the atoms in its close neighborhood. The 
question is, what electric field acts on the individual atom? 

Imagine that the liquid is put between the plates of a condenser. If the plates 
are charged they will produce an electric field in the liquid. But there are also 
charges in the individual atoms, and the total field E is the sum of both of these 
effects. This true electric field varies very, very rapidly from point to point in the 
liquid. It is very high inside the atoms — particularly right next to the nucleus — and 
relatively small between the atoms. The potential difference between the plates is 
the line integral of this total field. If we ignore all the fine-grained variations, we 
can think of an average electric field E , which is just V/d. (This is the field we were 
using in the last chapter.) We should think of this field as the average over a space 
containing many atoms. 

Now you might think that an “average” atom in an “average” location would 
feel this average field. But it is not that simple, as we can show by considering what 
happens if we imagine different-shaped holes in a dielectric. For instance, suppose 
that we cut a slot in a polarized dielectric, with the slot oriented parallel to the 
field, as shown in part (a) of Fig. 1 1-5. Since we know that V X E = 0, the line 
integral of E around the curve, r, which goes as shown in (b) of the figure, should 




Fig. 1 1-5. The field in a slot cut in a 
dielectric depends on the shape and 
orientation of the slot. 


* Sanger, Steiger, and Gachter, Helvetica Physica Acta 5, 200 (1932). 
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Fig. 1 1-7. The electric field of a 
uniformly polarized sphere. 


be zero. The field inside the slot must give a contribution which just cancels the 
part from the field outside. Therefore the field E 0 actually found in the center of 
a long thin slot is equal to E , the average electric field found in the dielectric. 

Now consider another slot whose large sides are perpendicular to E, as shown 
in part (c) of Fig. 1 1-5. In this case, the field E 0 in the slot is not the same as E 
because polarization charges appear on the surfaces. If we apply Gauss’ law to 
a surface 5 drawn as in (d) of the figure, we find that the field E 0 in the slot is 
given by 

E 0 = E + —, (11.22) 

Co 

where E is again the electric field in the dielectric. (The gaussian surface contains 
the surface polarization charge <rpo\ = P.) We mentioned in Chapter 10 that 
e 0 E + P is often called Z>, so e Q E Q = Z) 0 is equal to D in the dielectric. 

Earlier in the history of physics, when it was supposed to be very important 
to define every quantity by direct experiment, people were delighted to discover 
that they could define what they meant by E and D in a dielectric without having 
to crawl around between the atoms. The average field E is numerically equal to 
the field E 0 that would be measured in a slot cut parallel to the field. And the field 
D could be measured by finding E 0 in a slot cut normal to the field. But nobody 
ever measures them that way anyway, so it was just one of those philosophical 
things. 



Fig. 11-6. The field at any point A 
in a dielectric can be considered as the 
sum of the field in a spherical hole plus 
the field due to a spherical plug. 


For most liquids which are not too complicated in structure, we could expect 
that an atom finds itself, on the average, surrounded by the other atoms in what 
would be a good approximation to a spherical hole . And so we should ask: “What 
would be the field in a spherical hole?” We can find out by noticing that if we 
imagine carving out a spherical hole in a uniformly polarized material, we are just 
removing a sphere of polarized material. (We must imagine that the polarization 
is “frozen in” before we cut out the hole.) By superposition, however, the fields 
inside the dielectric, before the sphere was removed, is the sum of the fields from 
all charges outside the spherical volume plus the fields from the charges within the 
polarized sphere. That is, if we call E the field in the uniform dielectric, we can 
write 

E = £hole + £plug> (11.23) 

where Z?h 0 ie is the field in the hole and E p \ u& is the field inside a sphere which is 
uniformly polarized (see Fig. 1 1-6). The fields due to a uniformly polarized sphere 
are shown in Fig. 1 1-7. The electric field inside the sphere is uniform, and its 
value is 

£plu8=- 37T (11.24) 

Using (11.23), we get 

E hoU = E + £- (11.25) 

The field in a spherical cavity is greater than the average field by the amount 
P/3€ 0 * (The spherical hole gives a field 1/3 of the way between a slot parallel to 
the field and a slot perpendicular to the field.) 

11-5 The dielectric constant of liquids; the Clausius-Mossotti equation 

In a liquid we expect that the field which will polarize an individual atom is 
more like £h 0 i e ^ an J ust If we use the E^ 0 \ c of ( 1 1 .25) for the polarizing field in 
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Eq. (11.6), then Eq. (11.8) becomes 



P 1 - {Na/ 3 ) e ° K 

Remembering that k — 1 is just P/t^E, we have 


(11.26) 

(11.27) 


«- 1 = r -(^/3y < 1L28 > 

which gives us the dielectric constant of a liquid in terms of a, the atomic polar- 
izability. This is called the Clausius-Mossotti equation. 

Whenever Na is very small, as it is for a gas (because the density N is small), 
then the term Na/ 3 can be neglected compared with 1, and we get our old result, 
Eq. (11.9), that 

k - 1 = Na. (11.29) 


Let’s compare Eq. (1 1.28) with some experimental results. It is first necessary 
to look at gases for which, using the measurement of k, we can find a from Eq. 
(1 1.29). For instance, for carbon disulfide at zero degrees centigrade the dielectric 
constant is 1.0029, so Na is 0.0029. Now the density of the gas is easily worked out 
and the density of the liquid can be found in handbooks. At 20°C, the density of 
liquid CS 2 is 381 times higher than the density of the gas at 0°C. This means that 
A is 381 times higher in the liquid than it is in the gas so, that — if we make the 
approximation that the basic atomic polarizability of the carbon disulfide doesn’t 
change when it is condensed into a liquid — Na in the liquid is equal to 381 times 
0.0029, or 1.11. Notice that the Na/ 3 term amounts to almost 0.4, so it is quite 
significant. With these numbers we predict a dielectric constant of 2.76, which 
agrees reasonably well with the observed value of 2.64. 

In Table 11-1 we give some experimental data on various materials (taken 
from the Handbook of Chemistry and Physics), together with the dielectric constants 
calculated from Eq. (11.28) in the way just described. The agreement between 
observation and theory is even better for argon and oxygen than for CS 2 — and 
not so good for carbon tetrachloride. On the whole, the results show that Eq. 
(11.28) works very well. 


Table 11-1 

Computation of the dielectric constants of liquids 
from the dielectric constant of the gas. 



Gas 

Liquid 

Substance 

k (exp) 

Na 

Density 

Density 

Ratio* 

Na 

k (predict) 

k (exp) 

CS 2 

1.0029 

0.0029 

0.00339 

1.293 

381 

1.11 

2.76 

2.64 

o 2 

1.000523 

0.000523 

0.00143 

1.19 

832 

0.435 

1.509 

1.507 

CC1 4 

1.0030 

0.0030 

0.00489 

1.59 

325 

0.977 

2.45 

2.24 

A 

1.000545 

0.000545 

0.00178 

1.44 

810 

0.441 

1.517 

1.54 


* Ratio = density of liquid/density of gas. 


Our derivation of Eq. (1 1 .28) is valid only for electronic polarization in liquids. 
It is not right for a polar molecule like H 2 0. If we go through the same calcu- 
lations for water, we get 13.2 for Na, which means that the dielectric constant for 
the liquid is negative , while the observed value of k is 80. The problem has to do 
with the correct treatment of the permanent dipoles, and Onsager has pointed out 
the right way to go. We do not have the time to treat the case now, but if you are 
interested it is discussed in KitteTs book, Introduction to Solid State Physics. 
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11-6 Solid dielectrics 
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Fig. 11-8. A complex crystal lattice 
can have a permanent intrinsic polariza- 
tion P . 
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Fig. 11-9. The unit cell of BaTi0 3 . 
The atoms really fill up most of the space; 
for clarity, only the positions of their 
centers are shown. 


Now we turn to the solids. The first interesting fact about solids is that there 
can be a permanent polarization built in — which exists even without applying an 
electric field. An example occurs with a material like wax, which contains long 
molecules having a permanent dipole moment. If you melt some wax and put a 
strong electric field on it when it is a liquid, so that the dipole moments get partly 
lined up, they will stay that way when the liquid freezes. The solid material will 
have a permanent polarization which remains when the field is removed. Such a 
solid is called an electret . 

An electret has permanent polarization charges on its surface. It is the electrical 
analog of a magnet. It is not as useful, though, because free charges from the air 
are attracted to its surfaces, eventually cancelling the polarization charges. The 
electret is “discharged” and there are no visible external fields, 

A permanent internal polarization P is also found occurring naturally in some 
crystalline substances. In such crystals, each unit cell of the lattice has an identical 
permanent dipole moment, as drawn in Fig. 1 1-8. All the dipoles point in the same 
direction, even with no applied electric field. Many complicated crystals have, in 
fact, such a polarization; we do not normally notice it because the external fields 
are discharged, just as for the electrets. 

If these internal dipole moments of a crystal are changed, however, external 
fields appear because there is not time for stray charges to gather and cancel the 
polarization charges. If the dielectric is in a condenser, free charges will be induced 
on the electrodes. For example, the moments can change when a dielectric is 
heated, because of thermal expansion. The effect is called pyroelectricity . Similarly, 
if we change the stresses in a crystal — for instance, if we bend it — again the mo- 
ment may change a little bit, and a small electrical effect, called piezoelectricity , 
can be detected. 

For crystals that do not have a permanent moment, one can work out a theory 
of the dielectric constant that involves the electronic polarizability of the atoms. 
It goes much the same as for liquids. Some crystals also have rotatable dipoles 
inside, and the rotation of these dipoles will also contribute to k. In ionic crystals 
such as NaCl there is also ionic polarizability . The crystal consists of a checkerboard 
of positive and negative ions, and in an electric field the positive ions are pulled 
one way and the negatives the other ; there is a net relative motion of the plus and 
minus charges, and so a volume polarization. We could estimate the magnitude 
of the ionic polarizability from our knowledge of the stiffness of salt crystals, but 
we will not go into that subject here. 


11-7 Ferroelectricity; BaTi0 3 

We want to describe now one special class of crystals which have, just by 
accident almost, a built-in permanent moment. The situation is so marginal that 
if we increase the temperature a little bit they lose the permanent moment com- 
pletely. On the other hand, if they are nearly cubic crystals, so that their moments 
can be turned in different directions, we can detect a large change in the moment 
when an applied electric field is changed. All the moments flip over and we get a 
large effect. Substances which have this kind of permanent moment are called 
ferroelectric , after the corresponding ferromagnetic effects which were first dis- 
covered in iron. 

We would like to explain how ferroelectricity works by describing a particular 
example of a ferroelectric material. There are several ways in which the ferro- 
electric property can originate; but we will take up only one mysterious case — that 
of barium titanate, BaTi0 3 . This material has a crystal lattice whose basic cell is 
sketched in Fig. 11-9. It turns out that above a certain temperature, specifically 
118°C, barium titanate is an ordinary dielectric with an enormous dielectric con- 
stant. Below this temperature, however, it suddenly takes on a permanent moment. 

In working out the polarization of solid material, we must first find what are 
the local fields in each unit cell. We must include the fields from the polarization 
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itself, just as we did for the case of a liquid. But a crystal is not a homogeneous 
liquid, so we cannot use for the local field what we would get in a spherical hole. 
If you work it out for a crystal, you find that the factor 1/3 in Eq. (1 1.24) becomes 
slightly different, but not far from 1/3. (For a simple cubic crystal, it is just 1/3.) 
We will, therefore, assume for our preliminary discussion that the factor is 1/3 
for BaTi0 3 . 

Now when we wrote Eq. (11.28) you may have wondered what would happen 
if Not became greater than 3. It appears as though k would become negative. But 
that surely cannot be right. Let’s see what should happen if we were gradually to 
increase a in a particular crystal. As a gets larger, the polarization gets bigger, 
making a bigger local field. But a bigger local field will polarize each atom more, 
raising the local fields still more. If the “give” of the atoms is enough, the process 
keeps going; there is a kind of feedback that causes the polarization to increase 
without limit — assuming that the polarization of each atom increases in proportion 
to the field. The “runaway” condition occurs when Na = 3. The polarization 
does not become infinite, of course, because the proportionality between the in- 
duced moment and the electric field breaks down at high fields, so that our formulas 
are no longer correct. What happens is that the lattice gets “locked in” with a high, 
self-generated, internal polarization. 

In the case of BaTi0 3 , there is, in addition to an electronic polarization, also 
a rather large ionic polarization, presumed to be due to titanium ions which can 
move a little within the cubic lattice. The lattice resists large motions, so after the 
titanium has gone a little way, it jams up and stops. But the crystal cell is then left 
with a permanent dipole moment. 

In most crystals, this is really the situation for all temperatures that can be 
reached. The very interesting thing about barium titanate is that there is such a 
delicate condition that if Na is decreased just a little bit it comes unstuck. Since 
N decreases with increasing temperature — because of thermal expansion — we can 
vary Na by varying the temperature. Below the critical temperature it is just 
barely stuck, so it is easy — by applying an external field — to shift the polarization 
and have it lock in a different direction. 

Let’s see if we can analyze what happens in more detail. We call T c the critical 
temperature at which Na is exactly 3. As the temperature increases, N goes down a 
little bit because of the expansion of the lattice. Since the expansion is small, we 
can say that near the critical temperature 


Na = 3 - ft(T - T c ), 


(11.30) 


where ft is a small constant, of the same order of magnitude as the thermal expansion 
coefficient, or about 10 -5 to 10“ 6 per degree C. Now if we substitute this relation 
into Eq. (11.28), we get that 

3 - ft{T - T c ) 
ft(T — T c )/ 3 


Since we have assumed that ft(T — T c ) is small compared with one, we can ap- 
proximate this formula by 


K - l = f 

ft(T - T c ) 


(11.31) 


This relation is right, of course, only for T > T c . We see that just above the 
critical temperature k is enormous. Because Na is so close to 3, there is a tremen- 
dous magnification effect, and the dielectric constant can easily be as high as 50,000 
to 100,000. It is also very sensitive to temperature. For increases in temperature, 
the dielectric constant goes down inversely as the temperature, but, unlike the case 
of a dipolar gas, for which k — 1 goes inversely as the absolute temperature, for 
ferroelectrics it varies inversely as the difference between the absolute temperature 
and the critical temperature (this law is called the Curie-Weiss law). 

When we lower the temperature to the critical temperature, what happens? 
If we imagine a lattice of unit cells like that in Fig. 11-9, we see that it is possible 
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to pick out chains of ions along vertical lines. One of them consists of alternating 
oxygen and titanium ions. There are other lines made up of either barium or 
oxygen ions, but the spacing along these lines is greater. We make a simple model 
to imitate this situation by imagining, as shown in Fig. 1 1 -10(a), a series of chains 
of ions. Along what we call the main chain, the separation of the ions is a, which 
is half the lattice constant; the lateral distance between identical chains is 2a, 
There are less-dense chains in between which we will ignore for the moment. To 
make the analysis a little easier, we will also suppose that all the ions on the main 
chain are identical. (It is not a serious simplification because all the important 
effects will still appear. This is one of the tricks of theoretical physics. One does 
a different problem because it is easier to figure out the first time — then when one 
understands how the thing works, it is time to put in all the complications.) 

Now let’s try to find out what would happen with our model. We suppose that 
the dipole moment of each atom is p and we wish to calculate the field at one of 
the atoms of the chain. We must find the sum of the. fields from all the other atoms. 
We will first calculate the field from the dipoles in only one vertical chain ; we will 
talk about the other chains later. The field at the distance r from a dipole in a 
direction along its axis is given by 


1 2 P 

4x6q z* 3 


(11.32) 


At any given atom, the dipoles at equal distances above and below it give fields in 
the same direction, so for the whole chain we get 


■'chain 


= - — (l + - + — + — + •••') = i 7 - 0,383 . (l l 33) 
4X60 a3 \ + 8 + 27^64 + / e 0 u 3 U } 


It is not too hard to show that if our model were like a completely cubic crystal — 
that is, if the next identical lines were only the distance a away — the number 0.383 
would be changed to 1/3. In other words, if the next lines were at the distance a 
they would contribute only —0.050 unit to our sum. However, the next main 
chain we are considering is at the distance la and, as you remember from Chapter 7, 
the field from a periodic structure dies off exponentially with distance. Therefore 
these lines contribute much less than —0.050 and we can just ignore all the other 
chains. 

It is necessary now to find out what polarizability a is needed to make the 
runaway process work. Suppose that the induced moment p of each atom of the 
chain is proportional to the field on it, as in Eq. (11.6). We get the polarizing field 
on the atom from ^chains using Eq. (11.32). So we have the two equations 


Fig. 11-10. Models of a ferroelec- 
tric: (a) corresponds to an antiferro- and 
electric, and (b) to a normal ferroelectric. 
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There are two solutions: E and p both zero, or 


a 


0.383 5 


with E and p both finite. Thus if a is as large as ge 3 /0.383, a permanent polarization 
sustained by its own field will set in. This critical equality must be reached for 
barium titanate at just the temperature T c . (Notice that if a were larger than the 
critical value for small fields, it would decrease at larger fields and at equilibrium 
the same equality we have found would hold.) 

For BaTi0 3 , the spacing a is 2 X 10 -8 cm, so we must expect that a = 
21.8 X 10“ 24 cm 3 . We can compare this with the known polarizabilities of the 
individual atoms. For oxygen, a = 30.2 X 10 -24 cm 3 ; we’re on the right track! 
But for titanium, a = 2.4 X 10“ 24 cm 3 ; rather small. To use our model we should 
probably take the average. (We could work out the chain again for alternating 
11-10 



atoms, but the result would be about the same.) So a(average) = 16.3 X 10“ 24 , 
which is not high enough to give a permanent polarization. 

But wait a moment! We have so far only added up the electronic polariz- 
abilities. There is also some ionic polarization due to the motion of the titanium 
ion. All we need is an ionic polarizability of 9.2 X 10“ 24 cm 3 . (A more precise 
computation using alternating atoms shows that actually 11.9 X 10" 24 is needed.) 
To understand the properties of BaTi0 3 , we have to assume that such an ionic 
polarizability exists. 

Why the titanium ion in barium titanate should have that much ionic polar- 
izability is not known. Furthermore, why, at a lower temperature, it polarizes along 
the cube diagonal and the face diagonal equally well is not clear. If we figure out 
the actual size of the spheres in Fig. 11-9, and ask whether the titanium is a little 
bit loose in the box formed by its neighboring oxygen atoms — which is what you 
would hope, so that it could be easily shifted — you find quite the contrary. It fits 
very tightly. The barium atoms are slightly loose, but if you let them be the ones 
that move, it doesn’t work out. So you see that the subject is really not one-hundred 
percent clear; there are still mysteries we would like to understand. 

Returning to our simple model of Fig. 1 l-10(a), we see that the field from one 
chain would tend to polarize the neighboring chain in the opposite direction, which 
means that although each chain would be locked, there would be no net permanent 
moment per unit volume! (Although there would be no external electric effects, 
there are still certain thermodynamic effects one could observe.) Such systems exist, 
and are called antiferroelectric. So what we have explained is really an anti- 
ferroelectric. Barium titanate, however, is really like the arrangement in Fig. 
1 1— 10(b). The oxygen-titanium chains are all polarized in the same direction 
because there are intermediate chains of atoms in between. Although the atoms 
in these chains are not very polarizable, or very dense, they will be somewhat 
polarized, in the direction antiparallel to the oxygen-titanium chains. The small 
fields produced at the next oxygen-titanium chain will get it started parallel to the 
first. So BaTi0 3 is really ferroelectric, and it is because of the atoms in between. 
You may be wondering: “But what about the direct effect between the two O-Ti 
chains?” Remember, though, the direct effect dies off exponentially with the 
separation; the effect of the chain of strong dipoles at 2 a can be less than the effect 
of a chain of weak ones at the distance a . 

This completes our rather detailed report on our present understanding of the 
dielectric constants of gases, of liquids, and of solids. 
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Electrostatic Analogs 


12-1 The same equations have the same solutions 

The total amount of information which has been acquired about the physical 
world since the beginning of scientific progress is enormous, and it seems almost 
impossible that any one person could know a reasonable fraction of it. But it is 
actually quite possible for a physicist to retain a broad knowledge of the physical 
world rather than to become a specialist in some narrow area. The reasons for 
this are threefold: First, there are great principles which apply to all the different 
kinds of phenomena — such as the principles of the conservation of energy and of 
angular momentum. A thorough understanding of such principles gives an under- 
standing of a great deal all at once. Second, there is the fact that many compli- 
cated phenomena, such as the behavior of solids under compression, really 
basically depend on electrical and quantum-mechanical forces, so that if one 
understands the fundamental laws of electricity and quantum mechanics, there is 
at least some possibility of understanding many of the phenomena that occur 
in complex situations. Finally, there is a most remarkable coincidence: The 
equations for many different physical situations have exactly the same appearance . 
Of course, the symbols may be different — one letter is substituted for another — 
but the mathematical form of the equations is the same. This means that having 
studied one subject, we immediately have a great deal of direct and precise 
knowledge about the solutions of the equations of another. 

We are now finished with the subject of electrostatics, and will soon go on to 
study magnetism and electrodynamics. But before doing so, we would like to 
show that while learning electrostatics we have simultaneously learned about a 
large number of other subjects. We will find that the equations of electrostatics 
appear in several other places in physics. By a direct translation of the solutions 
(of course the same mathematical equations must have the same solutions) it is 
possible to solve problems in other fields with the same ease — or with the same 
difficulty — as in electrostatics. 

The equations of electrostatics, we know, are 

V(kE) = ^, ( 12 . 1 ) 

V X E = 0. (12.2) 
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(We take the equations of electrostatics with dielectrics so as to have the most 
general situation.) The same physics can be expressed in another mathematical 
form: 


E = — V<t>, 


(12.3) 


V • (« V*) = - £*29 


(12.4) 


Now the point is that there are many physics problems whose mathematical 
equations have the same form. There is a potential (<£) whose gradient multiplied 
by a scalar function (k) has a divergence equal to another scalar function ( — p/e 0 ). 

Whatever we know about electrostatics can immediately be carried over into 
that other subject, and vice versa . (It works both ways, of course — if the other 
subject has some particular characteristics that are known, then we can apply 
that knowledge to the corresponding electrostatic problem.) We want to consider 
a series of examples from different subjects that produce equations of this form. 
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The same equations have the 
same solutions 

The flow of heat; a point 
source near an infinite plane 
boundary 

The stretched membrane 

The diffusion of neutrons; a 
uniform spherical source in a 
homogeneous medium 

Irrotational fluid flow; the 
flow past a sphere 

Illumination; the uniform 
lighting of a plane 

The “underlying unity” of 
nature 



12-2 The flow of heat; a point source near an infinite plane boundary 

We have discussed one example earlier (Section 3-4) — the flow of heat. 
Imagine a block of material, which need not be homogeneous but may consist of 
different materials at different places, in which the temperature varies from point 
to point. As a consequence of these temperature variations there is a flow of heat, 
which can be represented by the vector h . It represents the amount of heat energy 
which flows per unit time through a unit area perpendicular to the flow. The di- 
vergence of h represents the rate per unit volume at which heat is leaving a region: 

V • h = rate of heat out per unit volume. 

(We could, of course, write the equation in integral form — just as we did in electro- 
statics with Gauss’ law — which would say that the flux through a surface is equal 
to the rate of change of heat energy inside the material. We will not bother to 
translate the equations back and forth between the differential and the integral 
forms, because it goes exactly the same as in electrostatics.) 

The rate at which heat is generated or absorbed at various places depends, of 
course, on the problem. Suppose, for example, that there is a source of heat inside 
the material (perhaps a radioactive source, or a resistor heated by an electrical 
current). Let us call s the heat energy produced per unit volume per second by 
this source. There may also be losses (or gains) of thermal energy to other internal 
energies in the volume. If u is the internal energy per unit volume, — du/dt will 
also be a “source” of heat energy. We have, then, 

Vh = s-^ ( - (12.5) 

We are not going to discuss just now the complete equation in which things 
change with time, because we are making an analogy to electrostatics, where no- 
thing depends on the time. We will consider only steady heat-flow problems, in 
which constant sources have produced an equilibrium state. In these cases, 

V h = s. (12.6) 

It is, of course, necessary to have another equation, which describes how the 
heat flows at various places. In many materials the heat current is approximately 
proportional to the rate of change of the temperature with position: the larger the 
temperature difference, the more the heat current. As we have seen, the vector 
heat current is proportional to the temperature gradient. The constant of pro- 
portionality K , a property of the material, is called the thermal conductivity . 

h = - KVT . (12.7) 

If the properties of the material vary from place to place, then K = K(x , y , z), a 
function of position. [Equation (12.7) is not as fundamental as (12.5), which 
expresses the conservation of heat energy, since the former depends upon a special 
property of the substance.] If now we substitute Eq. (12.7) into Eq. (12.6) we have 

V-(KVT) = - s , (12.8) 

which has exactly the same form as (12.4). Steady heat-flow problems and electro- 
static problems are the same . The heat flow vector h corresponds to E, and the 
temperature T corresponds to <f > . We have already noticed that a point heat source 
produces a temperature field which varies as 1 /r and a heat flow which varies as 
1/r 2 . This is nothing more than a translation of the statements from electrostatics 
that a point charge generates a potential which varies as 1/r and an electric field 
which varies as 1/r 2 . We can, in general, solve static heat problems as easily as 
we can solve electrostatic problems. 

Consider a simple example. Suppose that we have a cylinder of radius a at the 
temperature T l9 maintained by the generation of heat in the cylinder. (It could be, 
for example, a wire carrying a current, or a pipe with steam condensing inside.) 
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The cylinder is covered with a concentric sheath of insulating material which has a 
conductivity K. Say the outside radius of the insulation is b and the outside is 
kept at temperature T 2 (Fig. 12- la). We want to find out at what rate heat will 
be lost by the wire, or steampipe, or whatever it is in the center. Let the total 
amount of heat lost from a length L of the pipe be called G — which is what we are 
trying to find. 

How can we solve this problem? We have the differential equations, but since 
these are the same as those of electrostatics, we have really already solved the 
mathematical problem. The analogous problem is that of a conductor of radius a 
at the potential <£i, separated from another conductor of radius b at the potential 
</> 2 , with a concentric layer of dielectric material in between, as drawn in Fig. 
12-1 (b). Now since the heat flow h corresponds to the electric field E, the quantity 
G that we want to find corresponds to the flux of the electric fipld from a unit 
length (in other words, to the electric charge per unit length over e 0 ). We have 
solved the electrostatic problem by using Gauss’ law. We follow the same pro- 
cedure for our heat-flow problem. 

From the symmetry of the situation, we know that h depends only on the 
distance from the center. So we enclose the pipe in a gaussian cylinder of length 
L and radius r. From Gauss’ law, we know that the heat flow h multiplied by 
the area lirrL of the surface must be equal to the total amount of heat generated 
inside, which is what we are calling G: 


lirrLh = G or h = ~ — y * 

2irrL 

The heat flow is proportional to the temperature gradient: 

h = -KVT, 

or, in this case, the magnitude of h is 

v dT 
h = —a -r- * 
dr 

This, together with (12.9), gives 

(12.9) 

dT G 

dr IrKLr 

(12.10) 

Integrating from r = a to r = b, we get 


r “ r ‘- 

(12.11) 

Solving for G, we find 


2irKL(Ti - 7a) 

In (b/a) 

(12.12) 


This result corresponds exactly to the result for the charge on a cylindrical conden- 
ser: 

27T€oL(^i — <£2) 


Q = 


In {b/a ) 


The problems are the same, and they have the same solutions. From our knowledge 
of electrostatics, we also know how much heat is lost by an insulated pipe. 

Let’s consider another example of heat flow. Suppose we wish to know the 
heat flow in the neighborhood of a point source of heat located a little way beneath 
the surface of the earth, or near the surface of a large metal block. The localized 
heat source might be an atomic bomb that was set off underground, leaving an 
intense source of heat, or it might correspond to a small radioactive source inside 
a block of iron — there are numerous possibilities. 

We will treat the idealized problem of a point heat source of strength G at the 
distance a beneath the surface of an infinite block of uniform material whose 
thermal conductivity is K . And we will neglect the thermal conductivity of the 
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Fig. 12-1. (a) Heat flow in a cylin- 

drical geometry, (b) The corresponding 
electrical problem. 
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Fig. 12-2. The heat flow and iso- 
thermals near a point heat source at the 
distance a below the surface of a good 
thermal conductor. An image source is 
shown outside the material. 


air outside the material. We want to determine the distribution of the temperature 
on the surface of the block. How hot is it right above the source and at various 
places on the surface of the block? 

How shall we solve it? It is like an electrostatic problem with two materials 
with different dielectric coefficients k on opposite sides of a plane boundary. Aha! 
Perhaps it is the analog of a point charge near the boundary between a dielectric 
and a conductor, or something similar. Let’s see what the situation is near the 
surface. The physical condition is that the normal component of A on the surface 
is zero , since we have assumed there is no heat flow out of the block. We should 
ask: In what electrostatic problem do we have the condition that the normal 
component of the electric field E (which is the analog of K) is zero at a surface? 
There is none! 

That is one of the things that we have to watch out for. For physical reasons, 
there may be certain restrictions in the kinds of mathematical conditions which 
arise in any one subject. So if we have analyzed the differential equation only for 
certain limited cases, we may have missed some kinds of solutions that can occur 
in other physical situations. For example, there is no material with a dielectric 
constant of zero, whereas a vacuum does have zero thermal conductivity. So there 
is no electrostatic analogy for a perfect heat insulator. We can, however, still use 
the same methods . We can try to imagine what would happen if the dielectric 
constant were zero. (Of course, the dielectric constant is never zero in any real 
situation. But we might have a case in which there is a material with a very high 
dielectric constant, so that we could neglect the dielectric constant of the air out- 
side.) 

How shall we find an electric field that has no component perpendicular to 
the surface? That is, one which is always tangent at the surface? You will notice 
that our problem is opposite to the one of a point charge near a plane conductor. 
There we wanted the field to be perpendicular to the surface, because the conductor 
was all at the same potential. In the electrical problem, we invented a solution 
by imagining a point charge behind the conducting plate. We can use the same 
idea again. We try to pick an “image source” that will automatically make the 
normal component of the field zero at the surface. The solution is shown in 
Fig. 12-2. An image source of the same sign and the same strength placed at the 
distance a above the surface will cause the field to be always horizontal at the sur- 
face. The normal components of the two sources cancel out. 

Thus our heat flow problem is solved. The temperature everywhere is the 
same, by direct analogy, as the potential due to two equal point charges! The 
temperature T at the distance r from a single point source G in an infinite medium is 


T = 


G 

AirKr 


(12.13) 


(This, of course, is just the analog of </> = q/4ire 0 r.) The temperature for a point 
source, together with its image source, is 


T = 


G 

4xA>i 


+ 


G 

4i tKt 2 


(12.14) 


This formula gives us the temperature everywhere in the block. Several isothermal 
surfaces are shown in Fig. 12-2. Also shown are lines of A, which can be obtained 
from A = —K VT. 

We originally asked for the temperature distribution on the surface. For a 
point on the surface at the distance p from the axis, r± = r 2 = \/p 2 + so 


7Xsurface) = 2G - (12.15) 

4t tK Vp2 + 

This function is also shown in the figure. The temperature is, naturally, higher 
right above the source than it is farther away. This is the kind of problem that 
geophysicists often need to solve. We now see that it is the same kind of thing we 
have already been solving for electricity. 
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12-3 The stretched membrane 


Now let us consider a completely different physical situation which, never- 
theless, gives the same equations again. Consider a thin rubber sheet — a membrane 
— which has been stretched over a large horizontal frame (like a drumhead). 
Suppose now that the membrane is pushed up in one place and down in another; 
as shown in Fig. 12-3. Can we describe the shape of the surface? We will show 
how the problem can be solved when the deflections of the membrane are not too 
large. 

There are forces in the sheet because it is stretched. If we were to make a 
small cut anywhere, the two sides of the cut would pull apart (see Fig. 12-4). So 
there is a surface tension in the sheet, analogous to the one-dimensional tension 
in a stretched string. We define the magnitude of the surface tension r as the force 
per unit length which will just hold together the two sides of a cut such as one of 
those shown in Fig. 12-4. 

Suppose now that we look at a vertical cross section of the membrane. It 
will appear as a curve, like the one in Fig. 12-5. Let u be the vertical displacement 
of the membrane from its normal position, and x and y the coordinates in the 
horizontal plane. (The cross section shown is parallel to the x-axis.) 

Consider a little piece of the surface of length Ax and width Ay. There will be 
forces on the piece from the surface tension along each edge. The force along 
edge 1 of the figure will be r i Ay, directed tangent to the surface — that is, at the 
angle 0 X from the horizontal. Along edge 2, the force will be t 2 Ay at the angle 0 2 . 
(There will be similar forces on the other two edges of the piece, but we will forget 
them for the moment.) The net upward force on the piece from edges 1 and 2 is 

A F = r 2 Ay sin 0 2 — r x Ay sin 0 X . 

We will limit our considerations to small distortions of the membrane, i.e., to 
small slopes: we can then replace sin 0 by tan 0, which can be written as du/dx. The 
force is then 




The quantity in brackets can be equally well written (for small Ax) as 


d ( du\ 

sl'W 1 

-(t-) 

dX \ dx/ 


Ax Ay. 


There will be another contribution to A F from the forces on the other two 
edges; the total is evidently 

4f - [e ( t fj) + | ( T I)] 4 * 4 '- <12l6) 

The distortions of the diaphragm are caused by external forces. Let’s let 
/ represent the upward force per unit area on the sheet (a kind of “pressure”) 
from the external forces. When the membrane is in equilibrium (the static case), 
this force must be balanced by the internal force we have just computed, Eq 
(12.16). That is 

J Av An 


Ax Ay 


Equation (12.16) can then be written 


/ = - V * (t Vw), 


(12.17) 


where by V we now mean, of course, the two-dimensional gradient operator 
( d/dx , d/dy). We have the differential equation that relates w(x, y) to the applied 



Fig. 1 2-3. A thin rubber sheet 
stretched over a cylindrical frame (like 
a drumhead). If the sheet is pushed up 
at A and down at 6, what is the shape 
of the surface? 



Fig. 12-4. The surface tension r of 
a stretched rubber sheet is the force per 
unit length across a line. 



x 


Fig. 12-5. Cross section of the de- 
flected sheet. 


forces f(x 9 y) and the surface tension r(x, y), which may, in general, vary from 
place to place in the sheet. (The distortions of a three-dimensional elastic body are 
also governed by similar equations, but we will stick to two-dimensions.) We 
will worry only about the case in which the tension r is constant throughout the 
sheet. We can then write for Eq. (12.17), 

V 2 m = - £. (12.18) 

We have another equation that is the same as for electrostatics! — only this 
time, limited to two-dimensions. The displacement u corresponds to </>, and f/r 
corresponds to p/c 0 . So all the work we have done for infinite plane charged sheets, 
or long parallel wires, or charged cylinders is directly applicable to the stretched 
membrane. 

Suppose we push the membrane at some points up to a definite height — that is, 
we fix the value of u at some places. That is the analog of having a definite potential 
at the corresponding places in an electrical situation. So, for instance, we may 
make a positive “potential” by pushing up on the membrane with an object having 
the cross-sectional shape of the corresponding cylindrical conductor. For example, 
if we push the sheet up with a round rod, the surface will take on the shape shown 
in Fig. 12-6. The height u is the same as the electrostatic potential 6 of a charged 
cylindrical rod. It falls off as ln(l/r). (The slope , which corresponds to the 
electric field £, drops off as 1/r.) 


Fig. 12-6. Cross section of a 
stretched rubber sheet pushed up by a 
round rod. The function u(x, y) is the same 
as the electric potential <£(x, y) near a 
very long charged rod. 



The stretched rubber sheet has often been used as a way of solving complicated 
electrical problems experimentally. The analogy is used backwards! Various 
rods and bars are pushed against the sheet to heights that correspond to the po- 
tentials of a set of electrodes. Measurements of the height then give the electrical 
potential for the electrical situation. The analogy has been carried even further. 
If little balls are placed on the membrane, their motion corresponds approximately 
to the motion of electrons in the corresponding electric field. One can actually 
watch the “electrons” move on their trajectories. This method was used to design 
the complicated geometry of many photomultiplier tubes (such as the ones used 
for scintillation counters, and the one used for controlling the headlight beams on 
Cadillacs). The method is still used, but the accuracy is limited. For the most 
accurate work, it is better to determine the fields by numerical methods, using the 
large electronic computing machines. 


12-4 The diffusion of neutrons; a uniform spherical source in a homogeneous 
medium 

We take another example that gives the same kind of equation, this time 
having to do with diffusion. In Chapter 43 of Vol. I we considered the diffusion 
of ions in a single gas, and of one gas through another. This time, let’s take a 
different example — the diffusion of neutrons in a material like graphite. We choose 
to speak of graphite (a pure form of carbon) because carbon doesn’t absorb slow 
neutrons. In it the neutrons are free to wander around. They travel in a straight 
line for several centimeters, on the average, before being scattered by a nucleus 
and deflected into a new direction. So if we have a large block — many meters on 
a side — the neutrons initially at one place will diffuse to other places. We want to 
find a description of their average behavior — that is, their average flow. 
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Let N(x, y, z) AF be the number of neutrons in the element of volume AF 
at the point (x, y , z ). Because of their motion, some neutrons will be leaving AF, 
and others will be coming in. If there are more neutrons in one region than in a 
nearby region, more neutrons will go from the first region to the second than come 
back; there will be a net flow. Following the arguments of Chapter 43 in VoL I, 
we describe the flow by a flow vector /. Its ^-component J x is the net number of 
neutrons that pass in unit time a unit area perpendicular to the x-direction. We 
found that 

dN 

J * = (12. 19) 


where the diffusion constant D is given in terms of the mean velocity v, and the 
mean-free-path / between scatterings is given by 

D = \ lv - 

The vector equation for J is 

/= - DvN . (12.20) 


The rate at which neutrons flow across any surface element da is / • nda 
(where, as usual, n is the unit normal). The net flow out of a volume element is then 
(following the usual gaussian argument) V • J dV, This flow would result in 
a decrease with time of the number in AF unless neutrons are being created in 
AF (by some nuclear process). If there are sources in the volume that generate S 
neutrons per unit time in a unit volume, then the net flow out of AF will be equal 
to ( S — dN/dt) AF. We have then that 


V / = S 


dN 

dt 


( 12 . 21 ) 


Combining (12.21) with (12.20), we get the neutron diffusion equation 


V * (-2) VN) 



( 12 . 22 ) 


In the static case — where dN/dt — 0 — we have Eq. (12.4) all over again! 
We can use our knowledge of electrostatics to solve problems about the diffusion 
of neutrons. So let’s solve a problem. (You may wonder: Why do a problem if 
we have already done all the problems in electrostatics? We can do it faster this 
time because we have done the electrostatic problems!) 

Suppose we have a block of material in which neutrons are being generated — 
say by uranium fission — uniformly throughout a spherical region of radius a 
(Fig. 12-7). We would like to know: What is the density of neutrons everywhere? 
How uniform is the density of neutrons in the region where they are being gen- 
erated? What is the ratio of the neutron density at the center to the neutron density 
at the surface of the source region? Finding the answers is easy. The source 
density 5 0 replaces the charge density p, so our problem is the same as the problem 
of a sphere of uniform charge density. Finding N is just like finding the potential 
4>. We have already worked out the fields inside and outside of a uniformly charged 
sphere; we can integrate them to get the potential. Outside, the potential is 
2/47r€ 0 r, with the total charge Q given by 4rra 3 p/3. So 


Outside - • (12.23) 

For points inside, the field is due only to the charge Q(r ) inside the sphere of radius 
r, Q(r) = 47ir 3 p/3, so 

E = ^ • (12.24) 



Fig. 1 2-7. (a) Neutrons are produced 

uniformly throughout a sphere of radius a 
in a large graphite block and diffuse 
outward. The neutron density N is found 
as a function of r, the distance from the 
center of the source, (b) The analogous 
electrostatic situation: a uniform sphere of 
charge, where N corresponds to <f> and J 
corresponds to £. 
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The field increases linearly with r. Integrating E to get <j>, we have 


tfmside = — + a constant. 

OCo 


At the radius a, <t> ins ide must be the same as <£ 0 uteide> so the constant must be 
pa 2 / 2€ 0 . (We are assuming that <t> is zero at large distances from the source, which 
will correspond to N being zero for the neutrons.) Therefore, 


3e 0 V 2 2 ) 


(12,25) 


We know immediately the neutron density in our other problem. The answer 
is 

^outside = ^;> (12.26) 

and 

02.27) 


N is shown as a function of r in Fig. 12-7. 

Now what is the ratio of density at the center to that at the edge? At the center 
(r = 0), it is proportional to 3a 2 /2. At the edge ( r = a) it is proportional to 
2a 2 /2, so the ratio of densities is 3/2. A uniform source doesn’t produce a uniform 
density of neutrons. You see, our knowledge of electrostatics gives us a good start 
on the physics of nuclear reactors. 

There are many physical circumstances in which diffusion plays a big part. 
The motion of ions through a liquid, or of electrons through a semiconductor, 
obeys the same equation. We find again and again the same equations. 


12-5 Irrotational fluid flow; the flow past a sphere 

Let’s now consider an example which is not really a very good one, because 
the equations we will use will not really represent the subject with complete 
generality but only in an artificial idealized situation. We take up the problem 
of water flow . In the case of the stretched sheet, our equations were an approxima- 
tion which was correct only for small deflections . For our consideration of water 
flow, we will not make that kind of an approximation; we must make restrictions 
that do not apply at all to real water. We treat only the case of the steady flow of 
an incompressible , nonviscous, circulation-free liquid. Then we represent the flow 
by giving the velocity v(r) as a function of position r. If the motion is steady 
(the only case for which there is an electrostatic analog) v is independent of time. 
If p is the density of the fluid, then pv is the amount of mass which passes per unit 
time through a unit area. By the conservation of matter, the divergence of pv will 
be, in general, the time rate of change of the mass of the material per unit volume. 
We will assume that there are no processes for the continuous creation or destruc- 
tion of matter. The conservation of matter then requires that V - pv — 0. (It 
should, in general, be equal to — dp/dt , but since our fluid is incompressible, p 
cannot change.) Since p is everywhere the same, we can factor it out, and our equa- 
tion is simply 

V • v = 0. 

Good! We have electrostatics again (with no charges); it’s just like V * E = 0. 
Not so! Electrostatics is not simply V * E = 0. It is a pair of equations. One 
equation does not tell us enough; we need still an additional equation. To match 
electrostatics, we should have also that the curl of v is zero. But that is not generally 
true for real liquids. Most liquids will ordinarily develop some circulation. So 
we are restricted to the situation in which there is no circulation of the fluid. Such 
flow is often called irrotational Anyway, if we make all our assumptions, we can 
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imagine a case of fluid flow that is analogous to electrostatics. So we take 


V • v = 0 (12.28) 

and 

V X v = 0. (12.29) 

We want to emphasize that the number of circumstances in which liquid 
flow follows these equations is far from the great majority, but there are a few. 
They must be cases in which we can neglect surface tension, compressibility, and 
viscosity, and in which we can assume that the flow is irrotational. These assump- 
tions are valid so rarely for real water that the mathematician John von Neumann 
said that people who analyze Eqs. (12.28) and (12.29) are studying “dry water”! 
(We take up the problem of fluid flow in more detail in Chapters 40 and 41.) 

Because V X v = 0, the velocity of “dry water” can be written as the 
gradient of some potential: 

v = -V*. (12.30) 


What is the physical meaning of There isn’t any very useful meaning. The 
velocity can be written as the gradient of a potential simply because the flow is 
irrotational. And by analogy with electrostatics, $ is called the velocity potential , 
but it is not related to a potential energy in the way that <j> is. Since the divergence 
of v is zero, we have 

V(V^) = VV = 0* (12.31) 


The velocity potential \p obeys the same differential equation as the electrostatic 
potential in free space (p = 0). 

Let’s pick a problem in irrotational flow and see whether we can solve it by 
the methods we have learned. Consider the problem of a spherical ball falling 
through a liquid. If it is going too slowly, the viscous forces, which we are dis- 
regarding, will be important. If it is going too fast, little whirlpools (turbulence) 
will appear in its wake and there will be some circulation of the water. But if the 
ball is going neither too fast nor too slow, it is more or less true that the water flow 
will fit our assumptions, and we can describe the motion of the water by our 
simple equations. 

It is convenient to describe what happens in a frame of reference fixed in the 
sphere . In this frame we are asking the question : How does water flow past a sphere 
at rest when the flow at large distances is uniform? That is, when, far from the 
sphere, the flow is everywhere the same. The flow near the sphere will be as shown 
by the streamlines drawn in Fig. 12-8. These lines, always parallel to v 9 correspond 
to lines of electric field. We want to get a quantative description for the velocity 
field, i.e., an expression for the velocity at any point P. 

We can find the velocity from the gradient of \f/ 9 so we first work out the po- 
tential. We want a potential that satisfies Eq. (12.31) everywhere, and which 
also satisfies two restrictions: (1) there is no flow in the spherical region inside 
the surface of the ball, and (2) the flow is constant at large distances. To satisfy 
(1), the component of v normal to the surface of the sphere must be zero. That 
means that d^/dr is zero at r = a. To satisfy (2), we must have d^/dz = Vq at 
all points where ry> a. Strictly speaking, there is no electrostatic case which 
corresponds exactly to our problem. It really corresponds to putting a sphere of 
dielectric constant zero in a uniform electric field. If we had worked out the 
solution to the problem of a sphere of a dielectric constant k in a uniform field, 
then by putting k = 0 we would immediately have the solution to this problem. 

We have not actually worked out this particular electrostatic problem in de- 
tail, but let’s do it now. (We could work directly on the fluid problem with v and 
but we will use E and <j> because we are so used to them.) 

The problem is: Find a solution of v 2 <t> = 0 such that E = — y<p is a con- 
stant, say Eq 9 for large r, and such that the radial component of E is equal to zero 
at r = a. That is, 


d(j> 

~dr 


= 0 . 


(12.32) 



Fig. 12-8. The velocity field of ir- 
rotational fluid flow past a sphere. 


r=a 
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Our problem involves a new kind of boundary condition, not one for which 
^ is a constant on a surface, but for which d<t>/dr is a constant. That is a 
little different. It is not easy to get the answer immediately. First of all, without 
the sphere, 4> would be —E 0 z. Then E would be in the z-direction and have the con- 
stant magnitude E 0 , everywhere. Now we have analyzed the case of a dielectric 
sphere which has a uniform polarization inside it, and we found that the field 
inside such a polarized sphere is a uniform field, and that outside it is the same as 
the field of a point dipole located at the center. So let’s guess that the solution we 
want is a superposition of a uniform field plus the field of a dipole. The potential 
of a dipole (Chapter 6) is pz/4Tre 0 r z . Thus we assume that 

+ (1133) 

Since the dipole field falls off as 1/r 3 , at large distances we have just the field E 0 , 
Our guess will automatically satisfy condition (2) above. But what do we take for 
the dipole strength pi To find out, we may use the other condition on <£, Eq. (12.32). 
We must differentiate <j> with respect to /*, but of course we must do so at a constant 
angle 0, so it is more convenient if we first express 4> in terms of r and 0, rather than 
of z and r. Since z = r cos 0, we get 

* = -£orcos* + • (12.34) 

The radial component of E is 

~ a? =+Eo cos 9 + W3 • (12>35) 

This must be zero at r = a for all 0. This will be true if 

p ~ — 2Tr6 0 a s E 0 . (12.36) 

Note carefully that if both terms in Eq. (12.35) had not had the same 0-depen- 

dence, it would not have been possible to choose p so that (12.35) turned out to be 
zero at r — a for all angles. The fact that it works out means that we have guessed 
wisely in writing Eq. (12.33). Of course, when we made the guess we were looking 
ahead; we knew that we would need another term that (a) satisfied V 2 <£ = 0 (any 
real field would do that), (b) dependent on cos 0, and (c) fell to zero at large r. 
The dipole field is the only one that does all three. 

Using (12.36), our potential is 

= —E 0 cos 6 ■ (12.37) 

The solution of the fluid flow problem can be written simply as 

t = -v 0 cos e (r + ^ 2 ) • (12.38) 

It is straightforward to find u from this potential. We will not pursue the matter 
further. 


12-6 Illumination; the uniform lighting of a plane 

In this section we turn to a completely different physical problem — we want 
to illustrate the great variety of possibilities. This time we will do something that 
leads to the same kind of integral that we found in electrostatics. (If we have a 
mathematical problem which gives us a certain integral, then we know something 
about the properties of that integral if it is the same integral that we had to do for 
another problem.) We take our example from illumination engineering. Suppose 
there is a light source at the distance a above a plane surface. What is the illumina- 
tion of the surface? That is, what is the radiant energy per unit time arriving at a 
unit area of the surface? (See Fig. 12-9.) We suppose that the source is spherically 
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Fig. 12-9. The illumination /„ of a 
surface is the radiant energy per unit 
time arriving at a unit area of the surface. 

symmetric, so that light is radiated equally in all directions. Then the amount of 
radiant energy which passes through a unit area at right angles to a light flow varies 
inversely as the square of the distance. It is evident that the intensity of the light in 
the direction normal to the flow is given by the same kind of formula as for the 
electric field from a point source. If the light rays meet the surface at an angle 6 to 
the normal, then /, the energy arriving per unit area of the surface, is only cos 0 as 
great, because the same energy goes onto an area larger by 1/cos 0 . If we call the 
strength of our light source S, then 7 n , the illumination of a surface, is 

/„ = %e r n, (12.39) 

r z 

where e r is the unit vector from the source and n is the unit normal to the surface. 

The illumination I n corresponds to the normal component of the electric field from 
a point charge of strength 47re 0 *S. Knowing that, we see that for any distribution of 
light sources, we can find the answer by solving the corresponding electrostatic 
problem. We calculate the vertical component of electric field on the plane due to 
a distribution of charge in the same way as for that of the light sources.* 

Consider the following example. We wish for some special experimental 
situation to arrange that the top surface of a table will have a very uniform illumina- 
tion. We have available long tubular fluorescent lights which radiate uniformly 
along their lengths. We can illuminate the table by placing the fluorescent tubes 
in a regular array on the ceiling, which is at the height z above the table. What is 
the widest spacing b from tube to tube that we should use if we want the surface 
illumination to be uniform to, say, within one part in a thousand? Answer ; (1) 

Find the electric field from a grid of wires with the spacing b , each charged uni- 
formly; (2) compute the vertical component of the electric field; (3) find out what 
b must be so that the ripples of the field are not more than one part in a thousand. 

In Chapter 7 we saw that the electric field of a grid of charged wires could be 
represented as a sum of terms, each one of which gave a sinusoidal variation of 
the field with a period of b/n, where n is an integer. The amplitude of any one of 
these terms is given by Eq. (7.44): 



F n — A n e 


— 2rnzjb 


We need consider only n = 1, so long as we only want the field at points not too 
close to the grid. For a complete solution, we would still need to determine the 
coefficients A n , which we have not yet done (although it is a straightforward 
calculation). Since we need only A i, we can estimate that its magnitude is roughly 
the same as that of the average field. The exponential factor would then give us 
directly the relative amplitude of the variations. If we want this factor to be 10 3 , 
we find that b must be 0.91z. If we make the spacing of the fluorescent tubes 3/4 


* Since we are talking about incoherent sources whose intensities always add linearly, 
the analogous electric charges will always have the same sign. Also, our analogy applies 
only to the light energy arriving at the top of an opaque surface, so we must include in 
our integral only the sources which shine on the surface (and, naturally, not sources 
located below the surface!). 
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of the distance to the ceiling, the exponential factor is then 1/4000, and we have a 
safety factor of 4, so we are fairly sure that we will have the illumination constant 
to one part in a thousand. (An exact calculation shows that A x is really twice the 
average field, so the exact answer is b = 0.8z.) It is somewhat surprising that for 
such a uniform illumination the allowed separation of the tubes comes out so large. 


12-7 The “underlying unity” of nature 

In this chapter, we wished to show that in learning electrostatics you have 
learned at the same time how to handle many subjects in physics, and that by 
keeping this in mind, it is possible to learn almost all of physics in a limited number 
of years. 

However, a question surely suggests itself at the end of such a discussion: 
Why are the equations from different phenomena so similar ? We might say: “It is 
the underlying unity of nature. 5 ’ But what does that mean? What could such a 
statement mean? It could mean simply that the equations are similar for different 
phenomena; but then, of course, we have given no explanation. The “underlying 
unity” might mean that everything is made out of the same stuff, and therefore 
obeys the same equations. That sounds like a good explanation, but let us 
think. The electrostatic potential, the diffusion of neutrons, heat flow — are we 
really dealing with the same stuff? Can we really imagine that the electrostatic po- 
tential is physically identical to the temperature, or to the density of particles? 
Certainly 4> is not exactly the same as the thermal energy of particles. The displace- 
ment of a membrane is certainly not like a temperature. Why, then, is there “an 
underlying unity”? 

A closer look at the physics of the various subjects shows, in fact, that the 
equations are not really identical. The equation we found for neutron diffusion is 
only an approximation that is good when the distance over which we are looking 
is large compared with the mean free path. If we look more closely, we would see 
the individual neutrons running around. Certainly the motion of an individual 
neutron is a completely different thing from the smooth variation we get from 
solving the differential equation. The differential equation is an approximation, 
because we assume that the neutrons are smoothly distributed in space. 

Is it possible that this is the clue? That the thing which is common to all the 
phenomena is the space , the framework into which the physics is put? As long as 
things are reasonably smooth in space, then the important things that will be 
involved will be the rates of change of quantities with position in space. That is 
why we always get an equation with a gradient. The derivatives must appear in 
the form of a gradient or a divergence; because the laws of physics are independent 
of direction , they must be expressible in vector form. The equations of electro- 
statics are the simplest vector equations that one can get which involve only the 
spatial derivatives of quantities. Any other simple problem — or simplification of a 
complicated problem — must look like electrostatics. What is common to all our 
problems is that they involve space and that we have imitated what is actually a 
complicated phenomenon by a simple differential equation. 

That leads us to another interesting question. Is the same statement perhaps 
also true for the electrostatic equations? Are they also correct only as a smoothed- 
out imitation of a really much more complicated microscopic world? Could it be 
that the real world consists of little X-ons which can be seen only at very tiny dis- 
tances? And that in our measurements we are always observing on such a large 
scale that we can’t see these little X-ons, and that is why we get the differential 
equations? 

Our currently most complete theory of electrodynamics does indeed have its 
difficulties at very short distances. So it is possible, in principle, that these equations 
are smoothed-out versions of something. They appear to be correct at distances 
down to about 10“ 14 cm, but then they begin to look wrong. It is possible that 
there is some as yet undiscovered underlying “machinery,” and that the details of 
an underlying complexity are hidden in the smooth-looking equations — as is so 
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in the “smooth” diffusion of neutrons. But no one has yet formulated a successful 
theory that works that way. 

Strangely enough, it turns out (for reasons that we do not at all understand) 
that the combination of relativity and quantum mechanics as we know them seems 
to forbid the invention of an equation that is fundamentally different from Eq. 
(12.4), and which does not at the same time lead to some kind of contradiction. 
Not simply a disagreement with experiment, but an internal contradiction . As, for 
example, the prediction that the sum of the probabilities of all possible occurrences 
is not equal to unity, or that energies may sometimes come out as complex numbers, 
or some other such idiocy. No one has yet made up a theory of electricity for which 
V 2 <j> = — p/e 0 is understood as a smoothed-out approximation to a mechanism 
underneath, and which does not lead ultimately to some kind of an absurdity. 
But, it must be added, it is also true that the assumption that V 2 tf> = — p/e 0 is 
valid for all distances, no matter how small, leads to absurdities of its own (the 
electrical energy of an electron is infinite) — absurdities from which no one yet 
knows an escape. 
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Magnetostatics 


13-1 The magnetic field 

The force on an electric charge depends not only on where it is, but also on 
how fast it is moving. Every point in space is characterized by two vector quantities 
which determine the force on any charge. First, there is the electric force , which 
gives a force component independent of the motion of the charge. We describe it 
by the electric field, E. Second, there is an additional force component, called the 
magnetic force , which depends on the velocity of the charge. This magnetic force 
has a strange directional character: At any particular point in space, both the 
direction of the force and its magnitude depend on the direction of motion of the 
particle: at every instant the force is always at right angles to the velocity vector; 
also, at any particular point, the force is always at right angles to a fixed direction 
in space (see Fig. 13-1); and finally, the magnitude of the force is proportional to 
the component of the velocity at right angles to this unique direction. It is possible 
to describe all of this behavior by defining the magnetic field vector B, which speci- 
fies both the unique direction in space and the constant of proportionality with the 
velocity, and to write the magnetic force as qv X B . The total electromagnetic 
force on a charge can, then, be written as 

F = q(E + v X B). (13.1) 

This is called the Lorentz force . 

The magnetic force is easily demonstrated by bringing a bar magnet close to a 
cathode-ray tube. The deflection of the electron beam shows that the presence of 
the magnet results in forces on the electrons transverse to their direction of motion, 
as we described in Chapter 12 of Vol. I. 

The unit of magnetic field B is evidently one newton-second per 
coulomb-meter. The same unit is also one volt-second per meter 2 . It is also 
called one weber per square meter. 


13-2 Electric current; the conservation of charge 

We consider first how we can understand the magnetic forces on wires carrying 
electric currents. In order to do this, we define what is meant by the current density. 
Electric currents are electrons or other charges in motion with a net drift or flow. 
We can represent the charge flow by a vector which gives the amount of charge 
passing per unit area and per unit time through a surface element at right angles to 
the flow (just as we did for the case of heat flow). We call this the current density 
and represent it by the vector j. It is directed along the motion of the charges. 
If we take a small area AS at a given place in the material, the amount of charge 
flowing across that area in a unit time is 

j nAS, (13.2) 

where n is the unit vector normal to AS. 

The current density is related to the average flow velocity of the charges. 
Suppose that we have a distribution of charges whose average motion is a drift 
with the velocity v. As this distribution passes over a surface element AS, the charge 
A q passing through the surface element in a time At is equal to the charge contained 
in a parallelepiped whose base is AS and whose height is v At, as shown in Fig. 13-2. 
The volume of the parallelepiped is the projection of AS at right angles to v times 
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Fig. 13-1. The velocity-dependent 
component of the force on a moving 
charge is at right angles to v and to the 
direction of B. It is also proportional to 
the component of v at right angles to B, 
that is, to v sin 6 . 



v At , which when multiplied by the charge density p will give A q. Thus 



Fig. 1 3-2. If a charge distribution of 
density p moves with the velocity v, the 
charge per unit time through AS is 
pv • n AS. 



Fig. 13-3. The current / through the 
surface S is fj • n dS. 


Aq — pv • n AS At. 

The charge per unit time is then pv * n AS, from which we get 

j - pv. (13.3) 

If the charge distribution consists of individual charges, say electrons, each 
with the charge q and moving with the mean velocity v, then the current density is 

j = Nqv, (13.4) 

where N is the number of charges per unit volume. 

The total charge passing per unit time through any surface S is called the 
electric current , /. It is equal to the integral of the normal component of the flow 
through all of the elements of the surface: 



(see Fig. 13-3). 

The current / out of a closed surface S represents the rate at which charge 
leaves the volume V enclosed by 5. One of the basic laws of physics is that 
electric charge is indestructible; it is never lost or created. Electric charges can 
move from place to place but never appear from nowhere. We say that charge is 
conserved . If there is a net current out of a closed surface, the amount of charge 
inside must decrease by the corresponding amount (Fig. 13-4). We can, therefore, 
write the law of the conservation of charge as 

j j ■ n dS = -~(Q inside ). (13.6) 

any closed 
surface 



The charge inside can be written as a volume integral of the charge density: 

<2in 8 ide= / pdV. (13.7) 

V 

inside S 

If we apply (13.6) to a small volume AV, we know that the left-hand integral 
is V - j AV. The charge inside is p AV, so the conservation of charge can also be 
written as 

Vj=-f t (13.8) 

(Gauss’ mathematics once again!). 


Fig. 1 3-4. The integral of j • n over 
a closed surface is the rate of change of 
the total charge Q inside. 


13-3 The magnetic force on a current 

Now we are ready to find the force on a current-carrying wire in a magnetic 
field. The current consists of charged particles moving with the velocity v along 
the wire. Each charge feels a transverse force 


F = qv X B 

(Fig. 13-5a). If there are N such charges per unit volume, the number in a small 
volume AV of the wire is N AV. The total magnetic force A F on the volume AV 
is the sum of the forces on the individual charges, that is, 


But Nqv is just j, so 


A F = (NAV)(qv X B). 
AF = ;X BAV 


(Fig. 13— 5b). The force per unit volume is j X B. 
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If the current is uniform across a wire whose cross-sectional area is A, we 
may take as the volume element a cylinder with the base area A and the length 
A L. Then 

AF = j X BAAL. (13.10) 

Now we can call jA the vector current / in the wire. (Its magnitude is the electric 
current in the wire, and its direction is along the wire.) Then 

A F = IX BAL . (13.11) 

The force per unit length on a wire is / X B. 

This equation gives the important result that the magnetic force on a wire, 
due to the movement of charges in it, depends only on the total current, and not on 
the amount of charge carried by each particle — or even its sign! The magnetic 
force on a wire near a magnet is easily shown by observing its deflection when a 
current is turned on, as was described in Chapter 1 (see Fig. 1-6). 


13-4 The magnetic field of steady currents; Ampere’s law 

We have seen that there is a force on a wire in the presence of a magnetic field, 
produced, say, by a magnet. From the principle that action equals reaction we 
might expect that there should be a force on the source of the magnetic field, i.e., 
on the magnet, when there is a current through the wire.* There are indeed such 
forces, as is seen by the deflection of a compass needle near a current-carrying 
wire. Now we know that magnets feel forces from other magnets, so that means 
that when there is a current in a wire, the wire itself generates a magnetic field. 
Moving charges, then, produce a magnetic field. We would like now to try to 
discover the laws that determine how such magnetic fields are created. The question 
is: Given a current, what magnetic field does it make? The answer to this question 
was determined experimentally by three critical experiments and a brilliant 
theoretical argument given by Ampere. We will pass over this interesting historical 
development and simply say that a large number of experiments have demonstrated 
the validity of Maxwell’s equations. We take them as our starting point. If we 
drop the terms involving time derivatives in these equations we get the equations of 
magnetostatics: 

V • B = 0 (13.12) 

and 

c 2 V X B = — • (13.13) 

€0 

These equations are valid only if all electric charge densities are constant and all 
currents are steady, so that the electric and magnetic fields are not changing with 
time — all of the fields are “static.” 

We may remark that it is rather dangerous to think that there is such a thing 
as a static magnetic situation, because there must be currents in order to get a 
magnetic field at all — and currents can come only from moving charges. “Mag- 
netostatics” is, therefore, an approximation. It refers to a special kind of dynamic 
situation with large numbers of charges in motion, which we can approximate by 
a steady flow of charge. Only then can we speak of a current density j which does 
not change with time. The subject should more accurately be called the study of 
steady currents. Assuming that all fields are steady, we drop all terms in SEjdt 
and dB/dt from the complete Maxwell equations, Eqs. (2.41), and obtain the 
two equations (13.12) and (13.13) above. Also notice that since the divergence of 
the curl of any vector is necessarily zero, Eq. (13.13) requires that V j = 0. This 
is true, by Eq. (13.8), only if dp/dt is zero. But that must be so if E is not changing 
with time, so our assumptions are consistent. 


* We will see later, however, that such assumptions are not generally correct for electro- 
magnetic forces! 



(b) 


Fig. 13-5. The magnetic force on a 
current-carrying wire is the sum of the 
forces on the individual moving charges. 
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Fig, 13-6. The line integral of the 
tangential component of 6 is equal to the 
surface integral of the normal component 
of V X B. 


The requirement that V • j — 0 means that we may only have charges which 
flow in paths that close back on themselves. They may, for instance, flow in wires 
that form complete loops — called circuits. The circuits may, of course, contain 
generators or batteries that keep the charges flowing. But they may not include 
condensers which are charging or discharging. (We will, of course, extend the 
theory later to include dynamic fields, but we want first to take the simpler case of 
steady currents.) 

Now let us look at Eqs. (13.12) and (13.13) to see what they mean. The first 
one says that the divergence of B is zero. Comparing it to the analogous equation 
in electrostatics, which says that V ■ E = p/e 0 , we can conclude that there is no 
magnetic analog of an electric charge. There are no magnetic charges from which 
lines of B can emerge. If we think in terms of “lines'’ of the vector field B, they can 
never start and they never stop. Then where do they come from? Magnetic fields 
“appear” in the presence of currents; they have a curl proportional to the current 
density. Wherever there are currents, there are lines of magnetic field making 
loops around the currents. Since lines of B do not begin or end, they will often 
close back on themselves, making closed loops. But there can also be complicated 
situations in which the lines are not simple closed loops. But whatever they do, 
they never diverge from points. No magnetic charges have ever been discovered, 
so V • B — 0. This much is true not only for magnetostatics, it is always true — 
even for dynamic fields. 

The connection between the B field and currents is contained in Eq. (13.13). 
Here we have a new kind of situation which is quite different from electrostatics, 
where we had V X E = 0. That equation meant that the line integral of E around 
any closed path is zero: 

E • ds = 0. 

loop 


We got that result from Stokes’ theorem, which says that the integral around any 
closed path of any vector field is equal to the surface integral of the normal com- 
ponent of the curl of the vector (taken over any surface which has the closed loop 
as its periphery). Applying the same theorem to the magnetic field vector and 
using the symbols shown in Fig. 13-6, we get 

(V X B) • n dS. (13.14) 


Taking the curl of B from Eq. (13.13), we have 

The integral over /, according to (13.5), is the total current /through the surface S. 
Since for steady currents the current through S is independent of the shape of 5, 
so long as it is bounded by the curve r, one usually speaks of “the current through 
the loop T.” We have, then, a general law: the circulation of B around any closed 
curve is equal to the current / through the loop, divided by € 0 c 2 : 

f Bek = /th - rou f r ■ (13.16) 

Jr e 0 c 2 

This law — called Ampere's law — plays the same role in magnetostatics that Gauss’ 
law played in electrostatics. Ampere’s law alone does not determine B from cur- 
rents; we must, in general, also use V • B = 0. But, as we will see in the next 
section, it can be used to find the field in special circumstances which have certain 
simple symmetries. 
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13-5 The magnetic field of a straight wire and of a solenoid; atomic 
currents 


We can illustrate the use of Ampere’s law by finding the magnetic field near 
a wire. We ask: What is the field outside a long straight wire with a cylindrical 
cross section? We will assume something which may not be at all evident, but which 
is nevertheless true: that the field lines of B go around the wire in closed circles. 
If we make this assumption, then Ampere’s law, Eq. (13.16), tells us how strong the 
field is. From the symmetry of the problem, B has the same magnitude at all 
points on a circle concentric with the wire (see Fig. 13-7). We can then do the line 
integral of B • ds quite easily; it is just the magnitude of B times the circumference. 
If r is the radius of the circle, then 

B • ds = B • 27nr. 


The total current through the loop is merely the current 1 in the wire, so 


or 


B • 27rr 



4t€qC 2 r 


(13.17) 



Fig. 1 3-7. The magnetic field outside 
of a long wire carrying the current /. 


The strength of the magnetic field drops off inversely as /*, the distance from the 
axis of the wire. Wecan, if we wish, write Eq. (13. 17)in vector form. Remembering 
that B is at right angles both to / and to r, we have 


1 21 X e T 

4? t€ 0 c 2 r 


(13.18) 


We have separated out the factor l/47re 0 c 2 , because it appears often. It is 
worth remembering that it is exactly 10~ 7 (in the mks system), since an equation 
like (13.17) is used to define the unit of current, the ampere. At one meter from a 
current of one ampere the magnetic field is 2 X 10~ 7 webers per square meter. 

Since a current produces a magnetic field, it will exert a force on a nearby wire 
which is also carrying a current. In Chapter 1 we described a simple demonstration 
of the forces between two current-carrying wires. If the wires are parallel, each is 
at right angles to the B field of the other ; the wires should then be pushed either 
toward or away from each other. When currents are in the same direction, the 
wires attract; when the currents are moving in opposite directions, the wires repel. 



Let’s take another example that can be analyzed by Ampere’s law if we add 
some knowledge about the field. Suppose we have a long coil of wire wound in a 
tight spiral, as shown by the cross sections in Fig. 13-8. Such a coil is called a 
solenoid . We observe experimentally that when a solenoid is very long compared 
with its diameter, the field outside is very small compared with the field inside. 
Using just that fact, together with Ampere’s law, we can find the size of the field 
inside. 

Since the field stays inside (and has zero divergence), its lines must go along 
parallel to the axis, as shown in Fig. 13-8. That being the case, we can use Ampere’s 
law with the rectangular “curve” T shown in the figure. This loop goes the distance 

13-5 



L inside the solenoid, where the field is, say, B 0 , then goes at right angles to the 
field, and returns along the outside, where the field is negligible. The line integral 
of B for this curve is just B 0 L , and it must be l/e 0 c 2 times the total current through 
r, which is NI if there are N turns of the solenoid in the length L. We have 


B 0 L = 


NI 

e 0 c 2 



Fig. 1 3-9. The magnetic field outside 
of a solenoid. 


Or, letting n be the number of turns per unit length of the solenoid (that is, n — 
N/L ), we get 


nl 

€ 0 C 2 


(13.19) 


What happens to the lines of B when they get to the end of the solenoid? 
Presumably, they spread out in some way and return to enter the solenoid at the 
other end, as sketched in Fig. 13-9. Such a field is just what is observed outside of 
a bar magnet. But what is a magnet anyway ? Our equations say that B comes from 
the presence of currents. Yet we know that ordinary bars of iron (no batteries or 
generators) also produce magnetic fields. You might expect that there should be 
some other terms on the right-hand side of (13.12) or (13.13) to represent “the 
density of magnetic iron” or some such quantity. But there is no such term. Our 
theory says that the magnetic effects of iron come from some internal currents 
which are already taken care of by the j term. 

Matter is very complex when looked at from a fundamental point of view — as 
we saw when we tried to understand dielectrics. In order not to interrupt our pres- 
ent discussion, we will wait until later to deal in detail with the interior mechanisms 
of magnetic materials like iron. You will have to accept, for the moment, that all 
magnetism is produced from currents, and that in a permanent magnet there are 
permanent internal currents. In the case of iron, these currents come from electrons 
spinning around their own axes. Every electron has such a spin, which corresponds 
to a tiny circulating current. Of course, one electron doesn’t produce much mag- 
netic field, but in an ordinary piece of matter there are billions and billions of elec- 
trons. Normally these spin and point every which way, so that there is no net 
effect. The miracle is that in a very few substances, like iron, a large fraction of 
the electrons spin with their axes in the same direction — for iron, two electrons from 
each atom takes part in this cooperative motion. In a bar magnet there are large 
numbers of electrons all spinning in the same direction and, as we will see, their 
total effect is equivalent to a current circulating on the surface of the bar. (This is 
quite analogous to what we found for dielectrics — that a uniformly polarized di- 
electric is equivalent to a distribution of charges on its surface.) It is, therefore, no 
accident that a bar magnet is equivalent to a solenoid. 


13-6 The relativity of magnetic and electric fields 

When we said that the magnetic force on a charge was proportional to its 
velocity, you may have wondered: “What velocity? With respect to which refer- 
ence frame?” It is, in fact, clear from the definition of B given at the beginning of 
this chapter that what this vector is will depend on what we choose as a reference 
frame for our specification of the velocity of charges. But we have said nothing 
about which is the proper frame for specifying the magnetic field. 

It turns out that any inertial frame will do. We will also see that magnetism 
and electricity are not independent things — that they should always be taken to- 
gether as one complete electromagnetic field. Although in the static case Maxwell’s 
equations separate into two distinct pairs, one pair for electricity and one pair for 
magnetism, with no apparent connection between the two fields, nevertheless, in 
nature itself there is a very intimate relationship between them that arises from the 
principle of relativity. Historically, the principle of relativity was discovered after 
Maxwell’s equations. It was, in fact, the study of electricity and magnetism which 
led ultimately to Einstein’s discovery of his principle of relativity. But let’s see 
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Fig. 13-10. The interaction of a current-carrying wire and a particle with the 
charge q as seen in two frames. In frame S (part a), the wire is at rest; in frame 
S' (part b), the charge is at rest. 


what our knowledge of relativity would tell us about magnetic forces if we assume 
that the relativity principle is applicable — as it is — to electromagnetism. 

Suppose we think about what happens when a negative charge moves with 
velocity u 0 parallel to a current-carrying wire, as in Fig. 13-10. We will try to under- 
stand what goes on in two reference frames: one fixed with respect to the wire, 
as in part (a) of the figure, and one fixed with respect to the particle, as in part (b). 
We will call the first frame 5 and the second 5'. 

In the 5-frame, there is clearly a magnetic force on the particle. The force is 
directed toward the wire, so if the charge is moving freely we would see it curve in 
toward the wire. But in the 5'-frame there can be no magnetic force on the particle, 
because its velocity is zero. Does it, therefore, stay where it is? Would we see 
different things happening in the two systems? The principle of relativity would 
say that in S’ we should also see the particle move closer to the wire. We must 
try to understand why that would happen. 

We return to our atomic description of a wire carrying a current. In a normal 
conductor, like copper, the electric currents come from the motion of some of the 
negative electrons — called the conduction electrons — while the positive nuclear 
charges and the remainder of the electrons stay fixed in the body of the material. 
We let the density of the conduction electrons be p_ and their velocity in 5 be v. 
The density of the charges at rest in 5 is p + , which must be equal to the negative 
of p_, since we are considering an uncharged wire. There is thus no electric field 
outside the wire, and the force on the moving particle is just 


F = qv 0 X B. 


Using the result we found in Eq. (13.18) for the magnetic field at the distance 
r from the axis of a wire, we conclude that the force on the particle is directed 
toward the wire and has the magnitude 

F = 1 2Iqv a 

47reoC 3 r 

Using Eqs. (13.4) and (13.5), the current / can be written as p^vA, where A is 
the area of a cross section of the wire. Then 


47T€oC 2 


2 qp^Awo 
r 


(13.20) 


We could continue to treat the general case of arbitrary velocities for v and v 0i 
but it will be just as good to look at the special case in which the velocity v 0 of 
the particle is the same as the velocity v of the conduction electrons. So we write 
v 0 = v, and Eq. (13.20) becomes 


q p—A v 2 
27T6o r c 2 


(13.21) 


Now we turn our attention to what happens in 5', in which the particle is at 
rest and the wire is running past (toward the left in the figure) with the speed v . 
The positive charges moving with the wire will make some magnetic field B f at 
the particle. But the particle is now at rest , so there is no magnetic force on it! 
If there is any force on the particle, it must come from an electric field. It must 
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be that the moving wire has produced an electric field. But it can do that only if it 
appears charged — it must be that a neutral wire with a current appears to be charged 
when set in motion. 

We must look into this. We must try to compute the charge density in the 
wire in S' from what we know about it in 5. One might, at first, think they are the 
same; but we know that lengths are changed between S and S' (see Chapter 15, 
Vol. I), so volumes will change also. Since the charge densities depend on the 
volume occupied by charges, the densities will change, too. 

Before we can decide about the charge densities in S', we must know what 
happens to the electric charge of a bunch of electrons when the charges are moving. 
We know that the apparent mass of a particle changes by l/y/l — v 2 /c 2 . Does 
its charge do something similar? No! Charges are always the same, moving or 
not. Otherwise we would not always observe that the total charge is conserved. 

Suppose that we take a block of material, say a conductor, which is initially 
uncharged. Now we heat it up. Because the electrons have a different mass than 
the protons, the velocities of the electrons and of the protons will change by differ- 
ent amounts. If the charge of a particle depended on the speed of the particle carry- 
ing it, in the heated block the charge of the electrons and protons would no longer 
balance. A block would become charged when heated. As we have seen earlier, a 
very small fractional change in the charge of all the electrons in a block would give 
rise to enormous electric fields. No such effect has ever been observed. 

Also, we can point out that the mean speed of the electrons in matter depends 
on its chemical composition. If the charge on an electron changed with speed, the 
net charge in a piece of material would be changed in a chemical reaction. Again, 
a straightforward calculation shows that even a very small dependence of charge 
on speed would give enormous fields from the simplest chemical reactions. No 
such effect is observed, and we conclude that the electric charge of a single particle 
is independent of its state of motion. 

So the charge q on a particle is an invariant scalar quantity, independent of 
the frame of reference. That means that in any frame the charge density of a 
distribution of electrons is just proportional to the number of electrons per unit 
volume. We need only worry about the fact that the volume can change because 
of the relativistic contraction of distances. 

We now apply these ideas to our moving wire. If we take a length L 0 of the 
wire, in which there is a charge density po of stationary charges, it will contain 
the total charge Q — p 0 L 0 A 0 . If the same charges are observed in a different frame 
to be moving with velocity v, they will all be found in a piece of the material with 
the shorter length 

L - Wl - t>2/c 2 , (13.22) 


but with the same area A 0 (since dimensions transverse to the motion are un- 
changed). See Fig. 13-11. 

If we call p the density of charges in the frame in which they are moving, the 
total charge Q will be pLA 0 . This must also be equal to p 0 L 0 A, because charge is 
the same in any system, so that pL ~ p 0 L 0 or, from (13.22), 


Po 

\/\ — v 2 /c 2 


(13.23) 



Fig. 1 3-1 1 . If a distribution of charged particles at rest has the charge density 
po, the same charges will have the density p = po/Vl — v 2 /c 2 when seen from a 
frame with the relative velocity v. 
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The charge density of a moving distribution of charges varies in the same way as the 
relativistic mass of a particle. 

We now use this general result for the positive charge density p + of our wire. 
These charges are at rest in frame 5. In S', however, where the wire moves with 
the speed v, the positive charge density becomes 


P+ = 


P+ 

\Z\-~v*/c 2 


(13.24) 


The negative charges are at rest in S'. So they have their “rest density” p 0 in 
this frame. In Eq. (13.23) p 0 = P-> because they have the density pL when the 
wire is at rest, i.e., in frame S, where the speed of the negative charges is v. For 
the conduction electrons, we then have that 


or 


p- 

Vl “ v 2 /c 2 

(13.25) 

= P— Vl — v 2 /c 2 . 

(13.26) 


Now we can see why there are electric fields in S' — because in this frame the 
wire has the net charge density p' given by 

P f - P+ + Pa- 
using (13.24) and (13.26), we have 

p' = — + p_V 1 — v 2 fc 2 . 

Vl — v 2 /c 2 

Since the stationary wire is neutral, p_ = — p+, and we have 

2 / 2 
V (C 


P f = P+ 


Vl — v 2 /c 2 


(13.27) 


Our moving wire is positively charged and will produce an electric field E f at the 
external stationary particle. We have already solved the electrostatic problem of a 
uniformly charged cylinder. The electric field at the distance r from the axis of the 
cylinder is 


E' = P ' A = P+^ 2 /c 2 

2TT€or 27r£orVl — v 2 /c 2 


(13.28) 


The force on the negatively charged particle is toward the wire. We have, at least, 
a force in the same direction from the two points of view; the electric force in S' 
has the same direction as the magnetic force in S. 

The magnitude of the force in S' is 


E> = P±A v 2 / c2 

2ire 0 r Vl - v 2 /c 2 


(13.29) 


Comparing this result for F f with our result for F in Eq. (13.21), we see that the 
magnitudes of the forces are almost identical from the two points of view. In fact, 

F' = - - F - , (13.30) 

Vl — v 2 /c 2 


so for the small velocities we have been considering, the two forces are equal. 
We can say that for low velocities, at least, we understand that magnetism and 
electricity are just “two ways of looking at the same thing.” 

But things are even better than that. If we take into account the fact that 
forces also transform when we go from one system to the other, we find that the 
two ways of looking at what happens do indeed give the same physical result for 
any velocity. 
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(b) 

Fig. 13-12. In frame S the charge 
density is zero and the current density is 
/. There is only a magnetic field. In S', 
there is a charge density p', and a differ- 
ent current density /'. The magnetic field 
B' is different and there is an electric 
field 


One way of seeing this is to ask a question like: What transverse momentum 
will the particle have after the force has acted for a little while? We know from 
Chapter 16 of Vol. I that the transverse momentum of a particle should be the same 
in both the 5- and 5' -frames. Calling the transverse coordinate y, we want to 
compare A p y and Ap' y . Using the relativistically correct equation of motion, 
F = dp/dt, we expect that after the time At our particle will have a transverse 
momentum A p y in the 5-system given by 


A p y — FAt. (13.31) 

In the 5'-system, the transverse momentum will be 

Ap' y = F' At'. (13.32) 

We must, of course, compare A p y and A p' y for corresponding time intervals At and 
At', We have seen in Chapter 15 of Vol. I that the time intervals referred to a 
moving particle appear to be longer than those in the rest system of the particle. 
Since our particle is initially at rest in S', we expect, for small At, that 

At' 

At = -- m - > (13.33) 

Vl — v 2 /c 2 

and everything comes out O.K. From (13.31) and (13.32), 

Ap'y _ F' At' 

A p y ~ F At 

which is just = 1 if we combine (13.30) and (13.33). 

We have found that we get the same physical result whether we analyze the 
motion of a particle moving along a wire in a coordinate system at rest with respect 
to the wire, or in a system at rest with respect to the particle. In the first instance, 
the force was purely “magnetic,” in the second, it was purely “electric.” The two 
points of view are illustrated in Fig. 13-12 (although there is still a magnetic field 
B' in the second frame, it produces no forces on the stationary particle). 

If we had chosen still another coordinate system, we would have found a 
different mixture of E and B fields. Electric and magnetic forces are part of one 
physical phenomenon — the electromagnetic interactions of particles. The separa- 
tion of this interaction into electric and magnetic parts depends very much on the 
reference frame chosen for the description. But a complete electromagnetic de- 
scription is invariant; electricity and magnetism taken together are consistent 
with Einstein’s relativity. 

Since electric and magnetic fields appear in different mixtures if we change our 
frame of reference, we must be careful about how we look at the fields E and B. 
For instance, if we think of “lines” of E or B, we must not attach too much reality 
to them. The lines may disappear if we try to observe them from a different co- 
ordinate system. For example, in system S' there are electric field lines, which we 
do not find “moving past us with velocity v in system 5.” In system 5 there are no 
electric field lines at all! Therefore it makes no sense to say something like: When 
I move a magnet, it takes its field with it, so the lines of B are also moved. There 
is no way to make sense, in gf leral, out of the idea of “the speed of a moving field 
line.” The fields are our way of describing what goes on at a point in space. In 
particular, E and B tell us about the forces that will act on a moving particle. The 
question “What is the force on a charge from a moving magnetic field?” doesn’t 
mean anything precise. The force is given by the values of E and B at the charge, 
and the formula (13.1) is not to be altered if the source of E or B is moving (it is 
the values of E and B that will be altered by the motion). Our mathematical de- 
scription deals only with the fields as a function of x, y, z, and t with respect to 
some inertial frame. 

We will later be speaking of “a wave of electric and magnetic fields travelling 
through space,” as, for instance, a light wave. But that is like speaking of a wave 
travelling on a string. We don’t then mean that some part of the string is moving 
13-10 




in the direction of the wave, we mean that the displacement of the string appears 
first at one place and later at another. Similarly, in an electromagnetic wave, the 
wave travels,* but the magnitude of the fields change . So in the future when we — or 
someone else — speaks of a “moving” field, you should think of it as just a handy, 
short way of describing a changing field in some circumstances. 


13-7 The transformation of currents and charges 

You may have worried about the simplification we made above when we 
took the same velocity v for the particle and for the conduction electrons in the 
' wire. We could go back and carry through the analysis again for two different 
velocities, but it is easier to simply notice that charge and current density are the 
components of a four-vector (see Chapter 17, Vol. I). 

We have seen that if Po is the density of the charges in their rest frame, then 
in a frame in which they have the velocity v, the density is 

p = - Po • 

Vl — v 2 /c 2 


In that frame their current density is 


i « pv = 


PqV 

V 1 — v 2 /c 2 


(13.34) 


Now we know that the energy U and momentum p of a particle moving with 
velocity v are given by 

rr m 0 c 2 _ m 0 v 

U = — = 9 P = - Z= 9 

\/l “ V 2 /c 2 Vl — v 2 /c 2 

where m 0 is its rest mass. We also know that U and p form a relativistic four-vector. 
Since p andy depend on the velocity v exactly as do U and p, we can conclude that p 
and j are also the components of a relativistic four-vector. This property is the key 
to a general analysis of the field of a wire moving with any velocity, which we 
would need if we want to do the problem again with the velocity v 0 of the particle 
different from the velocity of the conduction electrons, 

If we wish to transform p and j to a coordinate system moving with a velocity 
u in the x-direction, we know that they transform just like t and (x, y, z), so that 
we have (see Chapter 15, Vol. I) 


. X — ut 

X ~ 9 

./ _ jx “ up 

Vl - h 2 /c 2 

Vl — u 2 /c 2 

X 

II 

Vl 

. & 

II 

z' = z, 

j'z ~ j 21 

, _ t — ux/c 2 

9 — ujx/c 2 

Vl — u 2 /c 2 

Vl — u 2 /c 2 


(13.35) 


With these equations we can relate charges and currents in one frame to those 
in another. Taking the charges and currents in either frame, we can solve the 
electromagnetic problem in that frame by using our Maxwell equations. The 
result we obtain for the motions of particles will be the same no matter which frame 
we choose. We will return at a later time to the relativistic transformations of the 
electromagnetic fields. 


13-8 Superposition; the right-hand rule 

We will conclude this chapter by making two further points regarding the 
subject of magnetostatics. First, our basic equations for the magnetic field, 


V B = 0, 


v X B = j/c 2 e 0 , 
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are linear in B and j. That means that the principle of superposition also applies 
to magnetic fields. The field produced by two different steady currents is the sum 
of the individual fields from each current acting alone. Our second remark con- 
cerns the right-hand rules which we have encountered (such as the right-hand 
rule for the magnetic field produced by a current). We have also observed that the 
magnetization of an iron magnet is to be understood from the spin of the electrons 
in the material. The direction of the magnetic field of a spinning electron is related 
to its spin axis by the same right-hand rule. Because B is determined by a “handed” 
rule — involving either a cross product or a curl — it is called an axial vector. (Vec- 
tors whose direction in space does not depend on a reference to a right or left hand 
are called polar vectors. Displacement, velocity, force, and E t for example, are 
polar vectors.) 

Physically observable quantities in electromagnetism are not , however, right- 
(or left-) handed. Electromagnetic interactions are symmetrical under reflection 
(see Chapter 52, Vol. I). Whenever magnetic forces between two sets of currents are 
computed, the result is invariant with respect to a change in the hand convention. 
Our equations lead, independently of the right-hand convention, to the end result 
that parallel currents attract, or that currents in opposite directions repel. (Try 
working out the force using “left-hand rules.”) An attraction or repulsion is a 
polar vector. This happens because in describing any complete interaction, we 
use the right-hand rule twice — once to find B from currents, again to find the force 
this B produces on a second current. Using the right-hand rule twice is the same 
as using the left-hand rule twice. If we were to change our conventions to a left- 
hand system all our B fields would be reversed, but all forces — or, what is perhaps 
more relevant, the observed accelerations of objects — would be unchanged. 

Although physicists have recently found to their surprise that all the laws of 
nature are not always invariant for mirror reflections, the laws of electromagnetism 
do have such a basic symmetry. 
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14 


The Magnetic Field in Various Situations 


14-1 The vector potential 

In this chapter we continue our discussion of magnetic fields associated with 
steady currents — the subject of magnetostatics. The magnetic field is related to 
electric currents by our basic equations 

V B - 0, (14.1) 

c 2 V Xi = - (14.2) 

Co 

We want now to solve these equations mathematically in a general way, that is, 
without requiring any special symmetry or intuitive guessing. In electrostatics, 
we found that there was a straightforward procedure for finding the field when the 
positions of all electric charges are known: One simply works out the scalar 
potential 4> by taking an integral over the charges — as in Eq. (4.25). Then if one 
wants the electric field, it is obtained from the derivatives of 0 . We will now show 
that there is a corresponding procedure for finding the magnetic field B if we know 
the current density j of all moving charges. 

In electrostatics we saw that (because the curl of E was always zero) it was 
possible to represent E as the gradient of a scalar field 0 . Now the curl of B is not 
always zero, so it is not possible, in general, to represent it as a gradient. However, 
the divergence of B is always zero, and this means that we can always represent B as 
the curl of another vector field. For, as we saw in Section 2-8, the divergence of a 
curl is always zero. Thus we can always relate B to a field we will call A by 

B = V X A. (14.3) 

Or, by writing out the components, 

Bx = { yxA) x = d -^- d -§, 

= (14.4) 

Writing B = V X A guarantees that Eq. (14.1) is satisfied, since, necessarily, 


14-1 The vector potential 

14-2 The vector potential of known 
currents 

14-3 A straight wire 

14-4 A long solenoid 

14-5 The field of a small loop; the 
magnetic dipole 

14-6 The vector potential of a 
circuit 

14-7 The law of Biot and Savart 


V * B = V * (V X A) = 0. 


The field A is called the vector potential . 

You will remember that the scalar potential 0 was not completely specified 
by its definition. If we have found 0 for some problem, we can always find another 
potential 0 ' that is equally good by adding a constant: 

<p' = <f> + C. 

The new potential 0 ' gives the same electric fields, since the gradient VC is zero; 
0 ' and 0 represent the same physics. 

Similarly, we can have different vector potentials A which give the same 
magnetic fields. Again, because B is obtained from A by differentiation, adding a 
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constant to A doesn’t change anything physical. But there is even more latitude 
for A . We can add to A any field which is the gradient of some scalar field, without 
changing the physics. We can show this as follows. Suppose we have an A that 
gives correctly the magnetic field B for some real situation, and ask in what cir- 
cumstances some other new vector potential A f will give the same field B if sub- 
stituted into (14.3). Then A and A ' must have the same curl: 


Therefore 


B = V X A' = V X A. 


V X A' - V X A = V X (A' - A) = 0. 


But if the curl of a vector is zero it must be the gradient of some scalar field, say 
ip, so A f — A ~ V\p. That means that if A is a satisfactory vector potential for a 
problem then, for any \p at all, 

A' = A + VrP (14.5) 

will be an equally satisfactory vector potential, leading to the same field B . 

It is usually convenient to take some of the “latitude” out of A by arbitrarily 
placing some other condition on it (in much the same way that we found it con- 
venient — often — to choose to make the potential <t> zero at large distances). We 
can, for instance, restrict A by choosing arbitrarily what the divergence of A must 
be. We can always do that without affecting B. This is because although A f and 
A have the same curl, and give the same B, they do not need to have the same 
divergence. In fact, V * A f = V * A + V V, and by a suitable choice of \p we can 
make V • A f anything we wish. 

What should we choose for V • A? The choice should be made to get the 
greatest mathematical convenience and will depend on the problem we are doing. 
For magnetostatics , we will make the simple choice 

V A = 0. (14.6) 


(Later, when we take up electrodynamics, we will change our choice.) Our complete 
definition* of A is then, for the moment, V X A = B and V • A — 0. 

To get some experience with the vector potential, let’s look first at what it is 
for a uniform magnetic field B 0 . Taking our z-axis in the direction of B 0 , we must 
have 


dA z _ 

dAy 

dy 

dz 

dA x _ 

dA z 

dz 

dx 

a 

<■0 

dA x 

dx 

dy 


(14.7) 


By inspection, we see that one possible solution of these equations is 
Ay = xB o, Ax == 0 , A 2, = 0 . 

Or we could equally well take 

A x = yB 0 , Ay = 0 , A z = 0 . 

Still another solution is a linear combination of the two: 

A x = —hyB o, Ay = %xB 0 , A z = 0. (14.8) 


* Our definition still does not uniquely determine A. For a unique specification we 
would also have to say something about how the field A behaves on some boundary, or 
at large distances. It is sometimes convenient, for example, to choose a field which 
goes to zero at large distances. 
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It is clear that for any particular field B , the vector potential A is not unique; 
there are many possibilities. 

The third solution, Eq. (14.8), has some interesting properties. Since the 
^-component is proportional to —y and the ^-component is proportional to +x, 
A must be at right angles to the vector from the z-axis, which we will call r' (the 
“prime” is to remind us that it is not the vector displacement from the origin). 
Also, the magnitude of A is proportional to y/x 2 + y 2 and, hence, to r f . So A 
can be simply written (for our uniform field) as 


A = iB X r'. 


(14.9) 


The vector potential A has the magnitude Br'/2 and rotates about the z-axis as 
shown in Fig. 14-1. If, for example, the B field is the axial field inside a solenoid, 
then the vector potential circulates in the same sense as do the currents of the 
solenoid. 

The vector potential for a uniform field can be obtained in another way. 
The circulation of A on any closed loop T can be related to the surface integral of 
V X A by Stokes’ theorem, Eq. (3.38): 


£ A * ds = j (V X A) • n da. 


(14.10) 


inside T 

But the integral on the right is equal to the flux of B through the loop, so 

£ A * ds = j B-nda. (14.11) 

inside r 


So the circulation of A around any loop is equal to the flux of B through the loop. 
If we take a circular loop, of radius r f in a plane perpendicular to a uniform field 
B , the flux is just 

7i r f2 B. 


If we choose our origin on an axis of symmetry, so that we can take A as circum- 
ferential and a function only of r', the circulation will be 


We get, as before, 


£ A * ds — lirr'A = irr ,2 B. 


A 


Bd_ 

2 


In the example we have just given, we have calculated the vector potential from 
the magnetic field, which is opposite to what one normally does. In complicated 
problems it is usually easier to solve for the vector potential, and then determine 
the magnetic field from it. We will now show how this can be done. 


14-2 The vector potential of known currents 

Since B is determined by currents, so also is A . We want now to find A in 
terms of the currents. We start with our basic equation (14.2): 

c 2 V X B = j - . 

Co 

which means, of course, that 

c 2 V X (V X A) = J- • (14.12) 

Co 

This equation is for magnetostatics what the equation 

v . v* = _ L (14.13) 

Co 


y 



Fig. 14-1. A uniform magnetic field 
B in the z-direction corresponds to a 
vector potential A that rotates about the 
z-axis, with the magnitude A = Br / 2 
(r 1 is the displacement from the z-axis). 


was for electrostatics. 
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Our equation (14.12) for the vector potential looks even more like that for 
0 if we rewrite V X (V X A) using the vector identity Eq. (2.58): 

V X (V X A) = V(V A) - V 2 A . (14.14) 


Since we have chosen to make V • A = 0 (and now you see why), Eq. (14.12) 
becomes 


V 2 A = - 


J 

€ 0 C 2 


(14.15) 


This vector equation means, of course, three equations: 



Fig. 1 4-2. The vector potential A at 
point 1 is given by an integral over the 
current elements jdV at all points 2. 


V 2 A x 



V 2 Ay 


Jy 

e 0 c 2 ’ 




And each of these equations is mathematically identical to 


V 2 </> 


p., 

€ 0 


Jz 

€ 0 C 2 


(14.16) 


(14.17) 


All we have learned about solving for potentials when p is known can be used for 
solving for each component of A when j is known! 

We have seen in Chapter 4 that a general solution for the electrostatic equation 
(14.17) is 


* (1) = 


1 f p(2) dV 2 

47T€ 0 J r X 2 


So we know immediately that a general solution for A x is 


A x (\) 


1 f U2)dV 2 
47T€ 0 c 2 J r X2 


(14.18) 


and similarly for A y and A z . (Figure 14-2 will remind you of our conventions for 
r 12 and dV 2 .) We can combine the three solutions in the vector form 


^(D = 


1 f j(2) dV 2 

47 T€ 0 c 2 J r \2 


(14.19) 


(You can verify if you wish, by direct differentiation of components, that this inte- 
gral for A satisfies V ■ A = 0 so long as V •/ — 0, which, as we saw, must happen 
for steady currents.) 

We have, then, a general method for finding the magnetic field of steady cur- 
rents. The principle is: the jc-component of vector potential arising from a current 
density j is the same as the electric potential <j> that would be produced by a charge 
density p equal to j x /c 2 — and similarly for the y- and z-components. (This principle 
works only with components in fixed directions. The “radial” component of A 
does not come in the same way from the “radial” component of y, for example.) 
So from the vector current density y, we can find A using Eq. (14.19) — that is, we 
find each component of A by solving three imaginary electrostatic problems for 
the charge distributions p x = j x /c 2 , p 2 = j y /c 2 , and p 3 = j z /c 2 . Then we get 
B by taking various derivatives of A to obtain V X A. It’s a little more compli- 
cated than electrostatics, but the same idea. We will now illustrate the theory by 
solving for the vector potential in a few special cases. 


14-3 A straight wire 

For our first example, we will again find the field of a straight wire — which we 
solved in the last chapter by using Eq. (14.2) and some arguments of symmetry. 
We take a long straight wire of radius a, carrying the steady current /. Unlike the 
charge on a conductor in the electrostatic case, a steady current in a wire is uni- 
formly distributed throughout the cross section of the wire. If we choose our 
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coordinates as shown in Fig. 14-3, the current density vector j has only a z-com- 
ponent. Its magnitude is 


jz 


I 

wa 2 


(14.20) 


inside the wire, and zero outside. 

Since j x and j y are both zero, we have immediately 


A x = 0, Ay = 0. 


To get A g we can use our solution for the electrostatic potential <f> of a wire with a 
uniform charge density p = j z /c 2 . For points outside an infinite charged cylinder, 
the electrostatic potential is 


« 


2t€o 


In r f . 


where r' = yjx 2 + y 2 and X is the charge per unit length, ira 2 p. So A z must be 


A z 


Tra 2 jz 

lirtoC 2 


lnr 7 


for points outside a long wire carrying a uniform current. Since ira 2 j z = /, we 
can also write 


A z 


I 

2tT€qC 2 


In r'. 


(14.21) 


Now we can find B from (14.4). There are only two of the six derivatives that 
are not zero. We get 


B x = - 


27re 0 c 2 dy 


In /*' — — 


lireoC 2 dx 


In r f = 


27reoC 2 r' 2 
/ x 


27T€ 0 c 2 r* 2 


(14.22) 

(14.23) 


Bg = 0. 


We get the same result as before: B circles around the wire, and has the magnitude 


B = 


1 21 
4ir€oC 2 r f 


(14.24) 


14-4 A long solenoid 

Next, we consider again the infinitely long solenoid with a circumferential 
current on the surface of nl per unit length. (We imagine there are n turns of wire 
per unit length, carrying the current /, and we neglect the slight pitch of the winding.) 

Just as we have defined a “surface charge density” we define here a “sur- 
face current density” J equal to the current per unit length on the surface of the 
solenoid (which is, of course, just the average j times the thickness of the thin 
winding). The magnitude of / is, here, nL This surface current (see Fig. 14-4) has 
the components. 

J x = — Jsin<j>, J y = 7cos0, J z = 0. 

Now we must find A for such a current distribution. 

First, we wish to find A x for points outside the solenoid. The result is the same 
as the electrostatic potential outside a cylinder with a surface charge 

<7 — c o sin <£, 

with <Xq = J/c 2 . We have not solved such a charge distribution, but we have done 
something similar. This charge distribution is equivalent to two solid cylinders of 
charge, one positive and one negative, with a slight relative displacement of their 
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Fig. 14-3. A long cylindrical wire 
along the z-axis with a uniform current 
density /. 



Fig. 14-4. A long solenoid with a 
surface current density J. 



i > 


Fig. 14-5. A rotating charged cylin- 
der produces a magnetic field inside. A 
short radial wire rotating with the cylinder 
has charges induced on its ends. 


axes in the ^-direction. The potential of such a pair of cylinders is proportional 
to the derivative with respect to y of the potential of a single uniformly charged 
cylinder. We could work out the constant of proportionality, but let’s not worry 
about it for the moment. 

The potential of a cylinder of charge is proportional to In r ' ; the potential 
of the pair is then 


So we know that 


d In r f 
dy r ' 2 


A x 



(14.25) 


where K is some constant. Following the same argument, we would find 

A y = K±- (14.26) 


Although we said before that there was no magnetic field outside a solenoid, we 
find now that there is an ^4-field which circulates around the z-axis, as in Fig. 14-4. 
The question is: Is its curl zero? 

Clearly, B x and B y are zero, and 



So the magnetic field outside a very long solenoid is indeed zero, even though the 
vector potential is not. 

We can check our result against something else we know: The circulation of 
the vector potential around the solenoid should be equal to the flux of B inside the 
coil (Eq. 14.11). The circulation is A • lirr' or, since A = K/r\ the circulation is 
2 ttAT. Notice that it is independent of r\ That is just as it should be if there is no 
B outside, because the flux is just the magnitude of B inside the solenoid times 
ira 2 . It is the same for all circles of radius r > a. We have found in the last chapter 
that the field inside is nl/e 0 c 2 , so we can determine the constant K: 


^ v 2 nl 
2tK = ira — 5 > 
€ 0 c 2 

or 

nla 2 

K ~l^ 2 ' 

So the vector potential outside has the magnitude 

. nla 2 1 

A ~ 2^?’ 


(14.27) 


and is always perpendicular to the vector #*'. 

We have been thinking of a solenoidal coil of wire, but we would produce 
the same fields if we rotated a long cylinder with an electrostatic charge on the 
surface. If we have a thin cylindrical shell of radius a with a surface charge <r, 
rotating the cylinder makes a surface current / = av, where v = ao) is the velocity 
of the surface charge. There will then be a magnetic field B = <rao)/e 0 c 2 inside 
the cylinder. 

Now we can raise an interesting question. Suppose we put a short piece of 
wire W perpendicular to the axis of the cylinder, extending from the axis out to 
the surface, and fastened to the cylinder so that it rotates with it, as in Fig. 14-5. 
This wire is moving in a magnetic field, so the v X B forces will cause the ends of 
the wire to be charged (they will charge up until the F-field from the charges just 
balances the v X B force). If the cylinder has a positive charge, the end of the wire 
at the axis will have a negative charge. By measuring the charge on the end of the 
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wire, we could measure the speed of rotation of the system. We would have an 
“angular-velocity meter”! 

But are you wondering: “What if I put myself in the frame of reference of the 
rotating cylinder? Then there is just a charged cylinder at rest, and I know that the 
electrostatic equations say there will be no electric fields inside, so there will be no 
force pushing charges to the center. So something must be wrong.” But there is 
nothing wrong. There is no “relativity of rotation.” A rotating system is not an 
inertial frame, and the laws of physics are different. We must be sure to use equa- 
tions of electromagnetism only with respect to inertial coordinate systems. 

It would be nice if we could measure the absolute rotation of the earth with 
such a charged cylinder, but unfortunately the effect is much too small to observe 
even with the most delicate instruments now available. 


14-5 The field of a small loop; the magnetic dipole 

Let’s use the vector-potential method to find the magnetic field of a small 
loop of current. As usual, by “small” we mean simply that we are interested in 
the fields only at distances large compared with the size of the loop. It will turn 
out that any small loop is a “magnetic dipole.” That is, it produces a magnetic 
field like the electric field from an electric dipole. 



Fig. 14-6. A rectangular loop of wire with the 
current /. What is the magnetic field at P? [R a, or b.) 


Fig. 14-7. The distribution of / x in 
the current loop of Fig. 1 4-6. 


We take first a rectangular loop, and choose our coordinates as shown in 
Fig. 14-6. There are no currents in the z-direction, so A z is zero. There are currents 
in the x-direction on the two sides of length a . In each leg, the current density 
(and current) is uniform. So the solution for A x is just like the electrostatic po- 
tential from two charged rods (see Fig. 14-7). Since the rods have opposite charges, 
their electric potential at large distances would be just the dipole potential (Section 
6-5). At the point P in Fig. 14-6, the potential would be 


<t> = 


1 P ' e R 9 
47 T€ 0 R 2 


(14.28) 


where p is the dipole moment of the charge distribution. The dipole moment, in 
this case, is the total charge on one rod times the separation between them: 


p = Xab. 


(14.29) 


The dipole moment points in the negative ^-direction, so the cosine of the angle 
between R and p is —y/R (where y is the coordinate of P ). So we have 


_ 1 Xab y 

0 “ 4tt€o R 2 R 

We get A x simply by replacing X by 7/c 2 : 

A = Iab y 
4jre 0 c2 & 


(14.30) 
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By the same reasoning, 



I 


Fig. 14-8. The vector potential of a 
small current loop at the origin (in the 
xy-plane); a magnetic dipole field. 


_ lab x 
V ~~ 47T€ 0 C 2 R s 


(14.31) 


Again, A v is proportional to x and A x is proportional to —y, so the vector potential 
(at large distances) goes in circles around the z-axis, circulating in the same sense 
as / in the loop, as shown in Fig. 14-8. 

The strength of A is proportional to lab , which is the current times the area 
of the loop. This product is called the magnetic dipole moment (or, often, just 
“magnetic moment”) of the loop. We represent it by p: 

p = lab . (14.32) 


The vector potential of a small plane loop of any shape (circle, triangle, etc.) is 
also given by Eqs. (14.30) and (14.31) provided we replace lab by 

p = /* (area of loop). (14.33) 

We leave the proof of this to you. 

We can put our equation in vector form if we define the direction of the vector 
p to be the normal to the plane of the loop, with a positive sense given by the right- 
hand rule (Fig. 14-8). Then we can write 


1 p X R __ 1 p X en 

4tT€ 0 C 2 ~W~ ~ 4tT€ 0 c2 *2 


(14.34) 


We have still to find B . Using (14.33) and (14.34), together with (14.4), we get 


d p x _ 3xz 
dz 4ireoC 2 R 3 R 5 


(where by ... we mean ju/47r€ 0 c 2 ), 



(14.35) 


(14.36) 


The components of the U-field behave exactly like those of the ZT-field for a 
dipole oriented along the z-axis. (See Eqs. (6.14) and (6.15); also Fig. 6-5.) 
That’s why we call the loop a magnetic dipole. The word “dipole” is slightly 
misleading when applied to a magnetic field because there are no magnetic “poles” 
that correspond to electric charges. The magnetic “dipole field” is not produced 
by two “charges,” but by an elementary current loop. 

It is curious, though, that starting with completely different laws, V ■ E = p/e 0 
and VX5 = y/€ 0 c 2 , we can end up with the same kind of a field. Why should 
that be? It is because the dipole fields appear only when we are far away from 
all charges or currents. So through most of the relevant space the equations for 
E and B are identical: both have zero divergence and zero curl. So they give the 
same solutions. However, the sources whose configuration we summarize by the 
dipole moments are physically quite different — in one case, it’s a circulating cur- 
rent; in the other, a pair of charges, one above and one below the plane of the loop 
for the corresponding field. 


14-6 The vector potential of a circuit 

We are often interested in the magnetic fields produced by circuits of wire in 
which the diameter of the wire is very small compared with the dimensions of the 
whole system. In such cases, we can simplify the equations for the magnetic field. 
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For a thin wire we can write our volume element as 


dV = Sds , 

where S is the cross-sectional area of the wire and ds is the element of distance 
along the wire. In fact, since the vector ds is in the same direction as 7, as shown in 
Fig. 14-9 (and we can assume that j is constant across any given cross section), 
we can write a vector equation: 

jdV = jSds . (14.37) 

But jS is just what we call the current / in a wire, so our integral for the vector 
potential (14.19) becomes 

7TT (,4 ' 38) 

(see Fig. 14-10). (We assume that / is the same throughout the circuit. If there are 
several branches with different currents, we should, of course, use the appropriate 
/ for each branch.) 

Again, we can find the fields from (14.38) either by integrating directly or by 
solving the corresponding electrostatic problems. 


14-7 The law of Biot and Savart 

In studying electrostatics we found that the electric field of a known charge 
distribution could be obtained directly with an integral (Eq. 4-16): 


As we have seen, it is usually more work to evaluate this integral — there are really 
three integrals, one for each component — than to do the integral for the potential 
and take its gradient. 

There is a similar integral which relates the magnetic field to the currents. 
We already have an integral for A , Eq. (14.19); we can get an integral for B by 
taking the curl of both sides: 


B( 1) - V X A( 1) = V X 


1 f K2)dV 2 
4t e 0 c 2 J r X2 


(14.39) 


Now we must be careful: The curl operator means taking the derivatives of 
A( 1), that is, it operates only on the coordinates (x x , y u z{). We can move the 
V X operator inside the integral sign if we remember that it operates only on 
variables with the subscript 1, which of course, appear only in 

r X 2 — [(pa — x 2 ) 2 + 0>i — y 2 ) 2 + ( z i ~ z 2 ) 2 ] 112 - (14.40) 

We have, for the x-component of B , 

12 J 


B x = 


dA t 

dyi 

1 


dAy 

dzi 


J[ J ‘skQr) 

4jre 0 c 2 J (/ 


47T€ 0 C 2 

1 


y 1 - y 2 


12 


The quantity in brackets is just the x-component of 


3 X r 12 _ j X e 12 




12 


r 2 

r l2 



Fig. 1 4-9. For a fine wire / dV is the 
same as / ds . 



Fig. 14-10. The magnetic field of a 
wire can be obtained from an integral 
around the circuit. 
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(14.42) 


Corresponding results will be found for the other components, so we have 


B(l) = 


1 f 7(2) X *i2 

47T€ 0 c 2 J r\ 2 


dV 2 . 


The integral gives B directly in terms of the known currents. The geometry in- 
volved is the same as that shown in Fig. 14-2. 

If the currents exist only in circuits of small wires we can, as in the last section, 
immediately do the integral across the wire, replacing j dV by Ids , where ds is an 
element of length of the wire. Then, using the symbols in Fig. 14-10, 


B(l) = 


1 f Ie 12 X ds 2 
47 T€ 0 C 2 J r\ % 


(14.43) 


(The minus sign appears because we have reversed the order of the cross product.) 
This equation for B is called the Biot-Savart law , after its discoverers. It gives a 
formula for obtaining directly the magnetic field produced by wires carrying 
currents. 

You may wonder: “What is the advantage of the vector potential if we can 
find B directly with a vector integral? After all, A also involves three integrals!” 
Because of the cross product, the integrals for B are usually more complicated, as 
is evident from Eq. (14.41). Also, since the integrals for A are like those of electro- 
statics, we may already know them. Finally, we will see that in more advanced 
theoretical matters (in relativity, in advanced formulations of the laws of me- 
chanics, like the principle of least action to be discussed later, and in quantum 
mechanics) the vector potential plays an important role. 
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15-1 The forces on a current loop; energy of a dipole 

In the last chapter we studied the magnetic field produced by a small rec- 
tangular current loop. We found that it is a dipole field, with the dipole moment 
given by 

M = IA , (15.1) 

where I is the current and A is the area of the loop. The direction of the moment 
is normal to the plane of the loop, so we can also write 

/i = JAn, 

where n is the unit normal to the area A . 

A current loop — or magnetic dipole — not only produces magnetic fields, but 
will also experience forces when placed in the magnetic field of other currents. 
We will look first at the forces on a rectangular loop in a uniform magnetic field* 
Let the 2 -axis be along the direction of the field, and the plane of the loop be 
placed through the ;>-axis, making the angle 9 with the xy-plane as in Fig. 15-1. 
Then the magnetic moment of the loop — which is normal to its plane — will make 
the angle 9 with the magnetic field. 

Since the currents are opposite on opposite sides of the loop, the forces are 
also opposite, so there is no net force on the loop (when the field is uniform). 
Because of forces on the two sides marked 1 and 2 in the figure, however, there is a 
torque which tends to rotate the loop about the j>-axis. The magnitude of these 
forces F x and F 2 is 

F x = F 2 = IBb. 

Their moment arm is 

a sin 9, 

so the torque is 

r = lab B sin 9 , 

or, since lab is the magnetic moment of the loop, 

r = fxB sin 9 . 

The torque can be written in vector notation: 

r = fiX B. (15.2) 

Although*we have only shown that the torque is given by Eq. (15.2) in one rather 
special case, the result is right for a small loop of any shape, as we will see. You will 
remember that we found the same kind of relation for the torque on an electric 
dipole: 

r = p X E. 

We now ask about the mechanical energy of our current loop. Since there is 
a torque, the energy evidently depends on the orientation. The principle of virtual 
work says that the torque is the rate of change of energy with angle, so we can write 

dU = — t d9. 


15-1 The forces on a current loop; 
energy of a dipole 

15-2 Mechanical and electrical 
energies 

15-3 The energy of steady currents 

15-4 B versus A 

15-5 The vector potential and 
quantum mechanics 

15-6 What is true for statics is 
false for dynamics 



Fig. 15-1. A rectangular loop carry- 
ing the current / sits in a uniform field S 
(in the z-direction). The torque on the 
loop is r = ft X B, where the magnetic 
moment ju = lab. 
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Setting r = —jjlB sin 0, and integrating, we can write for the energy 

U = —fiB cos 6 + a constant. (15.3) 

(The sign is negative because the torque tries to line up the moment with the field ; 
the energy is lowest when \x and B are parallel.) 

For reasons which we will discuss later, this energy is not the total energy of a 
current loop. (We have, for one thing, not taken into account the energy required 
to maintain the current in the loop.) We will, therefore, call this energy U me c h, 
to remind us that it is only part of the energy. Also, since we are leaving out some 
of the energy anyway, we can set the constant of integration equal to zero in Eq. 
(15.3). So we rewrite the equation: 


tfmech - “M’S. (15.4) 

Again, this corresponds to our result for an electric dipole: 

U = - pE . (15.5) 


Now the electrostatic energy C/inEq. (15.5) is the true energy, but U mec h in 
(15.4) is not the real energy. It can , however, be used in computing forces, by the 
principle of virtual work, supposing that the current in the loop — or at least m — is 
kept constant. 

We can show for our rectangular loop that U me C h also corresponds to the 
mechanical work done in bringing the loop into the field. The total force on the 
loop is zero only in a uniform field; in a nonuniform field there are net forces on a 
current loop. In putting the loop into a region with a field, we must have gone 
through places where the field was not uniform, and so work was done. To make 
the calculation simple, we shall imagine that the loop is brought into the field with 
its moment pointing along the field. (It can be rotated to its final position after it 
is in place.) 

Imagine that we want to move the loop in the x-direction — toward a region of 
stronger field — and that the loop is oriented as shown in Fig. 15-2. We start 
somewhere where the field is zero and integrate the force times the distance as we 
bring the loop into the field. 


Fig. 15-2. A loop is carried 
the x-direction through the field 
right angles to x. 



First, let’s compute the work done on each side separately and then take the 
sum (rather than adding the forces before integrating). The forces on sides 3 and 4 
are at right angles to the direction of motion, so no work is done on them. The 
force on side 2 is IbB(x) in the x-direction, and to get the work done against the 
magnetic forces we must integrate this from some x where the field is zero, say at 
x = — oo, to x 2 , its present position: 

W 2 = - P F 2 dx = -lb P B(x) dx. (15.6) 

J oo J — oo 

Similarly, the work done against the forces on side 1 is 

Wi = - r F 1 dx = lb P B(x) dx. (15.7) 
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To find each integral, we need to know how B(x ) depends on x. But notice that 
side 1 follows along right behind side 2, so that its integral includes most of the 
work done on side 2. In fact, the sum of (15.6) and (15.7) is just 

w = -lb f 2 B( x) dx. (15.8) 

JX i 

But if we are in a region where B is nearly the same on both sides 1 and 2, we can 
write the integral as 

f x 2 

I B(x)dx = (, x 2 — Xi )B = aB , 

Jxi 

where B is the field at the center of the loop. The total mechanical energy we have 
put in is 

Umech = w = -lab B = -/iA (15.9) 

The result agrees with the energy we took for Eq. (15.4). 

We would, of course, have gotten the same result if we had added the forces 
on the loop before integrating to find the work. If we let Bi be the field at side 1 
and B 2 be the field at side 2, then the total force in the ^-direction is 


F x = lb(B 2 - BJ. 

If the loop is “small,” that is, if B 2 and B i are not too different, we can write 


So the force is 


B * = Bi + !i A * = Bi + !§ a - 


F x = lab 


3B 

dx 


(15.10) 


The total work done on the loop by external forces is 



—lab J ~dx 


—IabB, 


which is again just — pB. Only now we see why it is that the force on a small 
current loop is proportional to the derivative of the magnetic field, as we would 
expect from 

F x Ax = -At/mech =-A(-M’J*). (15.11) 


Our result, then, is that even though U mec h = —p * B may not include all the 
energy of a system — it is a fake kind of energy — it can still be used with the principle 
of virtual work to find the forces on steady current loops. 


15-2 Mechanical and electrical energies 

We want now to show why the energy £/ meC h discussed in the previous section 
is not the correct energy associated with steady currents — that it does not keep 
track of the total energy in the world. We have, indeed, emphasized that it can 
be used like the energy, for computing forces from the principle of virtual work, 
provided that the current in the loop (and all other currents) do not change. Let’s 
see why all this works. 

Imagine that the loop in Fig. 15-2 is moving in the +x-direction and take the 
z-axis in the direction of B. The conduction electrons in side 2 will experience a 
force along the wire, in the ^-direction. But because of their flow — as an electric 
current — there is a component of their motion in the same direction as the force. 
Each electron is, therefore, having work done on it at the rate F y v y , where v y is the 
component of the electron velocity along the wire. We will call this work done on 
the electrons electrical work. Now it turns out that if the loop is moving in a 
uniform field, the total electrical work is zero, since positive work is done on some 
parts of the loop and an equal amount of negative work is done on other parts. 
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But this is not true if the circuit is moving in a nonuniform field — then there will 
be a net amount of work done on the electrons. In general, this work would tend 
to change the flow of the electrons, but if the current is being held constant, energy 
must be absorbed or delivered by the battery or other source that is keeping the 
current steady. This energy was not included when we computed C/ mec h in Eq. 
(15.9), because our computations included only the mechanical forces on the body 
of the wire. 

You may be thinking: But the force on the electrons depends on how fast 
the wire is moved ; perhaps if the wire is moved slowly enough this electrical energy 
can be neglected. It is true that the rate at which the electrical energy is delivered 
is proportional to the speed of the wire, but the total energy delivered is propor- 
tional also to the time that this rate goes on. So the total electrical energy is pro- 
portional to the velocity times the time, which is just the distance moved. For a 
given distance moved in a field the same amount of electrical work is done. 

Let’s consider a segment of wire of unit length carrying the current / and mov- 
ing in a direction perpendicular to itself and to a magnetic field B with the speed 
y w ire* Because of the current the electrons will have a drift velocity thrift along the 
wire. The component of the magnetic force on each electron in the direction of the 
drift is q e v w ire E. So the rate at which electrical work is being done is FVdrift = 
(^wire^drift- If there are N conduction electrons in the unit length of the wire, 
the total rate at which electrical work is being done is 

= Nq e v wile Bv dtiH . 


But Nq e v dT at = /, the current in the wire, so 


dU& lect 

dt 


JVvtiT&B. 


Now since the current is held constant, the forces on the conduction electrons 
do not cause them to accelerate; the electrical energy is not going into the electrons 
but into the source that is keeping the current constant. 

But notice that the force on the wire is IB , so IBv wlTe is also the rate of me- 
chanical work done on the wire, dUm^/dt — /E*v ire - We conclude that the 
mechanical work done on the wire is just equal to the electrical work done on the 
current source, so the energy of the loop is a constantl 

This is not a coincidence, but a consequence of the law we already know. 
The total force on each charge in the wire is 


F = q(E + v X B). 
The rate at which work is done is 


v • F = q[v • E + v • (v X B)]. (15.12) 

If there are no electric fields we have only the second term, which is always zero. 
We shall see later that changing magnetic fields produce electric fields, so our 
reasoning applies only to moving wires in steady magnetic fields. 

How is it then that the principle of virtual work gives the right answer? 
Because we still have not taken into account the total energy of the world. We have 
not included the energy of the currents that are producing the magnetic field we 
start out with. 

Suppose we imagine a complete system such as that drawn in Fig. 15-3 (a), in 
which we are moving our loop with the current I\ into the magnetic field B\ pro- 
duced by the current I 2 in a coil. Now the current f in the loop will also be pro- 
ducing some magnetic field B 2 at the coil. If the loop is moving, the field B 2 will 
be changing. As we shall see in the next chapter, a changing magnetic field gen- 
erates an E-field; and this E-field will do work on the charges in the coil. This 
energy must also be included in our balance sheet of the total energy. 
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Fig. 15-3. Finding the energy of a small loop in a magnetic field. 


We could wait until the next chapter to find out about this new energy term, 
but we can also see what it will be if we use the principle of relativity in the following 
way. When we are moving the loop toward the stationary coil we know that its 
electrical energy is just equal and opposite to the mechanical work done. So 


^raech ^elect(lOOp) 0. 


Suppose now we look at what is happening from a different point of view, 
in which the loop is at rest, and the coil is moved toward it. The coil is then moving 
into the field produced by the loop. The same arguments would give that 


Cmech. T" ^elect(coil) 0. 


The mechanical energy is the same in the two cases because it comes from the force 
between the two circuits. 

The sum of the two equations gives 


2f/mech “b ^elect(loop) + C e ] ec t(coil) 0. 


The total energy of the whole system is, of course, the sum of the two electrical 
energies plus the mechanical energy taken only once . So we have 

^total = ^elect(lOOp) H - C/ e i ec t(coil) ~f“ ^Tmech = ^mech* (15.13) 

The total energy of the world is really the negative of U mec h. If we want the 
true energy of a magnetic dipole, for example, we should write 

Ctotal ~ +M ' B* 


It is only if we make the condition that all currents are constant that we can use 
only a part of the energy, t/ mec h (which is always the negative of the true energy), 
to find the mechanical forces. In a more general problem, we must be careful to 
include all energies. 

We have seen an analogous situation in electrostatics. We showed that the 
energy of a capacitor is equal to Q % /2C. When we use the principle of virtual work 
to find the force between the plates of the capacitor, the change in energy is equal 
to Q 2 / 2 times the change in 1/C. That is, 



AC 
2 C 2 


(15.14) 


Now suppose that we were to calculate the work done in moving two con- 
ductors subject to the different condition that the voltage between them is held 
constant. Then we can get the right answers for force from the principle of virtual 
work if we do something artificial. Since Q = CF, the real energy is JCF 2 . But 
if we define an artificial energy equal to — JCF 2 , then the principle of virtual work 
can be used to get forces by setting the change in the artificial energy equal to the 
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Fig. 15-4. The energy of a large 
loop in a magnetic field can be considered 
as the sum of energies of smaller loops. 


mechanical work, provided that we insist that the voltage V be held constant. Then 

/ cv 2 \ V 2 

m nech = A ( - = — ~ AC, (15.15) 


which is the same as Eq. (15.14). We get the correct result even though we are 
neglecting the work done by the electrical system to keep the voltage constant. 
Again, this electrical energy is just twice as big as the mechanical energy and of 
the opposite sign. 

Thus if we calculate artificially, disregarding the fact that the source of the 
potential has to do work to maintain the voltages constant, we get the right answer. 
It is exactly analogous to the situation in magnetostatics. 


15-3 The energy of steady currents 

We can now use our knowledge that U tot& \ = — i/ m ech to find the true energy 
of steady currents in magnetic fields. We can begin with the true energy of a small 
current loop. Calling J7 t otai just U y we write 

U = pB. (15.16) 

Although we calculated this energy for a plane rectangular loop, the same result 
holds for a small plane loop of any shape. 

We can find the energy of a circuit of any shape by imagining that it is made 
up of small current loops. Say we have a wire in the shape of the loop r of Fig. 
15-4. We fill in this curve with the surface S, and on the surface mark out a large 
number of small loops, each of which can be considered plane. If we let the current 
/circulate around each of the little loops, the net result will be the same as a current 
around T, since the currents will cancel on all lines internal to T. Physically, the 
system of little currents is indistinguishable from the original circuit. The 
energy must also be the same, and so is just the sum of the energies of the little loops. 

If the area of each little loop is A a, its energy is I AaB n , where B n is the com- 
ponent normal to A a. The total energy is 

u = X) I B „ Aa - 

Going to the limit of infinitesimal loops, the sum becomes an integral, and 

U = ljB n da = ijBnda, (15.17) 

where n is the unit normal to da . 

If we set B = V X A f we can connect the surface integral to a line integral, 
using Stokes’ theorem, 

lj(y X A) It da = if A' da, (15.18) 

where ds is the line element along T. So we have the energy for a circuit of any 
shape: 

U = / 6 A * ds. (15.19) 

circuit 

In this expression A refers, of course, to the vector potential due to those currents 
(other than the I in the wire) which produce the field B at the wire. 

Now any distribution of steady currents can be imagined to be made up of 
filaments that run parallel to the lines of current flow. For each pair of such circuits, 
the energy is given by (15.19), where the integral is taken around one circuit, using 
the vector potential A from the other circuit. For the total energy we want the 
sum of all such pairs. If, instead of keeping track of the pairs, we take the complete 
sum over all the filaments, we would be counting the energy twice (we saw a 
similar effect in electrostatics), so the total energy can be written 

U = ifj'A dV. (15.20) 
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This formula corresponds to the result we found for the electrostatic energy: 

U=ifp<t>dV. (15.21) 

So we may if we wish think of A as a kind of potential energy for currents in 
magnetostatics. Unfortunately, this idea is not too useful, because it is true only 
for static fields. In fact, neither of the equations (15.20) and (15.21) gives the cor- 
rect energy when the fields change with time. 

15-4 B versus A 

In this section we would like to discuss the following questions: Is the vector 
potential merely a device which is useful in making calculations — as the scalar 
potential is useful in electrostatics — or is the vector potential a “real” field? Isn’t 
the magnetic field the “real” field, because it is responsible for the force on a 
moving particle? First we should say that the phrase “a real field” is not very 
meaningful. For one thing, you probably don’t feel that the magnetic field is 
very “real” anyway, because even the whole idea of a field is a rather abstract thing. 
You cannot put out your hand and feel the magnetic field. Furthermore, the value 
of the magnetic field is not very definite; by choosing a suitable moving coordinate 
system, for instance, you can make a magnetic field at a given point disappear. 

What we mean here by a “real” field is this: a real field is a mathematical 
function we use for avoiding the idea of action at a distance. If we have a charged 
particle at the position P, it is affected by other charges located at some distance 
from P. One way to describe the interaction is to say that the other charges make 
some “condition” — whatever it may be — in the environment at P . If we know 
that condition, which we describe by giving the electric and magnetic fields, then 
we can determine completely the behavior of the particle — with no further reference 
to how those conditions came about. 

In other words, if those other charges were altered in some way, but the 
conditions at P that are described by the electric and magnetic field at P remain 
the same, then the motion of the charge will also be the same. A “real” field is 
then a set of numbers we specify in such a way that what happens at a point depends 
only on the numbers at that point. We do not need to know any more about what’s 
going on at other places. It is in this sense that we will discuss whether the vector 
potential is a “real” field. 

You may be wondering about the fact that the vector potential is not unique — 
that it can be changed by adding the gradient of any scalar with no change at all 
in the forces on particles. That has not, however, anything to do with the question 
of reality in the sense that we are talking about. For instance, the magnetic field 
is in a sense altered by a relativity change (as are also E and A). But we are not 
worried about what happens if the field can be changed in this way. That doesn’t 
really make any difference; that has nothing to do with the question of whether 
the vector potential is a proper “real”* field for describing magnetic effects, or 
whether it is just a useful mathematical tool. 

We should also make some remarks on the usefulness of the vector potential 
A. We have seen that it can be used in a formal procedure for calculating the mag- 
netic fields of known currents, just as <f> can be used to find electric fields. In 
electrostatics we saw that <f> was given by the scalar integral 

*(D = zir 05.22) 

47T€o J T 12 

From this <f> 9 we get the three components of E by three differential operations. 
This procedure is usually easier to handle than evaluating the three integrals in 
the vector formula r 

Em - *kj ^ (,5 - 23) 

First, there are three integrals; and second, each integral is in general somewhat 
more difficult. 
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The advantages are much less clear for magnetostatics. The integral for A is 
already a vector integral: 


^(D ^ 


1 ( j(2) dV 2 

47 r€ 0 c 2 J r 12 


(15.24) 


which is, of course, three integrals. Also, when we take the curl of A to get B , we 
have six derivatives to do and combine by pairs. It is not immediately obvious 
whether in most problems this procedure is really any easier than computing B 
directly from 

* (l) - <I5 ' 25) 


Using the vector potential is often more difficult for simple problems for the 
following reason. Suppose we are interested only in the magnetic field B at one 
point, and that the problem has some nice symmetry — say we want the field at a 
point on the axis of a ring of current. Because of the symmetry, we can easily get 
B by doing the integral of Eq. (1 5.25). If, however, we were to find A first, we would 
have to compute B from derivatives of A, so we must know what A is at all points 
in the neighborhood of the point of interest. And most of these points are off the 
axis of symmetry, so the integral for A gets complicated. In the ring problem, for 
example, we would need to use elliptic integrals. In such problems, A is clearly 
not very useful. It is true that in many complex problems it is easier to work with 
A , but it would be hard to argue that this ease of technique would justify making 
you learn about one more vector field. 

We have introduced A because it does have an important physical significance. 
Not only is it related to the energies of currents, as we saw in the last section, but 
it is also a “real” physical field in the sense that we described above. In classical 
mechanics it is clear that we can write the force on a particle as 

F — q(E + v X B)< (15.26) 


so that, given the forces, everything about the motion is determined. In any region 
where B = 0 even if A is not zero, such as outside a solenoid, there is no dis- 
cernible effect of A. Therefore for a long time it was believed that A was not a 
“real” field. It turns out, however, that there are phenomena involving quantum 
mechanics which show that the field A is in fact a “real” field in the sense we have 
defined it. In the next section we will show you how that works. 


15-5 The vector potential and quantum mechanics 

There are many changes in what concepts are important when we go from 
classical to quantum mechanics. We have already discussed some of them in 
Vol. I. In particular, the force concept gradually fades away, while the concepts 
of energy and momentum become of paramount importance. You remember that 
instead of particle motions, one deals with probability amplitudes which vary in 
space and time. In these amplitudes there are wavelengths related to momenta, 
and frequencies related to energies. The momenta and energies, which determine 
the phases of wave functions, are therefore the important quantities in quantum 
mechanics. Instead of forces, we deal with the way interactions change the wave- 
length of the waves. The idea of a force becomes quite secondary — if it is there at 
all. When people talk about nuclear forces, for example, what they usually analyze 
and work with are the energies of interaction of two nucleons, and not the force 
between them. Nobody ever differentiates the energy to find out what the force 
looks like. In this section we want to describe how the vector and scalar poten- 
tials enter into quantum mechanics. It is, in fact, just because momentum and 
energy play a central role in quantum mechanics that A and <t> provide the most 
direct way of introducing electromagnetic effects into quantum descriptions. 

We must review a little how quantum mechanics works. We will consider 
again the imaginary experiment described in Chapter 37 of Vol. I, in which elec- 
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Fig. 15-5. An interference experiment with electrons 
(see also Chapter 37 of Vol. I). 


trons are diffracted by two slits. The arrangement is shown again in Fig. 15-5. 
Electrons, all of nearly the same energy, leave the source and travel toward a wall 
with two narrow slits. Beyond the wall is a “backstop” with a movable detector. 
The detector measures the rate, which we call /, at which electrons arrive at a small 
region of the backstop at the distance x from the axis of symmetry. The rate is 
proportional to the probability that an individual electron that leaves the source 
will reach that region of the backstop. This probability has the complicated-looking 
distribution shown in the figure, which we understand as due to the interference of 
two amplitudes, one from each slit. The interference of the two amplitudes 
depends on their phase difference. That is, if the amplitudes are C\e 1 *' and C 2 e 1 * 2 , 
the phase difference 8 = $i — <J> 2 determines their interference pattern [see Eq. 
(29.12) in Vol. I]. If the distance between the screen and the slits is L, and if the 
difference in the path lengths for electrons going through the two slits is a, as 
shown in the figure, then the phase difference of the two waves is given by 

5 = |- (15.27) 

A 

As usual, we let X = A/27 r, where X is the wavelength of the space variation of the 
probability amplitude. For simplicity, we will consider only values of x much 
less than L\ then we can set 



and 

5 = zr (15 - 28) 


When x is zero, 8 is zero; the waves are in phase, and the probability has a maxi- 
mum. When 8 is 7r, the waves are out of phase, they interfere destructively, and the 
probability is a minimum. So we get the wavy function for the electron intensity. 

Now we would like to state the law that for quantum mechanics replaces the 
force law F = qv X B. It will be the law that determines the behavior of quantum- 
mechanical particles in an electromagnetic field. Since what happens is determined 
by amplitudes, the law must tell us how the magnetic influences affect the ampli- 
tudes; we are no longer dealing with the acceleration of a particle. The law is the 
following: the phase of the amplitude to arrive via any trajectory is changed by 
the presence of a magnetic field by an amount equal to the integral of the vector 
potential along the whole trajectory times the charge of the particle over Planck’s 
constant. That is, 


Magnetic change in phase = 


/ 


A • ds . 


trajectory 


(15.29) 
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If there were no magnetic field there would be a certain phase of arrival. If there is 
a magnetic field anywhere, the phase of the arriving wave is increased by the integral 
in Eq. (15.29). 

Although we will not need to use it for our present discussion, we mention 
that the effect of an electrostatic field is to produce a phase change given by the 
negative of the time integral of the scalar potential <f>: 


Electric change in phase 


g 

ft 


/ 


<t>dt. 


These two expressions are correct not only for static fields, but together give the 
correct result for any electromagnetic field, static or dynamic. This is the law that 
replaces F = q{E + v X B). We want now, however, to consider only a static 
magnetic field. 

Suppose that there is a magnetic field present in the two-slit experiment. We 
want to ask for the phase of arrival at the screen of the two waves whose paths pass 
through the two slits. Their interference determines where the maxima in the 
probability will be. We may call $1 the phase of the wave along trajectory (1). 
If $1 (B = 0) is the phase without the magnetic field, then when the field is turned 
on the phase will be 

= $j(5 = 0) + I / A-ds. (15.30) 

n J(i) 

Similarly, the phase for trajectory (2) is 

$2 = $2 (B = 0) + I / A-ds. (15.31) 

ft J (2) 

The interference of the waves at the detector depends on the phase difference 

5 = $ x (5 = 0) - $ 2 (B = 0) + | / A-ds - | / A-ds. (15.32) 

ft -/(l) ft J (2) 

The no-field difference we will call 5 (B — 0); it is just the phase difference we 
have calculated above in Eq. (15.28). Also, we notice that the two integrals can 
be written as one integral that goes forward along (1) and back along (2); we call 
this the closed path (1-2). So we have 

8 = S(B = 0) + | f A-ds. (15.33) 

ft J ( 1 — 2 ) 


This equation tells us how the electron motion is changed by the magnetic field ; 
with it we can find the new positions of the intensity maxima and minima at the 
backstop. 

Before we do that, however, we want to raise the following interesting and 
important point. You remember that the vector potential function has some 
arbitrariness. Two different vector potential functions A and A f whose difference 
is the gradient of some scalar function both represent the same magnetic field, 
since the curl of a gradient is zero. They give, therefore, the same classical force 
qv X B . If in quantum mechanics the effects depend on the vector potential, 
which of the many possible /I -functions is correct? 

The answer is that the same arbitrariness in A continues to exist for quantum 
mechanics. If in Eq. (15.33) we change A to A f = A + V^, the integral on 
A becomes etc 

f A f • ds = f A • ds + f • ds . 

J ( 1 — 2 ) J ( 1 — 2 ) 1 - 2 ) 


The integral of V\j/ is around the closed path (1-2), but the integral of the tangential 
component of a gradient on a closed path is always zero, by Stokes’ theorem. 
Therefore both A and A' give the same phase differences and the same quantum- 
mechanical interference effects. In both classical and quantum theory it is only the 
curl of A that matters ; any choice of the function of A which has the correct curl 
gives the correct physics. 
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The same conclusion is evident if we use the results of Section 14-1. There 
we found that the line integral of A around a closed path is the flux of B through 
the path, which here is the flux between paths (1) and (2). Equation (15.33) can, 
if we wish, be written as 


8 = S(B = 0) + I [flux of B between (1) and (2)], (15.34) 

n 

where by the flux of B we mean, as usual, the surface integral of the normal com- 
ponent of B . The result depends only on B , and therefore only on the curl of A. 

Now because we can write the result in terms of B as well as in terms of A , 
you might be inclined to think that the B holds its own as a “real” field and that 
the A can still be thought of as an artificial construction. But the definition of 
“real” field that we originally proposed was based on the idea that a “real” field 
would not act on a particle from a distance. We can, however, give an example 
in which B is zero — or at least arbitrarily small — at any place where there is some 
chance to find the particles, so that it is not possible to think of it acting directly 
on them. 

You remember that for a long solenoid carrying an electric current there is 
a B-field inside but none outside, while there is lots of A circulating around outside, 
as shown in Fig. 15-6. If we arrange a situation in which electrons are to be found 
only outside of the solenoid — only where there is A — there will still be an influence 
on the motion, according to Eq. (15.33). Classically, that is impossible. Classically, 
the force depends only on B; in order to know that the solenoid is carrying current, 
the particle must go through it. But quantum-mechanically you can find out that 
there is a magnetic field inside the solenoid by going around it — without ever going 
close to it! 

Suppose that we put a very long solenoid of small diameter just behind the 
wall and between the two slits, as shown in Fig. 15-7. The diameter of the solenoid 
is to be much smaller than the distance d between the two slits. In these circum- 
stances, the diffraction of the electrons at the slit gives no appreciable probability 
that the electrons will get near the solenoid. What will be the effect on our inter- 
ference experiment? 



Fig. 15-6. The magnetic field and 
vector potential of a long solenoid. 



Fig. 1 5-7. A magnetic field can influence the motion of electrons even though 
it exists only in regions where there is an arbitrarily small probability of finding the 
electrons. 

We compare the situation with and without a current through the solenoid. 

If we have no current, we have no B or A and we get the original pattern of elec- 
tron intensity at the backstop. If we turn the current on in the solenoid and build 
up a magnetic field B inside, then there is an A outside. There is a shift in the 
phase difference proportional to the circulation of A outside the solenoid, which will 
mean that the pattern of maxima and minima is shifted to a new position. In fact, 
since the flux of B inside is a constant for any pair of paths, so also is the circula- 
tion of A. For every arrival point there is the same phase change; this corresponds 
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to shifting the entire pattern in x by a constant amount, say x 0 , that we can easily 
calculate. The maximum intensity will occur where the phase difference between 
the two waves is zero. Using Eq. (15.32) or Eq. (15.33) for 5 and Eq. (15.28) for 
8(B — 0), we have 

*0 = - h * % f A-ds, (15.35) 

a n Ja- 2 ) 
or 

x 0 = — ~ X | [flux of B between (1) and (2)]. (15.36) 

The pattern with the solenoid in place should appear* as shown in Fig. 15-7. At 
least, that is the prediction of quantum mechanics. 

Precisely this experiment has recently been done. It is a very, very difficult 
experiment. Because the wavelength of the electrons is so small, the apparatus must 
be on a tiny scale to observe the interference. The slits must be very close together, 
and that means that one needs an exceedingly small solenoid. It turns out that in 
certain circumstances, iron crystals will grow in the form of very long, microsco- 
pically thin filaments called whiskers. When these iron whiskers are magnetized 
they are like a tiny solenoid, and there is no field outside except near the ends. 
The electron interference experiment was done with such a whisker between two 
slits, and the predicted displacement in the pattern of electrons was observed. 

In our sense then, the ^4 -field is “real.” You may say: “But there was a mag- 
netic field.” There was, but remember our original idea — that a field is “real” if it is 
what must be specified at the position of the particle in order to get the motion. 
The 2?-field in the whisker acts at a distance. If we want to describe its influence 
not as action-at-a-distance, we must use the vector potential. 

This subject has an interesting history. The theory we have described was 
known from the beginning of quantum mechanics in 1926. The fact that the vector 
potential appears in the wave equation of quantum mechanics (called the Schrod- 
inger equation) was obvious from the day it was written. That it cannot be replaced 
by the magnetic field in any easy way was observed by one man after the other 
who tried to do so. This is also clear from our example of electrons moving in a 
region where there is no field and being affected nevertheless. But because in 
classical mechanics A did not appear to have any direct importance and, further- 
more, because it could be changed by adding a gradient, people repeatedly said 
that the vector potential had no direct physical significance — that only the magnetic 
and electric fields are “right” even in quantum mechanics. It seems strange in 
retrospect that no one thought of discussing this experiment until 1956, when 
Bohm and Aharanov first suggested it and made the whole question crystal clear. 
The implication was there all the time, but no one paid attention to it. Thus 
many people were rather shocked when the matter was brought up. That’s why 
someone thought it would be worth while to do the experiment to see that it really 
was right, even though quantum mechanics, which had been believed for so many 
years, gave an unequivocal answer. It is interesting that something like this can 
be around for thirty years but, because of certain prejudices of what is and is not 
significant, continues to be ignored. 

Now we wish to continue in our analysis a little further. We will show the 
connection between the quantum-mechanical formula and the classical formula — 
to show why it turns out that if we look at things on a large enough scale it will 
look as though the particles are acted on by a force equal to qv X the curl of A . 
To get classical mechanics from quantum mechanics, we need to consider cases in 
which all the wavelengths are very small compared with distances over which ex- 
ternal conditions, like fields, vary appreciably. We shall not prove the result in 
great generality, but only in a very simple example, to show how it works. Again 
we consider the same slit experiment. But instead of putting all the magnetic field 
in a very tiny region between the slits, we imagine a magnetic field that extends 


* If the field B comes out of the plane of the figure, the flux as we have defined it is 
negative and xq is positive. 
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over a larger region behind the slits, as shown in Fig. 15-8. We will take the ideal- 
ized case where we have a magnetic field which is uniform in a narrow strip of 
width w , considered small as compared with L. (That can easily be arranged; the 
backstop can be put as far out as we want.) In order to calculate the shift in phase, 
we must take the two integrals of A along the two trajectories (1) and (2). They 
differ, as we have seen, merely by the flux of B between the paths. To our approxi- 
mation, the flux is Bwd. The phase difference for the two paths is then 

8 = s(B = 0) + | Bwd. (15.37) 


We note that, to our approximation, the phase shift is independent of the angle. 
So again the effect will be to shift the whole pattern upward by an amount Ax. 
Using Eq. (15.28), 

Ax = ^ AS = ^ [5 - S(B = 0)]. 

Using (15.37) for 8 — S(B = 0), 

Ax = LX | Bw. (15.38) 

Such a shift is equivalent to deflecting all the trajectories by the small angle a 
(see Fig. 15-8), where 

°c = ~ = \ q Bw. (15.39) 

Now classically we would also expect a thin strip of magnetic field to deflect 
all trajectories through some small angle, say a', as shown in Fig. 15-9(a). As the 
electrons go through the magnetic field, they feel a transverse force qv X B which 
lasts for a time w/v. The change in their transverse momentum is just equal to 
this impulse, so 

Ap x = qwB . (15.40) 



(a) 



The angular deflection [Fig. 15— 9(b)] is equal to the ratio of this transverse mo- 
mentum to the total momentum p . We get that 


P P 


(15.41) 



Pver. 


(b) 


We can compare this result with Eq. (15.39), which gives the same quantity 
computed quantum-mechanically. But the connection between classical mechanics 
and quantum mechanics is this; A particle of momentum p corresponds to a quan- 
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Fig. 15-9. Deflection of a particle 
due to passage through a strip of 
magnetic field. 



turn amplitude varying with the wavelength X = h/p . With this equality, a and a! 
are identical; the classical and quantum calculations give the same result. 

From the analysis we see how it is that the vector potential which appears in 
quantum mechanics in an explicit form produces a classical force which depends 
only on its derivatives. In quantum mechanics what matters is the interference 
between nearby paths ; it always turns out that the effects depend only on how much 
the field A changes from point to point, and therefore only on the derivatives of 
A and not on the value itself. Nevertheless, the vector potential A (together with 
the scalar potential d> that goes with it) appears to give the most direct description 
of the physics. This becomes more and more apparent the more deeply we go 
into the quantum theory. In the general theory of quantum electrodynamics, one 
takes the vector and scalar potentials as the fundamental quantities in a set 
of equations that replace the Maxwell equations: E and B are slowly disappear- 
ing from the modern expression of physical laws; they are being replaced by A 
and 0. 


15-6 What is true for statics is false for dynamics 

We are now at the end of our exploration of the subject of static fields. Already 
in this chapter we have come perilously close to having to worry about what 
happens when fields change with time. We were barely able to avoid it in our 
treatment of magnetic energy by taking refuge in a relativistic argument. Even so, 
our treatment of the energy problem was somewhat artificial and perhaps even 
mysterious, because we ignored the fact that moving coils must, in fact, produce 
changing fields. It is now time to take up the treatment of time-varying fields — the 
subject of electrodynamics. We will do so in the next chapter. First, however, we 
would like to emphasize a few points. 

Although we began this course with a presentation of the complete and correct 
equations of electromagnetism, we immediately began to study some incomplete 
pieces — because that was easier. There is a great advantage in starting with the 
simpler theory of static fields, and proceeding only later to the more complicated 
theory which includes dynamic fields. There is less new material to learn all at 
once, and there is time for you to develop your intellectual muscles in preparation 
for the bigger task. 

But there is the danger in this process that before we get to see the complete 
story, the incomplete truths learned on the way may become ingrained and taken 
as the whole truth — that what is true and what is only sometimes true will become 
confused. So we give in Table 15-1 a summary of the important formulas we have 
covered, separating those which are true in general from those which are true for 
statics, but false for dynamics. This summary also shows, in part, where we are 
going, since as we treat dynamics we will be developing in detail what we must just 
state here without proof. 

It may be useful to make a few remarks about the table. First, you should 
notice that the equations we started with are the true equations — we have not 
misled you there. The electromagnetic force (often called the Lorentz force) 
F = q(E + v X B) is true . It is only Coulomb’s law that is false, to be used only 
for statics. The four Maxwell equations for E and B are also true. The equations 
we took for statics are false, of course, because we left off all terms with time 
derivatives. 

Gauss’ law, V * E = p/€ 0 , remains, but the curl of E is not zero in general. 
So E cannot always be equated to the gradient of a scalar — the electrostatic po- 
tential. We will see that a scalar potential still remains, but it is a time-varying 
quantity that must be used together with vector potentials for a complete descrip- 
tion of the electric field. The equations governing this new scalar potential are, 
necessarily, also new. 

We must also give up the idea that E is zero in conductors. When the fields are 
changing, the charges in conductors do not, in general, have time to rearrange 
themselves to make the field zero. They are set in motion, but never reach equili- 
brium. The only general statement is: electric fields in conductors produce cur- 
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Table 15*1 


FALSE IN GENERAL (true only for statics) 


TRUE ALWAYS 


F = ~ — (Coulomb’s law) 

47t«o r 2 

V X F = 0 

F - -V0 

a?/i\ 1 P ( 2)*12 

£(1) 

For conductors, E = 0, <j> = constant. Q = CV 


F = q(E + v X B) (Lorentz force) 

— ► V • F = — (Gauss’ law) 

€0 


’X*--" 


F - -V<t> - 


(Faraday’s law) 


In a conductor, F makes currents. 


2_ 
c V 

X « = - 
€0 

(Ampere’s law) 

F(l) 

= 1 1 
47TC0C 2 J 

f /(2) X ei2 jr , 

r 2 

12 

V 2 </> 

c^l S 

1 

II 

(Poisson’s equation) 

V 2 A 

j_ 



€0 C 2 


with 



V * / 

< = 0 





The equations marked by an arrow (-*) are Maxwell’s equations. 
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rents. So in varying fields a conductor is not an equipotential. It also follows that 
the idea of a capacitance is no longer precise. 

Since there are no magnetic charges, the divergence of B is always zero. So 
B can always be equated to V X A . (Everything doesn’t change!) But the genera- 
tion of B is not only from currents; V X Bis proportional to the current density 
plus a new term dE/dt. This means that A is related to currents by a new equation. 
It is also related to 0. If we make use of our freedom to choose V ■ A for our own 
convenience, the equations for A or <f> can be arranged to take on a simple and ele- 
gant form. We therefore make the condition that c 2 V • A = —d<t>/dt, and the 
differential equations for A or <t> appear as shown in the table. 

The potentials A and </> can still be found by integrals over the currents and 
charges, but not the same integrals as for statics. Most wonderfully, though, the 
true integrals are like the static ones, with only a small and physically appealing 
modification. When we do the integrals to find the potentials at some point, say 
point (1) in Fig. 15-10, we must use the values of j and p at the point (2) at an 
earlier time t f = t — r\%/c. As you would expect, the influences propagate from 
point (2) to point (1) at the speed c. With this small change, one can solve for the 
fields of varying currents and charges, because once we have A and <t> , we get B 
from V X A, as before, and E from — V4> — dA/dt. 


Fig. 15-10. The potentials at point 
(1) and at the time t are given by sum- 
ming the contributions from each element 
of the source at the roving point (2), 
using the currents and charges which were 
present at the earlier time t — r^/c. 



Finally, you will notice that some results — for example, that the energy density 
in an electric field is e 0 E 2 /2 — are true for electrodynamics as well as for statics. 
You should not be misled into thinking that this is at all “natural.” The validity 
of any formula derived in the static case must be demonstrated over again for the 
dynamic case. A contrary example is the expression for the electrostatic energy in 
terms of a volume integral of ptf>. This result is true only for statics. 

We will consider all these matters in more detail in due time, but it will perhaps 
be useful to keep in mind this summary, so you will know what you can forget, 
and what you should remember as always true. 
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16 


Induced, Currents 


16-1 Motors and generators 

The discovery in 1820 that there was a close connection between electricity 
and magnetism was very exciting — until then, the two subjects had been considered 
as quite independent. The first discovery was that currents in wires make magnetic 
fields; then, in the same year, it was found that wires carrying current in a magnetic 
field have forces on them. 

One of the excitements whenever there is a mechanical force is the possibility 
of using it in an engine to do work. Almost immediately after their discovery, 
people started to design electric motors using the forces on current-carrying wires. 
The principle of the electromagnetic motor is shown in bare outline in Fig. 16-1. 
A permanent magnet — usually with some pieces of soft iron — is used to produce 
a magnetic field in two slots. Across each slot there is a north and south pole, 
as shown. A rectangular coil of copper is placed with one side in each slot. When 
a current passes through the coil, it flows in opposite directions in the two slots, 
so the forces are also opposite, producing a torque on the coil about the axis 
shown. If the coil is mounted on a shaft so that it can turn, it can be coupled to 
pulleys or gears and can do work. 

The same idea can be used for making a sensitive instrument for electrical 
measurements. Thus the moment the force law was discovered the precision of 
electrical measurements was greatly increased. First, the torque of such a motor 
can be made much greater for a given current by making the current go around 
many turns instead of just one. Then the coil can be mounted so that it turns with 
'very little torque — either by supporting its shaft on very delicate jewel bearings or 
by hanging the coil on a very fine wire or a quartz fiber. Then an exceedingly small 
current will make the coil turn, and for small angles the amount of rotation will 
be proportional to the current. The rotation can be measured by gluing a pointer 
to the coil or, for the most delicate instruments, by attaching a small mirror to the 
coil and looking at the shift of the image of a scale. Such instruments are called 
galvanometers. Voltmeters and ammeters work on the same principle. 

The same ideas can be applied on a large scale to make large motors for pro- 
viding mechanical power. The coil can be made to go around and around by ar- 
ranging that the connections to the coil are reversed each half-turn by contacts 
mounted on the shaft. Then the torque is always in the same direction. Small 
dc motors are made just this way. Larger motors, dc or ac, are often made by 
replacing the permanent magnet by an electromagnet, energized from the electrical 
power source. 

With the realization that electric currents make magnetic fields, people im- 
mediately suggested that, somehow or other, magnets might also make electric 
fields. Various experiments were tried. For example, two wires were placed parallel 
to each other and a current was passed through one of them in the hope of finding 
a current in the other. The thought was that the magnetic field might in some way 
drag the electrons along in the second wire, giving some such law as “likes prefer 
to move alike.” With the largest available current and the most sensitive gal- 
vanometer to detect any current, the result was negative. Large magnets next to 
wires also produced no observed effects. Finally, Faraday discovered in 1840 the 
essential feature that had been missed — that electric effects exist only when there 
is something changing . If one of a pair of wires has a changing current, a current 
is induced in the other, or if a magnet is moved near an electric circuit, there is a 
current. We say that currents are induced . This was the induction effect discovered 
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Fig. 16-1. Schematic outline of o 
simple electromagnetic motor. 



by Faraday. It transformed the rather dull subject of static fields into a very ex- 
citing dynamic subject with an enormous range of wonderful phenomena. This 
chapter is devoted to a qualitative description of some of them. As we will see, 
one can quickly get into fairly complicated situations that are hard to analyze 
quantitatively in all their details. But never mind, our main purpose in this chapter 
is first to acquaint you with the phenomena involved. We will take up the detailed 
analysis later. 

We can easily understand one feature of magnetic induction from what we 
already know, although it was not known in Faraday’s time. It comes from the 
v X B force on a moving charge that is proportional to its velocity in a magnetic 
field. Suppose that we have a wire which passes near a magnet, as shown in Fig. 
16-2, and that we connect the ends of the wire to a galvanometer. If we move the 
wire across the end of the magnet the galvanometer pointer moves. 

The magnet produces some vertical magnetic field, and when we push the 
wire across the field, the electrons in the wire feel a sideways force — at right angles 
to the field and to the motion. The force pushes the electrons along the wire. 
But why does this move the galvanometer, which is so far from the force? Because 
when the electrons which feel the magnetic force try to move, they push — by electric 
repulsion — the electrons a little farther down the wire; they, in turn, repel the 
electrons a little farther on, and so on for a long distance. An amazing thing. 

It was so amazing to Gauss and Weber — who first built a galvanometer — that 
they tried to see how far the forces in the wire would go. They strung a wire all the 
way across their city. Mr. Gauss, at one end, connected the wires to a battery 
(batteries were known before generators) and Mr. Weber watched the galvanometer 
move. They had a way of signaling long distances — it was the beginning of the 
telegraph! Of course, this has nothing directly to do with induction — it has to do 
with the way wires carry currents, whether the currents are pushed by induction 
or not. 

Now suppose in the setup of Fig. 16-2 we leave the wire alone and move the 
magnet. We still see an effect on the galvanometer. As Faraday discovered, moving 
the magnet under the wire — one way — has the same effect as moving the wire over 
the magnet — the other way. But when the magnet is moved, we no longer have 
any v X B force on the electrons in the wire. This is the new effect that Faraday 
found. Today, we might hope to understand it from a relativity argument. 

We already understand that the magnetic field of a magnet comes from its 
internal currents. So we expect to observe the same effect if instead of a magnet 
in Fig. 16-2 we use a coil of wire in which there is a current. If we move the wire 
past the coil there will be a current through the galvanometer, or also if we move 
the coil past the wire. But there is now a more exciting thing: If we change the 
magnetic field of the coil not by moving it, but by changing its current , there is 
again an effect in the galvanometer. For example, if we have a loop of wire near 
a coil, as shown in Fig. 16-3, and if we keep both of them stationary but switch 
off the current, there is a pulse of current through the galvanometer. When we 
switch the coil on again, the galvanometer kicks in the other direction. 

Whenever the galvanometer in a situation such as the one shown in Fig. 16-2, 
or in Fig. 16-3, has a current, there is a net push on the electrons in the wire in one 
direction along the wire. There may be pushes in different directions at different 
places, but there is more push in one direction than another. What counts is the 
push integrated around the complete circuit. We call this net integrated push the 
electromotive force (abbreviated emf) in the circuit. More precisely, the emf is 
defined as the tangential force per unit charge in the wire integrated over length, 
once around the complete circuit. Faraday’s complete discovery was that emf’s 
can be generated in a wire in three different ways: by moving the wire, by moving 
a magnet near the wire, or by changing a current in a nearby wire. 

Let’s consider the simple machine of Fig. 16-1 again, only now, instead of 
putting a current through the wire to make it turn, let’s turn the loop by an external 
force, for example by hand or by a waterwheel. When the coil rotates, its wires are 
moving in the magnetic field and we will find an emf in the circuit of the coil. 
The motor becomes a generator. 
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Fig. 16-2. Moving a wire through a magnetic field 
produces a current, as shown by the galvanometer. 


cry 
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Fig. 1 6-3. A coil with current produces a 
current in a second coil if the first coil is moved 
or if its current is changed. 


The coil of the generator has an induced emf from its motion. The amount of 
the emf is given by a simple rule discovered by Faraday. (We will just state the 
rule now and wait until later to examine it in detail.) The rule is that when the mag- 
netic flux that passes through the loop (this flux is the normal component of B 
integrated over the area of the loop) is changing with time, the emf is equal to 
the rate of change of the flux. We will refer to this as “the flux rule.” You see that 
when the coil of Fig. 16-1 is rotated, the flux through it changes. At the start 
some flux goes through one way; then when the coil has rotated 180° the same 
flux goes through the other way. If we continuously rotate the coil the flux is 
first positive, then negative, then positive, and so on. The rate of change of the 
flux must alternate also. So there is an alternating emf in the coil. If we connect 
the two ends of the coil to outside wires through some sliding contacts — called 
dip-rings — (just so the wires won’t get twisted) we have an alternating-current 
generator. 

Or we can also arrange, by means of some sliding contacts, that after every 
one-half rotation, the connection between the coil ends and the outside wires is 
reversed, so that when the emf reverses, so do the connections. Then the pulses of 
emf will always push currents in the same direction through the external circuit. 
We have what is called a direct-current generator. 

The machine of Fig. 16-1 is either a motor or a generator. The reciprocity 
between motors and generators is nicely shown by using two identical dc “motors” 
of the permanent magnet kind, with their coils connected by two copper wires. 
When the shaft of one is turned mechanically, it becomes a generator and drives 
tile other as a motor. If the shaft of the second is turned, it becomes the generator 
and drives the first as a motor. So here is an interesting example of a new kind of 
equivalence of nature: motor and generator are equivalent. The quantitative 
equivalence is, in fact, not completely accidental. It is related to the law of con- 
servation of energy. 

Another example of a device that can operate either to generate emfs or to 
respond to emf’s is the receiver of a standard telephone — that is, an “earphone.” 
The original telephone of Bell consisted of two such “earphones” connected by 
two long wires. The basic principle is shown in Fig. 16-4. A permanent magnet 
produces a magnetic field in two “yokes” of soft iron and in a thin diaphragm that 
is moved by sound pressure. When the diaphragm moves, it changes the amount 
of magnetic field in the yokes. Therefore a coil of wire wound around one of the 
yokes will have the flux through it changed when a sound wave hits the diaphragm. 
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Fig. 16-4. A telephone transmitter 
or receiver. 



So there is an emf in the coil. If the ends of the coil are connected to a circuit, a 
current which is an electrical representation of the sound is set up. 

If the ends of the coil of Fig. 16-4 are connected by two wires to another 
identical gadget, varying currents will flow in the second coil. These currents will 
produce a varying magnetic field and will make a varying attraction on the iron 
diaphragm. The diaphragm will wiggle and make sound waves approximately 
similar to the ones that moved the original diaphragm. With a few bits of iron and 
copper the human voice is transmitted over wires! 

(The modern home telephone uses a receiver like the one described but uses 
an improved invention to get a more powerful transmitter. It is the “carbon- 
button microphone,” that uses sound pressure to vary the electric current from 
a battery.) 


16-2 Transformers and inductances 



Fig. 16-5. Two coils, wrapped 
around bundles of iron sheets, allow a 
generator to light a bulb with no direct 
connection. 


One of the most interesting features of Faraday’s discoveries is not that an 
emf exists in a moving coil — which we can understand in terms of the magnetic 
force qv X B — but that a changing current in one coil makes an emf in a second 
coil. And quite surprisingly the amount of emf induced in the second coil is given 
by the same “flux rule”: that the emf is equal to the rate of change of the magnetic 
flux through the coil. Suppose that we take two coils, each wound around separate 
bundles of iron sheets (these help to make stronger magnetic fields), as shown in 
Fig. 16-5. Now we connect one of the coils — coil (a) — to an alternating-current 
generator. The continually changing current produces a continuously varying 
magnetic field. This varying field generates an alternating emf in the second coil — 
coil (b). This emf can, for example, produce enough power to light an electric bulb. 

The emf alternates in coil (b) at a frequency which is, of course, the same as the 
frequency of the original generator. But the current in coil (b) can be larger or 
smaller than the current in coil (a). The current in coil (b) depends on the emf 
induced in it and on the resistance and inductance of the rest of its circuit. The 
emf can be less than that of the generator if, say, there is little flux change. Or the 
emf in coil (b) can be made much larger than that in the generator by winding coil 
(b) with many turns, since in a given magnetic field the flux through the coil is 
then greater. (Or if you prefer to look at it another way, the emf is the same in each 
turn, and since the total emf is the sum of the emf’s of the separate turns, many 
turns in series produce a large emf.) 

Such a combination of two coils — usually with an arrangement of iron sheets 
to guide the magnetic fields — is called a transformer . It can “transform” one emf 
(also called a “voltage”) to another. 

There are also induction effects in a single coil. For instance, in the setup in 
Fig. 16-5 there is a changing flux not only through coil (b), which lights the bulb, 
but also through coil (a). The varying current in coil (a) produces a varying 
magnetic field inside itself and the flux of this field is continually changing, so there 
is a self-induced emf in coil (a). There is an emf acting on any current when it is 
building up a magnetic field — or, in general, when its field is changing in any way. 
The effect is called self -inductance. 

When we gave “the flux rule” that the emf is equal to the rate of change of the 
flux linkage, we didn’t specify the direction of the emf. There is a simple rule, 
called Lenz’s rule, for figuring out which way the emf goes: the emf tries to oppose 
any flux change. That is, the direction of an induced emf is always such that if a 
current were to flow in the direction of the emf, it would produce a flux of B that 
opposes the change in B that produces the emf. Lenz’s rule can be used to find 
the direction of the emf in the generator of Fig. 16-1 , or in the transformer winding 
of Fig. 16-3. 

In particular, if there is a changing current in a single coil (or in any wire) 
there is a “back” emf in the circuit. This emf acts on the charges flowing in coil 
(a) of Fig. 16-5 to oppose the change in magnetic field, and so in the direction to 
oppose the change in current. It tries to keep the current constant; it is opposite to 
the current when the current is increasing, and it is in the direction of the current 
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Fig. 1 6-6. Circuit connections for an 
electromagnet. The lamp allows the 
passage of current when the switch is 
opened, preventing the appearance of 
excessive emf's. 


when it is decreasing. A current in a self-inductance has “inertia,” because the 
inductive effects try to keep the flow constant, just as mechanical inertia tries to 
keep the velocity of an object constant. 

Any large electromagnet will have a large self-inductance. Suppose that a 
battery is connected to the coil of a large electromagnet, as in Fig. 16-6, and that a 
strong magnetic field has been built up. (The current reaches a steady value deter- 
mined by the battery voltage and the resistance of the wire in the coil.) But now 
suppose that we try to disconnect the battery by opening the switch. If we really 
opened the circuit, the current would go to zero rapidly, and in doing so it would 
generate an enormous emf. In most cases this emf would be large enough to de- 
velop an arc across the opening contacts of the switch. The high voltage that ap- 
pears might also damage the insulation of the coil — or you, if you are the person 
who opens the switch! For these reasons, electromagnets are usually connected in 
a circuit like the one shown in Fig. 16-6. When the switch is opened, the current 
does not change rapidly but remains steady, flowing instead through the lamp, 
being driven by the emf from the self-inductance of the coil. 


16-3 Forces on induced currents 

You have probably seen the dramatic demonstration of Lenz’s rule made with 
the gadget shown in Fig. 16-7. It is an electromagnet, just like coil (a) of Fig. 
16-5. An aluminum ring is placed on the end of the magnet. When the coil is 
connected to an alternating-current generator by closing the switch, the ring flies 
into the air. The force comes, of course, from the induced currents in the ring. 
The fact that the ring flies away shows that the currents in it oppose the change of 
the field through it. When the magnet is making a north pole at its top, the induced 
current in the ring is making a downward-point north pole. The ring and the coil 
are repelled just like two magnets with like poles opposite. If a thin radial cut is 
made in the ring the force disappears, showing that it does indeed come from the 
currents in the ring. 




Fig. 1 6-7. A conducting ring is strongly repelled 
by an electromagnet with a varying current. 


Fig. 16-8. An electromagnet near a perfectly 
conducting plate. 
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Fig. 16-9. A bar magnet is sus- 
pended above a superconducting bowl, 
by the repulsion of eddy currents. 
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Fig. 16-10. The braking of the pen- 
dulum shows the forces due to eddy cur- 
rents. 



Fig. 16-11. The eddy currents in the 
copper pendulum. 


If, instead of the ring, we place a disc of aluminum or copper across the end 
of the electromagnet of Fig. 16-7, it is also repelled; induced currents circulate in 
the material of the disc, and again produce a repulsion. 

An interesting effect, similar in origin, occurs with a sheet of a perfect con- 
ductor. In a “perfect conductor” there is no resistance whatever to the current. So 
if currents are generated in it, they can keep going forever. In fact, the slightest 
emf would generate an arbitrarily large current — which really means that there 
can be no emf’s at all. Any attempt to make a magnetic flux go through such a 
sheet generates currents that create opposite B fields — all with infinitesimal emf s, 
so with no flux entering. 

If we have a sheet of a perfect conductor and put an electromagnet next to it, 
when we turn on the current in the magnet, currents called eddy currents appear in 
the sheet, so that no magnetic flux enters. The field lines would look as shown in 
Fig. 16-8. The same thing happens, of course, if we bring a bar magnet near a 
perfect conductor. Since the eddy currents are creating opposing fields, the 
magnets are repelled from the conductor. This makes it possible to suspend a bar 
magnet in air above a sheet of perfect conductor shaped like a dish, as shown in 
Fig. 16-9. The magnet is suspended by the repulsion of the induced eddy currents 
in the perfect conductor. There are no perfect conductors at ordinary tempera- 
tures, but some materials become perfect conductors at low enough temperatures. 
For instance, below 3.8°K tin conducts perfectly. It is called a superconductor. 

If the conductor in Fig. 16-8 is not quite perfect there will be some resistance 
to flow of the eddy currents. The currents will tend to die out and the magnet will 
slowly settle down. The eddy currents in an imperfect conductor need an emf to 
keep them going, and to have an emf the flux must keep changing. The flux of 
the magnetic field gradually penetrates the conductor. 

In a normal conductor, there are not only repulsive forces from eddy currents, 
but there can also be sidewise forces. For instance, if we move a magnet sideways 
along a conducting surface the eddy currents produce a force of drag, because the 
induced currents are opposing the changing of the location of flux. Such forces are 
proportional to the velocity and are like a kind of viscous force. 

These effects show up nicely in the apparatus shown in Fig. 16-10. A square 
sheet of copper is suspended on the end of a rod to make a pendulum. The copper 
swings back and forth between the poles of an electromagnet. When the magnet 
is turned on, the pendulum motion is suddenly arrested. As the metal plate enters 
the gap of the magnet, there is a current induced in the plate which acts to oppose 
the change in flux through the plate. If the sheet were a perfect conductor, the 
currents would be so great that they would push the plate out again — it would 
bounce back. With a copper plate there is some resistance in the plate, so 
the currents at first bring the plate almost to a dead stop as it starts to enter the 
field. Then, as the currents die down, the plate slowly settles to rest in the magnetic 
field. 

The nature of the eddy currents in the copper pendulum is shown in Fig. 
16-11. The strength and geometry of the currents are quite sensitive to the shape 
of the plate. If, for instance, the copper plate is replaced by one which has several 
narrow slots cut in it, as shown in Fig. 16-12, the eddy-current effects are drastically 
reduced. The pendulum swings through the magnetic field with only a small 
retarding force. The reason is that the currents in each section of the copper have 
less flux to drive them, so the effects of the resistance of each loop are greater. 
The currents are smaller and the drag is less. The viscous character of the force 
is seen even more clearly if a sheet of copper is placed between the poles of the 
magnet of Fig. 16-10 and then released. It doesn’t fall; it just sinks slowly down- 
ward. The eddy currents exert a strong resistance to the motion — just like the 
viscous drag in honey. 

If, instead of dragging a conductor past a magnet, we try to rotate it in a 
magnetic field, there will be a resistive torque from the same effects. Alternatively, 
if we rotate a magnet — end over end — near a conducting plate or ring, the ring is 
dragged around; currents in the ring will create a torque that tends to rotate 
the ring with the magnet. 
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Fig. 16-12. Eddy-current effects are drasti- Fig. 16-13. Making a rotating magnetic field, 

cally reduced by cutting slots in the plate. 


A field just like that of a rotating magnet can be made with an arrangement 
of coils such as is shown in Fig. 16-13. We take a torus of iron (that is, a ring of 
iron like a doughnut) and wind six coils on it. If we put a current, as shown in 
part (a), through windings (1) and (4), there will be a magnetic field in the direction 
shown in the figure. If we now switch the current to windings (2) and (5), the 
magnetic field will be in a new direction, as shown in part (b) of the figure. Con- 
tinuing the process, we get the sequence of fields shown in the rest of the figure. 
If the process is done smoothly, we have a “rotating” magnetic field. We can easily 
get the required sequence of currents by connecting the coils to a three-phase 
power line, which provides just such a sequence of currents. “Three-phase power” 
is made in a generator using the principle of Fig. 16-1, except that there are three 
loops fastened together on the same shaft in a symmetrical way — that is, with an 
angle of 120° from one loop to the next. When the coils are rotated as a unit, the 
emf is a maximum in one, then in the next, and so on in a regular sequence. There 
are many practical advantages of three-phase power. One of them is the possibility 
of making a rotating magnetic field. The torque produced on a conductor by such 
a rotating field is easily shown by standing a metal ring on an insulating table just 
above the torus, as shown in Fig. 16-14. The rotating field causes the ring to spin 
about a vertical axis. The basic elements seen here are quite the same as those at 
play in a large commercial three-phase induction motor. 

Another form of induction motor is shown in Fig. 16-15. The arrangement 
shown is not suitable for a practical high-efficiency motor but will illustrate the 
principle. The electromagnet M, consisting of a bundle of laminated iron sheets 
wound with a solenoidal coil, is powered with alternating current from a generator. 
The magnet produces a varying flux of B through the aluminum disc. If we have 
just these two components, as shown in part (a) of the figure, we do not yet have 
a motor. There are eddy currents in the disc, but they are symmetric and there is 
no torque. (There will be some heating of the disc due to the induced currents.) If 
we now cover only one-half of the magnet pole with an aluminum plate, as shown 
in part (b) of the figure, the disc begins to rotate, and we have a motor. The 
operation depends on two eddy-current effects. First, the eddy currents in the 
aluminum plate oppose the change of flux through it, so the magnetic field above 
the plate always lags the field above that half of the pole which is not covered. This 
so-called “shaded-pole” effect produces a field which in the “shaded” region varies 
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Fig. 16-14. The rotating field of 
Fig. 1 6-1 3 can be used to provide torque 
on a conducting ring. 
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Fig. 1 6-1 5. A simple example of a shaded-pole induction motor. 


much like that in the “unshaded” region except that it is delayed a constant amount 
in time. The whole effect is as if there were a magnet only half as wide which is 
continually being moved from the unshaded region toward the shaded one. Then 
the varying fields interact with the eddy currents in the disc to produce the torque 
on it. 


16-4 Electrical technology 

When Faraday first made public his remarkable discovery that a changing 
magnetic flux produces an emf, he was asked (as anyone is asked when he dis- 
covers a new fact of nature), “What is the use of it?” All he had found was the 
oddity that a tiny current was produced when he moved a wire near a magnet. 
Of what possible “use” could that be? His answer was: “What is the use of a new- 
born baby?” 

Yet think of the tremendous practical applications his discovery has led to. 
What we have been describing are not just toys but examples chosen in most cases 
to represent the principle of some practical machine. For instance, the rotating ring 
in the turning field is an induction motor. There are, of course, some differences 
between it and a practical induction motor. The ring has a very small torque; it 
can be stopped with your hand. For a good motor, things have to be put together 
more intimately: there shouldn’t be so much “wasted” magnetic field out in the 
air. First, the field is concentrated by using iron. We have not discussed how iron 
does that, but iron can make the magnetic field tens of thousands of times stronger 
than copper coils alone could do. Second, the gaps between the pieces of iron are 
made small; to do that, some iron is even built into the rotating ring. Everything 
is arranged so as to get the greatest forces and the greatest efficiency — that is, 
conversion of electrical power to mechanical power — until the “ring” can no 
longer be held still by your hand. 

This problem of closing the gaps and making the thing work in the most 
practical way is engineering . It requires serious study of design problems, although 
there are no new basic principles from which the forces are obtained. But there 
is a long way to go from the basic principles to a practical and economic design. 
Yet it is just such careful engineering design that has made possible such a tre- 
mendous thing as Boulder Dam and all that goes with it. 

What is Boulder Dam? A huge river is stopped by a concrete wall. But what 
a wall it is! Shaped with a perfect curve that is very carefully worked out so that 
the least possible amount of concrete will hold back a whole river. It thickens at 
the bottom in that wonderful shape that the artists like but that the engineers can 
appreciate because they know that such thickening is related to the increase of 
pressure with the depth of the water. But we are getting away from electricity. 

Then the water of the river is diverted into a huge pipe. That’s a nice engineer- 
ing accomplishment in itself. The pipe feeds the water into a “waterwheel” — a 
huge turbine — and makes wheels turn. (Another engineering feat.) But why turn 
wheels? They are coupled to an exquisitely intricate mess of copper and iron, all 
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twisted and interwoven. With two parts — one that turns and one that doesn’t. 
All a complex intermixture of a few materials, mostly iron and copper but also 
some paper and shellac for insulation. A revolving monster thing. A generator. 
Somewhere out of the mess of copper and iron come a few special pieces of copper. 
The dam, the turbine, the iron, the copper, all put there to make something special 
happen to a few bars of copper — an emf. Then the copper bars go a little way and 
circle for several times around another piece of iron in a transformer; then their 
job is done. 

But around that same piece of iron curls another cable of copper which has 
no direct connection whatsoever to the bars from the generator ; they have just 
been influenced because they passed near it — to get their emf. The transformer 
converts the power from the relatively low voltages required for the efficient design 
of the generator to the very high voltages that are best for efficient transmission of 
electrical energy over long cables. 

And everything must be enormously efficient — there can be no waste, no loss. 
Why? The power for a metropolis is going through. If a small fraction were lost — 
one or two percent — think of the energy left behind! If one percent of the power 
were left in the transformer, that energy would need to be taken out somehow. If 
it appeared as heat, it would quickly melt the whole thing. There is, of course, some 
small inefficiency, but all that is required are a few pumps which circulate some oil 
through a radiator to keep the transformer from heating up. 

Out of the Boulder Dam come a few dozen rods of copper — long, long, long 
rods of copper perhaps the thickness of your wrist that go for hundreds of miles in 
all directions. Small rods of copper carrying the power of a giant river. Then the 
rods are split to make more rods . . . then to more transformers . . . sometimes to 
great generators which recreate the current in another form . . . sometimes to 
engines turning for big industrial purposes ... to more transformers . . . then 
more splitting and spreading . . . until finally the river is spread throughout the 
whole city — turning motors, making heat, making light, working gadgetry. The 
miracle of hot lights from cold water over 600 miles away — all done with specially 
arranged pieces of copper and iron. Large motors for rolling steel, or tiny motors 
for a dentist’s drill. Thousands of little wheels, turning in response to the turning 
of the big wheel at Boulder Dam. Stop the big wheel, and all the wheels stop; the 
lights go out. They really are connected. 

Yet there is more. The same phenomena that take the tremendous power of 
the river and spread it through the countryside, until a few drops of the river are 
running the dentist’s drill, come again into the building of extremely fine instru- 
ments ... for the detection of incredibly small amounts of current ... for the 
transmission of voices, music, and pictures ... for computers ... for automatic 
machines of fantastic precision. 

All this is possible because of carefully designed arrangements of copper and 
iron — efficiently created magnetic fields . . . blocks of rotating iron six feet in 
diameter whirling with clearances of 1/16 of an inch . . . careful proportions of 
copper for the optimum efficiency . . . strange shapes all serving a purpose, like 
the curve of the dam. 

If some future archaeologist uncovers Boulder Dam, we may guess that he 
would admire the beauty of its curves. But also the explorers from some great 
future civilizations will look at the generators and transformers and say: “Notice 
that every iron piece has a beautifully efficient shape. Think of the thought that 
has gone into every piece of copper!” 

This is the power of engineering and the careful design of our electrical tech- 
nology. There has been created in the generator something which exists nowhere 
else in nature. It is true that there are forces of induction in other places. Certainly 
in some places around the sun and stars there are effects of electromagnetic induc- 
tion. Perhaps also (though it’s not certain) the magnetic field of the earth is main- 
tained by an analog of an electric generator that operates on circulating currents 
in the interior of the earth. But nowhere have there been pieces put together with 
moving parts to generate electrical power as is done in the generator — with great 
efficiency and regularity. 
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You may think that designing electric generators is no longer an interesting 
subject, that it is a dead subject because they are all designed. Almost perfect 
generators or motors can be taken from a shelf. Even if this were true, we can 
admire the wonderful accomplishment of a problem solved to near perfection. 
But there remain as many unfinished problems. Even generators and transformers 
are returning as problems. It is likely that the whole field of low temperatures and 
superconductors will soon be applied to the problem of electric power distribution. 
With a radically new factor in the problem, new optimum designs will have to be 
created. Power networks of the future may have little resemblance to those of 
today. 

You can see that there is an endless number of applications and problems that 
one could take up while studying the laws of induction. The study of the design of 
electrical machinery is a life work in itself. We cannot go very far in that direction, 
but we should be aware of the fact that when we have discovered the law of induc- 
tion, we have suddenly connected our theory to an enormous practical develop- 
ment. We must, however, leave that subject to the engineers and applied scientists 
who are interested in working out the details of particular applications. Physics 
only supplies the base — the basic principles that apply, no matter what. (We have 
not yet completed the base, because we have yet to consider in detail the properties 
of iron and of copper. Physics has something to say about these as we will see a 
little later.) 

Modern electrical technology began with Faraday’s discoveries. The useless 
baby developed into a prodigy and changed the face of the earth in ways its proud 
father could never have imagined. 
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17 


The Taws of Induction 


17-1 The physics of induction 

In the last chapter we described many phenomena which show that the effects 
of induction are quite complicated and interesting. Now we want to discuss the 
fundamental principles which govern these effects. We have already defined the emf 
in a conducting circuit as the total accumulated force on the charges throughout 
the length of the loop. More specifically, it is the tangential component of the force 
per unit charge, integrated along the wire once around the circuit. This quantity 
is equal, therefore, to the total work done on a single charge that travels once 
around the circuit. 

We have also given the “flux rule,” which says that the emf is equal to the rate 
at which the magnetic flux through such a conducting circuit is changing. Let’s 
see if we can understand why that might be. First, we’ll consider a case in which 
the flux changes because a circuit is moved in a steady field. 

In Fig. 17-1 we show a simple loop of wire whose dimensions can be changed. 
The loop has two parts, a fixed U-shaped part (a) and a movable crossbar (b) 
that can slide along the two legs of the U. There is always a complete circuit, but 
its area is variable. Suppose we now place the loop m a uniform magnetic field with 
the plane of the U perpendicular to the field. According to the rule, when the cross- 
bar is moved there should be in the loop an emf that is proportional to the rate of 
change of the flux through the loop. This emf will cause a current in the loop. 
We will assume that there is enough resistance in the wire that the currents are 
small. Then we can neglect any magnetic field from this current. 

The flux through the loop is wLB , so the “flux rule” would give for the emf — 
which we write as £ — 

£ = wB ^ = wBv, 
dt 

where v is the speed of translation of the crossbar. 

Now we should be able to understand this result from the magnetic v X B 
forces on the charges in the moving crossbar. These charges will feel a force, 
tangential to the wire, equal to vB per unit charge. It is constant along the length 
w of the crossbar and zero elsewhere, so the integral is 

8 = wvB , 

which is the same result we got from the rate of change of the flux. 

The argument just given can be extended to any case where there is a fixed 
magnetic field and the wires are moved. One can prove, in general, that for any 
circuit whose parts move in a fixed magnetic field the emf is the time derivative 
of the flux, regardless of the shape of the circuit. 

On the other hand, what happens if the loop is stationary and the magnetic 
field is changed? We cannot deduce the answer to this question from the same 
argument. It was Faraday’s discovery — from experiment — that the “flux rule” 
is still correct no matter why the flux changes. The force on electric charges is 
given in complete generality by F = q(E + v X B); there are no new special 
“forces due to changing magnetic fields.” Any forces on charges at rest in a 
stationary wire come from the E term. Faraday’s observations led to the discovery 
that electric and magnetic fields are related by a new law: in a region where the 
magnetic field is changing with time, electric fields are generated. It is this electric 
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Fig. 17-1. An emf is induced in a 
loop if the flux is changed by varying the 
area of the circuit. 



field which drives the electrons around the wire — and so is responsible for the emf 
in a stationary circuit when there is a changing magnetic flux. 

The general law for the electric field associated with a changing magnetic 
field is 


V X E = 


SB 

dt 


(17.1) 


We will call this Faraday’s law. It was discovered by Faraday but was first written 
in differential form by Maxwell, as one of his equations. Let’s see how this equation 
gives the “flux rule” for circuits. 

Using Stokes’ theorem, this law can be written in integral form as 

f E * ds = / (V X E) - n da ~ — I ^ * nda, (17.2) 

Jt Js Js ot 

where, as usual, T is any closed curve and S is any surface bounded by it. Here, 
remember, r is a mathematical curve fixed m space, and 5 is a fixed surface. Then 
the time derivative can be taken outside the integral and we have 


f E • ds = — ~ I B* nda 

Jr ot J s 

— — ~ (flux through S'). (17.3) 

Applying this relation to a curve r that follows a fixed circuit of conductor, we 
get the “flux rule” once again. The integral on the left is the emf, and that on the 
right is the negative rate of change of the flux linked by the circuit. So Eq. (17.1) 
applied to a fixed circuit is equivalent to the “flux rule.” 

So the “flux rule” — that the emf in a circuit is equal to the rate of change of 
the magnetic flux through the circuit — applies whether the flux changes because the 
field changes or because the circuit moves (or both). The two possibilities — 
“circuit moves” or “field changes” — are not distinguished in the statement of the 
rule. Yet in our explanation of the rule we have used two completely distinct laws 
for the two cases — v X B for “circuit moves” and V X E = —dB/dt for “field 
changes.” 

We know of no other place in physics where such a simple and accurate 
general principle requires for its real understanding an analysis in terms of two 
different phenomena . Usually such a beautiful generalization is found to stem from 
a single deep underlying principle. Nevertheless, in this case there does not appear 
to be any such profound implication. We have to understand the “rule” as the 
combined effects of two quite separate phenomena. 

We must look at the “flux rule” in the following way. In general, the force per 
unit charge is F/q = E + v X B. In moving wires there is the force from the 
second term. Also, there is an E-field if there is somewhere a changing magnetic 
field. They are independent effects, but the emf around the loop of wire is always 
equal to the rate of change of magnetic flux through it. 


17-2 Exceptions to the “flux rule” 

We will now give some examples, due in part to Faraday, which show the 
importance of keeping clearly in mind the distinction between the two effects re- 
sponsible for induced emf’s. Our examples involve situations to which the “flux 
rule” cannot be applied — either because there is no wire at all or because the path 
taken by induced currents moves about within an extended volume of a conductor. 

We begin by making an important point: The part of the emf that comes from 
the E-field does not depend on the existence of a physical wire (as does the v X B 
part). The E-field can exist in free space, and its line integral around any imaginary 
line fixed in space is the rate of change of the flux of B through that line. (Note 
that this is quite unlike the E-field produced by static charges, for in that case the 
line integral of E around a closed loop is always zero.) 
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Fig. 17-2. When the disc rotates 
there is an emf from v X 8, but with 
no change in the linked flux. 


Now we will describe a situation in which the flux through a circuit does not 
change, but there is nevertheless an emf. Figure 17-2 shows a conducting disc 
which can be rotated on a fixed axis in the presence of a magnetic field. One 
contact is made to the shaft and another rubs on the outer periphery of the disc. 
A circuit is completed through a galvanometer. As the disc rotates, the “circuit,” 
in the sense of the place in space where the currents are, is always the same. But 
the part of the “circuit” in the disc is in material which is moving. Although the 
flux through the “circuit” is constant, there is still an emf, as can be observed by 
the deflection of the galvanometer. Clearly, here is a case where the v X B force in 
the moving disc gives rise to an emf which cannot be equated to a change of flux. 

Now we consider, as an opposite example, a somewhat unusual situation in 
which the flux through a “circuit” (again in the sense of the place where the current 
is) changes but where there is no emf. Imagine two metal plates with slightly curved 
edges, as shown in Fig. 17-3, placed in a uniform magnetic field perpendicular to 
their surfaces. Each plate is connected to one of the terminals of a galvanometer, 
as shown. The plates make contact at one point P, so there is a complete circuit 
If the plates are now rocked through a small angle, the point of contact will move 
to P' . If we imagine the “circuit” to be completed through the plates on the dotted 
line shown in the figure, the magnetic flux through this circuit changes by a large 
amount as the plates are rocked back and forth. Yet the rocking can be done with 
small motions, so that v X B is very small and there is practically no emf. The 
“flux rule” does not work in this case. It must be applied fo circuits m which the 
material of the circuit remains the same. When the material of the circuit is chang- 
ing, we must return to the basic laws. The correct physics is always given by the 
two basic laws 

F = q(E + v X B% 


V X E = 


dB 

~dt ‘ 



Fig. 17-3. When the plates are 
rocked in a uniform magnetic field, there 
can be a large change in the flux 
linkage without the generation of an 
emf. 


17-3 Particle acceleration by an induced electric field; the betatron 

We have said that the electromotive force generated by a changing magnetic 
field can exist even without conductors; that is, there can be magnetic induction 
without wires. We may still imagine an electromotive force around an arbitrary 
mathematical curve in space. It is defined as the tangential component of E 
integrated around the curve. Faraday’s law says that this line integral is equal to 
the rate of change of the magnetic flux through the closed curve, Eq. (17.3). 

As an example of the effect of such an induced electric field, we want now to 
consider the motion of an electron in a changing magnetic field. We imagine a 
magnetic field which, everywhere on a plane, points in a vertical direction, as shown 
in Fig. 17-4. The magnetic field is produced by an electromagnet, but we will not 
worry about the details For our example we will imagine that the magnetic field 
is symmetric about some axis, l e., that the strength of the magnetic field will 
depend only on the distance from the axis. The magnetic field is also varying with 
time We now imagine an electron that is moving in this field on a path that is a 
circle of constant radius with its center at the axis of the field. (We will see later 
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Fig. 17-4. An electron accelerating in an axially 
symmetric, time-varying magnetic field. 


how this motion can be arranged.) Because of the changing magnetic field, there 
will be an electric field E tangential to the electron’s orbit which will drive it around 
the circle. Because of the symmetry, this electric field will have the same value 
everywhere on the circle. If the electron’s orbit has the radius r, the line integral 
of E around the orbit is equal to the rate of change of the magnetic flux through 
the circle. The line integral of E is just its magnitude times the circumference of 
the circle, 2irr. The magnetic flux must, in general, be obtained from an integral. 
For the moment, we let B av represent the average magnetic field in the interior of 
the circle; then the flux is this average magnetic field times the area of the circle. 
We will have 

2irrE = ~ (fl av • irr 2 ). 


Since we are assuming r is constant, E is proportional to the time derivative of 
the average field : 


E - 


2~dT 


(17.4) 


The electron will feel the electric force qE and will be accelerated by it. Remember- 
ing that the relativistically correct equation of motion is that the rate of change of 
the momentum is proportional to the force, we have 


qE = 


dp 

dt 


(17.5) 


For the circular orbit we have assumed, the electric force on the electron is 
always in the direction of its motion, so its total momentum will be increasing at 
the rate given by Eq. (17.5). Combining Eqs. (17.5) and (17.4), we may relate the 
rate of change of momentum to the change of the average magnetic field : 

(17.6) 

dt 2 dt v ' 

Integrating with respect to t, we find for the electron’s momentum 

P = A, + f A B av , (17.7) 


where p 0 is the momentum with which the electrons start out, and A B dV is the sub- 
sequent change m Z? av . The operation of a betatron — a machine for accelerating 
electrons to high energies — is based on this idea. 

To see how the betatron operates in detail, we must now examine how the 
electron can be constrained to move on a circle. We have discussed in Chapter 1 1 
of Vol. I the principle involved. If we arrange that there is a magnetic field B at 
the orbit of the electron, there will be a transverse force qv X B which, for a suit- 
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ably chosen B , can cause the electron to keep moving on its assumed orbit. In the 
betatron this transverse force causes the electron to move in a circular orbit of 
constant radius. We can find out what the magnetic field at the orbit must be by 
using again the relativistic equation of motion, but this time, for the transverse 
component of the force. In the betatron (see Fig 17-4), B is at right angles to v , so 
the transverse force is qvB . Thus the force is equal to the rate of change of the trans- 
verse component p t of the momentum: 

"’‘‘-IT- 07 . 8 ) 


When a particle is moving in a circle , the rate of change of its transverse momentum 
is equal to the magnitude of the total momentum times co, the angular velocity of 
rotation (following the arguments of Chapter 11, Vol. I): 



where, since the motion is circular, 


co 


V 

r 


( 17 . 9 ) 


( 17 . 10 ) 


Setting the magnetic force equal to the transverse acceleration, we have 


(?^7?orbit 



( 17 . 11 ) 


where B ov b lt is the field at the radius r. 

As the betatron operates, the momentum of the electron grows in proportion 
to £ ilv , according to Eq. (17.7), and if the electron is to continue to move in its 
proper circle, Eq. (17.1 1) must continue to hold as the momentum of the electron 
increases. The value of B orhlt must increase in proportion to the momentum p . 
Comparing Eq. (17. 1 1) with Eq. (17.7), which determines /?, we see that the follow- 
ing relation must hold between the average magnetic field inside the orbit 
at the radius r, and the magnetic field B orhlt at the orbit: 

A B av = 2AB orbn . (17.12) 

The correct operation of a betatron requires that the average magnetic field inside 
the orbit increase at twice the rate of the magnetic field at the orbit itself. In these 
circumstances, as the energy of the particle is increased by the induced electric 
field the magnetic field at the orbit increases at just the rate required to keep the 
particle moving in a circle. 

The betatron is used to accelerate electrons to energies of tens of millions of 
volts, or even to hundreds of millions of volts. However, it becomes impractical for 
the acceleration of electrons to energies much higher than a few hundred million 
volts for several reasons. One of them is the practical difficulty of attaining the 
required high average value for the magnetic field inside the orbit. Another is that 
Eq. (17.6) is no longer correct at very high energies because it does not include the 
loss of energy from the particle due to its radiation of electromagnetic energy 
(the so-called synchrotron radiation discussed m Chapter 36, Vol. I). For these 
reasons, the acceleration of electrons to the highest energies — to many billions of 
electron volts— is accomplished by means of a different kind of machine, called a 
synchrotron. 


17-4 A paradox 

We would now like to describe for you an apparent paradox. A paradox is a 
situation which gives one answer when analyzed one way, and a different answer 
when analyzed another way, so that we are left in somewhat of a quandary as to 
actually what should happen. Of course, in physics there are never any real para- 
doxes because there is only one correct answer; at least we believe that nature will 
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Fig. 17-5. Will the disc rotate if the 
current / is stopped? 


act in only one way (and that is the right way , naturally). So in physics a paradox 
is only a confusion in our own understanding. Here is our paradox. 

Imagine that we construct a device like that shown in Fig. 17-5. There is a 
thm, circular plastic disc supported on a concentric shaft with excellent bearings, 
so that it is quite free to rotate. On the disc is a coil of wire in the form of a short 
solenoid concentric with the axis of rotation. This solenoid carries a steady current 
/ provided by a small battery, also mounted on the disc. Near the edge of the disc 
and spaced uniformly around its circumference are a number of small metal spheres 
insulated from each other and from the solenoid by the plastic material of the disc. 
Each of these small conducting spheres is charged with the same electrostatic 
charge Q. Everything is quite stationary, and the disc is at rest. Suppose now that 
by some accident — or by prearrangement — the current in the solenoid is inter- 
rupted, without, however, any intervention from the outside. So long as the current 
continued, there was a magnetic flux through the solenoid more or less parallel 
to the axis of the disc. When the current is interrupted, this flux must go to zero. 
There will, therefore, be an electric field induced which will circulate around in 
circles centered at the axis. The charged spheres on the perimeter of the disc will 
all experience an electric field tangential to the perimeter of the disc. This electnc 
force is in the same sense for all the charges and so will result in a net torque on the 
disc. From these arguments we would expect that as the current in the solenoid 
disappears, the disc would begin to rotate. If we knew the moment of inertia of 
the disc, the current in the solenoid, and the charges on the small spheres, we could 
compute the resulting angular velocity. 

But we could also make a different argument. Using the principle of the con- 
servation of angular momentum, we could say that the angular momentum of the 
disc with all its equipment is initially zero, and so the angular momentum of the 
assembly should remain zero. There should be no rotation when the current is 
stopped. Which argument is correct 9 Will the disc rotate or will it not 9 We will 
leave this question for you to think about. 

We should warn you that the correct answer does not depend on any non- 
essential feature, such as the asymmetric position of a battery, for example. In 
fact, you can imagine an ideal situation such as the following* The solenoid is 
made of superconducting wire through which there is a current. After the disc has 
been carefully placed at rest, the temperature of the solenoid is allowed to rise slowly 
When the temperature of the wire reaches the transition temperature between 
superconductivity and normal conductivity, the current in the solenoid will be 
brought to zero by the resistance of the wire. The flux will, as before, fall to zero, 
and there will be an electric field around the axis. We should also warn you that the 
solution is not easy, nor is it a trick. When you figure it out, you will have dis- 
covered an important principle of electromagnetism. 



Fig. 1 7-6. A coil of wire rotating in a 
uniform magnetic field — the basic idea 
of the ac generator. 


17-5 Alternating-current generator 

In the remainder of this chapter we apply the principles of Section 17-1 to 
analyze a number of the phenomena discussed in Chapter 16. We first look in more 
detail at the alternating-current generator. Such a generator consists basically of a 
coil of wire rotating m a uniform magnetic field. The same result can also be 
achieved by a fixed coil in a magnetic field whose direction rotates in the manner 
described in the last chapter. We will consider only the former case. Suppose we 
have a circular coil of wire which can be turned on an axis along one of its diam- 
eters. Let this coil be located in a uniform magnetic field perpendicular to the axis 
of rotation, as in Fig. 17-6 We also imagine that the two ends of the coil are 
brought to external connections through some kind of sliding contacts. 

Due to the rotation of the coil, the magnetic flux through it will be changing. 
The circuit of the coil will therefore have an emf in it. Let S be the area of the coil 
and d the angle between the magnetic field and the normal to the plane of the coil.* 


* Now that we are using the letter A for the vector potential, we prefer to let S stand 
for a Surface area. 
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The flux through the coil is then 


BS cos 6. (17.13) 

If the coil is rotating at the uniform angular velocity co, 8 varies with time as 
6 = art. The emf 8 in the coil is then 

8 = — ~ (flux) = — ~ ( BS cos cot), 
or 

8 = BSo) sin cot. (17.14) 

If we bring the wires from the generator to a point some distance from the 

rotating coil, where the magnetic field is zero, or at least is not varying with time, 

the curl of E in this region will be zero and we can define an electric potential. 
In fact, if there is no current being drawn from the generator, the potential differ- 
ence V between the two wires will be equal to the emf in the rotating coil. That is, 

V ~ 5Su>sinart = K 0 sin art. 

The potential difference between the wires varies as sin art. Such a varying potential 
difference is called an alternating voltage. 

Since there is an electric field between the wires, they must be electrically 
charged. It is clear that the emf of the generator has pushed some excess charges 
out to the wire until the electric field from them is strong enough to exactly counter- 
balance the induction force. Seen from outside the generator, the two wires appear 
as though they had been electrostatically charged to the potential difference V, 
and as though the charge was being changed with time to give an alternating po- 
tential difference. There is also another difference from an electrostatic situation. 
If we connect the generator to an external circuit that permits passage of a current, 
we find that the emf does not permit the wires to be discharged but continues to 
provide charge to the wires as current is drawn from them, attempting to keep the 
wires always at the same potential difference. If, in fact, the generator is connected 
in a circuit whose total resistance is R , the current through the circuit will be pro- 
portional to the emf of the generator and inversely proportional to R . Since the 
emf has a sinusoidal time variation, so also does the current. There is an alternating 
current 

r £ Vo • 

I=X=x-smcot. 

The schematic diagram of such a circuit is shown in Fig. 17-7. 

We can also see that the emf determines how much energy is supplied by the 
generator. Each charge in the wire is receiving energy at the rate F • u, where F is 
the force on the charge and v is its velocity. Now let the number of moving charges 
per unit length of the wire be n ; then the power being delivered into any element 
ds of the wire is 

F • vn ds. 

For a wire, v is always along ds , so we can rewrite the power as 

nvF * ds . 

The total power being delivered to the complete circuit is the integral of this 
expression around the complete loop: 

Power = <j> nvF ds. (17.15) 

Now remember that qnv is the current /, and that the emf is defined as the integral 
of F/q around the circuit. We get the result 


I 


A.C. 

Generator 




Fig. 17-7. A circuit with an ac 
generator and a resistance. 



Power from a generator = 8/. 


(17.16) 
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When there is a current in the coil of the generator, there will also be mechani- 
cal forces on it. In fact, we know that the torque on the coil is proportional to its 
magnetic moment, to the magnetic field strength B , and to the sine of the angle 
between. The magnetic moment is the current in the coil times its area. Therefore 
the torque is 

r - ISB sin 6. (17.17) 

The rate at which mechanical work must be done to keep the coil rotating is the 
angular velocity co times the torque : 

dW 

^ = cor = uISBsinO. (17.18) 

Comparing this equation with Eq. (17.14), we see that the rate of mechanical work 
required to rotate the coil against the magnetic forces is just equal to 8/, the rate 
at which electrical energy is delivered by the emf of the generator. All of the me- 
chanical energy used up in the generator appears as electrical energy in the circuit. 

As another example of the currents and forces due to an induced emf, let’s 
analyze what happens in the setup described in Section 12, and shown in Fig. 17-1 . 
There are two parallel wires and a sliding crossbar located in a uniform magnetic 
field perpendicular to the plane of the parallel wires. Now let’s assume that the 
“bottom” of the U (the left side in the figure) is made of wires of high resistance, 
while the two side wires are made of a good conductor like copper — then we don’t 
need to worry about the change of the circuit resistance as the crossbar is moved. 
As before, the emf in the circuit is 

8 = vBw. (17.19) 

The current in the circuit is proportional to this emf and inversely proportional 
to the resistance of the circuit: 

/ - | = * • 07 . 20 ) 

Because of this current there will be a magnetic force on the crossbar that is 
proportional to its length, to the current in it, and to the magnetic field, such that 

F = BIw . (17.21) 

Taking / from Eq. (17.20), we have for the force 

r 2 Uj 2 

F = v- (17.22) 

We see that the force is proportional to the velocity of the crossbar. The direction 
of the force, as you can easily see, is opposite to its velocity. Such a “velocity- 
proportional” force, which is like the force of viscosity, is found whenever induced 
currents are produced by moving conductors in a magnetic field. The examples of 
eddy currents we gave in the last chapter also produced forces on the conductors 
proportional to the velocity -of the conductor, even though such situations, in 
general, give a complicated distribution of currents which is difficult to analyze. 

It is often convenient in the design of mechanical systems to have damping 
forces which are proportional to the velocity. Eddy-current forces provide one of 
the most convenient ways of getting such a velocity-dependent force. An example 
of the application of such a force is found in the conventional domestic wattmeter. 
In the wattmeter there is a thin aluminum disc that rotates between the poles of a 
permanent magnet. This disc is driven by a small electric motor whose torque is 
proportional to the power being consumed in the electrical circuit of the house. 
Because of the eddy-current forces in the disc, there is a resistive force proportional 
to the velocity. In equilibrium, the velocity is therefore proportional to the rate of 
consumption of electrical energy. By means of a counter attached to the rotating 
disc, a record is kept of the number of revolutions it makes. This count is an indi- 
cation of the total energy consumption, i.e., the number of watthours used. 
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We may also point out that Eq. (17.22) shows that the force from induced 
currents — that is, any eddy-current force — is inversely proportional to the re- 
sistance. The force will be larger, the better the conductivity of the material. The 
reason, of course, is that an emf produces more current if the resistance is low, and 
the stronger currents represent greater mechanical forces. 

We can also see from our formulas how mechanical energy is converted into 
electrical energy. As before, the electrical energy supplied to the resistance of the 
circuit is the product 8/. The rate at which work is done in moving the conducting 
crossbar is the force on the bar times its velocity. Using Eq. (17.21) for the force, 
the rate of doing work is 

dW = v 2 B 2 w 2 
dt R 


We see that this is indeed equal to the product 8/ we would get from Eqs. (17.19) 
and (17.20). Again the mechanical work appears as electrical energy. 


17-6 Mutual inductance 


We now want to consider a situation in which there are fixed coils of wire but 
changing magnetic fields. When we described the production of magnetic fields by 
currents, we considered only the case of steady currents. But so long as the currents 
are changed slowly, the magnetic field will at each instant be nearly the same as the 
magnetic field of a steady current. We will assume in the discussion of this section 
that the currents are always varying sufficiently slowly that this is true. 

In Fig. 17-8 is shown an arrangement of two coils which demonstrates the 
basic effects responsible for the operation of a transformer. Coil 1 consists of a 
conducting wire wound in the form of a long solenoid. Around this coil — and 
insulated from it — is wound coil 2, consisting of a few turns of wire. If now a 
current is passed through coil 1, we know that a magnetic field will appear inside it. 
This magnetic field also passes through coil 2. As the current in coil 1 is varied, 
the magnetic flux will also vary, and there will be an induced emf in coil 2. We will 
now calculate this induced emf. 

We have seen in Section 13-5 that the magnetic field inside a long solenoid is 
uniform and has the magnitude 


1 NJi 


€ 0 C 2 l 


(17.23) 



Fig. 17-8. A current in coil 1 pro- 
duces a magnetic field through coil 2. 


where 7V X is the number of turns in coil 1, /i is the current through it, and / is its 
length. Let's say that the cross-sectional area of coil 1 is S; then the flux of B is 
its magnitude times S. If coil 2 has N 2 turns, this flux links the coil N 2 times. 
Therefore the emf in coil 2 is given by 

fi 2 = -N 2 S~- (17.24) 


The only quantity in Eq. (17.23) which varies with time is / x . The emf is therefore 
given by 


NiN 2 S dh 
e 0 c 2 / dt 


(17.25) 


We see that the emf in coil 2 is proportional to the rate of change of the current 
in coil 1 . The constant of proportionality, which is basically a geometric factor of 
the two coils, is called the mutual inductance , and is usually designated 9flt 2 i- Equa- 
tion (17.25) is then written 

s 2 = 31121 ^ • (17.26) 


Suppose now that we were to pass a current through coil 2 and ask about 
the emf in coil 1. We would compute the magnetic field, which is everywhere 
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proportional to the current / 2 . The flux linkage through coil 1 would depend on 
the geometry, but would be proportional to the current / 2 . The emf in coil 1 
would, therefore, again be proportional to dl 2 /dt : We can write 

Si = 91112 ■ (17.27) 

The computation of 3TC 12 would be more difficult than the computation we have 
just done for Sfll 21 . We will not carry through that computation now, because we 
will show later in this chapter that anz 1 2 is necessarily equal to 3fTC 21 . 

Since for any coil its field is proportional to its current, the same kind of 
result would be obtained for any two coils of wire. The equations (17.26) and 
(17.27) would have the same form; only the constants 3TC 21 and 3H 12 would be 
different. Their values would depend on the shapes of the coils and their relative 
positions. 


Fig. 17-9. Any two coils have a 
mutual inductance 3TC proportional to the 
integral of dsi * ds 2 /n 2 . 



Suppose that we wish to find the mutual inductance between any two arbitrary 
coils — for example, those shown in Fig. 17-9. We know that the general expression 
for the emf in coil 1 can be written as 


Si 



B • nda. 


where B is the magnetic field and the integral is to be taken over a surface bounded 
by circuit 1. We have seen in Section 14-1 that such a surface integral of B can be 
related to a line integral of the vector potential. In particular, 



B * nda 



A * ds i, 


where A represents the vector potential and ds i is an element of circuit 1. The line 
integral is to be taken around circuit 1 . The emf in coil 1 can therefore be written as 

81 = ~jtii/' dsv (i7-28) 


Now let’s assume that the vector potential at circuit 1 comes from currents 
in circuit 2. Then it can be written as a line integral around circuit 2: 

A = — <( , (17.29) 

47T€ 0 C 2 y (2) /*12 


where / 2 is the current in circuit 2, and r 12 is the distance from the element of the 
circuit ds 2 to the point on circuit 1 at which we are evaluating the vector potential. 
(See Fig. 17-9.) Combining Eqs. (17.28) and (17.29), we can express the emf in 
circuit 1 as a double line integral: 


Si 


47re 0 c 2 A .£> £) ri2 2 '* 1- 


In this equation the integrals are all taken with respect to stationary circuits. The 
only variable quantity is the current / 2 , which does not depend on the variables of 
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integration. We may therefore take it out of the integrals. The emf can then 
be written as 


3UT 


12 


dh 

dt 


where the coefficient 3fTC 12 is 


arc 13 = - 


1 


4tcq c 2 


( 1 ) J ( 2 ) 


ds 2 • dsi 
r\2 


(17.30) 


We see from this integral that am, x 2 depends only on the circuit geometry. It depends 
on a kind of average separation of the two circuits, with the average weighted most 
for parallel segments of the two coils. Our equation can be used for calculating 
the mutual inductance of any two circuits of arbitrary shape. Also, it shows that 
the integral for 3ffl 12 is identical to the integral for am 2 1 * We have therefore shown 
that the two coefficients are identical. For a system with only two coils, the co- 
efficients 3 Hi 2 and 3Tl 2 i are often represented by the symbol 3m without subscripts, 
called simply the mutual inductance : 


sfre 12 = 3m 2 ! = 3m. 


17-7 Self-inductance 

In discussing the induced electromotive forces in the two coils of Figs. 17-8 
or 17-9, we have considered only the case in which there was a current in one coil 
or the other. If there are currents in the two coils simultaneously, the magnetic 
flux linking either coil will be the sum of the two fluxes which would exist separately, 
because the law of superposition applies for magnetic fields. The emf in either 
coil will therefore be proportional not only to the change of the current in the 
other coil, but also to the change in the current of the coil itself. Thus the total 
emf in coil 2 should be written* 

S 2 = 3TC 2 i + 3^22 ' (17.31) 

Similarly, the emf in coil 1 will depend not only on the changing current m coil 2, 
but also on the changing current in itself: 

Si = + (1732) 

The coefficients flH 22 and 3fTC lx are always negative numbers. It is usual to write 

ami i = -£u am 23 = -£ 2 , (17.33) 

where £ x and Z 2 are called the self-inductances of the two coils. 

The self-induced emf will, of course, exist even if we have only one coil. 
Any coil by itself will have a self-inductance £. The emf will be proportional to the 
rate of change of the current in it. For a single coil, it is usual to adopt the con- 
vention that the emf and the current are considered positive if they are in the same 
direction. With this convention, we may write for the emf of a single coil 

&=-£-■ (17.34) 

The negative sign indicates that the emf opposes the change in current — it is often 
called a “back emf.” 

Since any coil has a self-inductance which opposes the change in current, the 
current in the coil has a kind of inertia. In fact, if we wish to change the current in 


* The sign of 1 2 and 3(172 1 in Eqs. (17.31) and (17.32) depends on the arbitrary choices 
for the sense of a positive current in the two coils. 
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Fig. 17-10 (a) A circuit with a 

voltage source and an inductance, (b) An 
analogous mechanical system. 


a coil we must overcome this inertia by connecting the coil to some external voltage 
source such as a battery or a generator, as shown in the schematic diagram of Fig. 
17-10(a). In such a circuit, the current I depends on the voltage 13 according to 
the relation 

V=£~- (17.35) 


This equation has the same form as Newton’s law of motion for a particle in 
one dimension. We can therefore study it by the principle that “the same equations 
have the same solutions.” Thus, if we make the externally applied voltage V corre- 
spond to an externally applied force F, and the current / in a coil correspond to the 
velocity v of a particle, the inductance £ of the coil corresponds to the mass m of the 
particle.* See Fig. 17— 10(b). We can make the following table of corresponding 
quantities. 


Particle 


Coil 


F (force) 
v (velocity) 
x (displacement) 


mv (momentum) 
\mv 2 (kinetic energy) 


T (potential difference) 
/ (current) 
q (charge) 


£/ 

^£/ 2 (magnetic energy) 


17-8 Inductance and magnetic energy 

Continuing with the analogy of the preceding section, we would expect that 
corresponding to the mechanical momentum p ~ mv , whose rate of change is 
the applied force, there should be an analogous quantity equal to £/, whose rate of 
change is *0. We have no right, of course, to say that £/ is the real momentum of the 
circuit; in fact, it isn’t. The whole circuit may be standing still and have no mo- 
mentum. It is only that £/ is analogous to the momentum mv in the sense of satisfy- 
ing corresponding equations. In the same way, to the kinetic energy \mv 2 , there 
corresponds an analogous quantity j£/ 2 . But there we have a surprise. This 
^£/ 2 is really the energy in the electrical case also. This is because the rate of doing 
work on the inductance is VI , and in the mechanical system it is Fv, the corre- 
sponding quantity. Therefore, m the case of the energy, the quantities not only 
correspond mathematically, but also have the same physical meaning as well. 

We may see this in more detail as follows. As we found in Eq. (17.16), the 
rate of electrical work by induced forces is the product of the electromotive force 
and the current: 


Replacing 8 by its expression in terms of the current from Eq. (17.34), we have 


dW 

dt 



(17.36) 


Integrating this equation, we find that the energy required from an external source 
to overcome the emf in the self-inductance while building up the current! (which 
must equal the energy stored, U) is 

- W = U = ^£/ 2 (17.37) 

Therefore the energy stored in an inductance is ^£/ 2 . 


* This is, incidentally, not the only way a correspondence can be set up between me- 
chanical and electrical quantities. 

t We are neglecting any energy loss to heat from the current in the resistance of the coil. 
Such losses require additional energy from the source but do not change the energy which 
goes into the inductance. 
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Applying the same arguments to a pair of coils such as those in Figs. 17-8 or 
17-9, we can show that the total electrical energy of the system is given by 

u = iJEx/f + \& 2 l\ + STC/x/a. (17.38) 

For, starting with / = 0 in both coils, we could first turn on the current 7 X in 
coil 1, with 1 2 ~ 0. The work done is just j£ x 7 x . But now, on turning up 7 2 , 
we not only do the work ^£ 2 7 2 against the emf in circuit 2, but also an additional 
amount Ml x 7 2 , which is the integral of the emf [M(dl 2 /dt)] in circuit 1 times the 
now constant current 7 X in that circuit. 

Suppose we now wish to find the force between any two coils carrying the 
currents 7 X and 7 2 . We might at first expect that we could use the principle of 
virtual work, by taking the change in the energy of Eq. (17.38). We must remember, 
of course, that as we change the relative positions of the coils the only quantity 
which varies is the mutual inductance M. We might then write the equation of 
virtual work as 


— FAx~AU = I X I 2 A3TC (wrong). 

But this equation is wrong because, as we have seen earlier, it includes only the 
change in the energy of the two coils and not the change in the energy of the sources 
which are maintaining the currents 7 X and 7 2 at their constant values. We can now 
understand that these sources must supply energy against the induced emf’s in the 
coils as they are moved. If we wish to apply the principle of virtual work correctly, 
we must also include these energies. As we have seen, however, we may take a 
short cut and use the principle of virtual work by remembering that the total 
energy is the negative of what we have called (J mvc the “mechanical energy.’ ’ We 
can therefore write for the force 


— F Ax = A U mvch = -A U. 
The force between two coils is then given by 


(17.39) 


F Ax = I X I 2 A3TC. 


Equation (17.38) for the energy of a system of two coils can be used to show 
that an interesting inequality exists between mutual inductance M and the self- 
inductances £i and £ 2 of the two coils. It is clear that the energy of two coils 
must be positive. If we begin with zero currents in the coils and increase these 
currents to some values, we have been adding energy to the system. If not, the 
currents would spontaneously increase with release of energy to the rest of the 
world — an unlikely thing to happen! Now our energy equation, Eq. (17.38), can 
equally well be written in the following form: 



h 




(17.40) 


That is just an algebraic transformation. This quantity must always be positive 
for any values of 7 X and 7 2 . In particular, it must be positive if 7 2 should happen to 
have the special value 


h 



(17.41) 


But with this current for 7 2 , the first term in Eq. (17.40) is zero. If the energy is to 
be positive, the last term in (17.40) must be greater than zero. We have the require- 
ment that 

£ x £ 2 > M 2 . 


We have thus proved the general result that the magnitude of the mutual inductance 
9H of any two coils is necessarily less than or equal to the geometric mean of the 
two self-inductances. (2fT£ itself may be positive or negative, depending on the sign 
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conventions for the currents I\ and / 2 .) 


|9fll| < V £i£ 2 - (17.42) 

The relation between 2fH and the self-inductances is usually written as 

911 - (17.43) 

The constant k is called the coefficient of coupling. If most of the flux from one 
coil links the other coil, the coefficient of coupling is near one; we say the coils are 
“tightly coupled.” If the coils are far apart or otherwise arranged so that there is 
very little mutual flux linkage, the coefficient of coupling is near zero and the 
mutual inductance is very small. 

For calculating the mutual inductance of two coils, we have given in Eq. 
(17.30) a formula which is a double line integral around the two circuits. We 
might think that the same formula could be used to get the self-inductance of a 
single coil by carrying out both line integrals around the same coil. This, however, 
will not work, because in integrating around the two coils, the denominator r 12 of 
the integrand will go to zero when the two line elements are at the same point. 
The self-inductance obtained from this formula is infinite. The reason is that this 
formula is an approximation that is valid only when the cross sections of the wires 
of the two circuits are small compared with the distance from one circuit to the 
other. Clearly, this approximation doesn’t hold for a single coil. It is, in fact, true 
that the inductance of a single coil tends logarithmically to infinity as the diameter 
of its wire is made smaller and smaller. 

We must, then, look for a different way of calculating the self-inductance of a 
single coil. It is necessary to take into account the distribution of the currents 
within the wires because the size of the wire is an important parameter. We should 
therefore ask not what is the inductance of a “circuit,” but what is the inductance 
of a distribution of conductors. Perhaps the easiest way to find this inductance is 
to make use of the magnetic energy. We found earlier, m Section 15-3, an ex- 
pression for the magnetic energy of a distribution of stationary currents: 

U = %fj-AdV. (17.44) 

If we know the distribution of current density /', we can compute the vector po- 
tential A and then evaluate the integral of Eq. (17.44) to get the energy. This 
energy is equal to the magnetic energy of the self-inductance, ^£/ 2 . Equating 
the two gives us a formula for the inductance: 

£ = ~fj-AdV. (17.45) 

We expect, of course, that the inductance is a number depending only on the 
geometry of the circuit and not on the current I m the circuit. The formula of Eq. 
(17.45) will indeed give such a result, because the integral in this equation is pro- 
portional to the square of the current — the current appears once through j and 
again through the vector potential A. The integral divided by I 2 will depend on the 
geometry of the circuit but not on the current /. 

Equation (17.44) for the energy of a current distribution can be put in a quite 
different form which is sometimes more convenient for calculation. Also, as we 
will see later, it is a form that is important because it is more generally valid. In 
the energy equation, Eq. (17.44), both A and j can be related to B , so we can hope 
to express the energy in terms of the magnetic field — just as we were able to relate 
the electrostatic energy to the electric field. We begin by replacing / by e 0 c 2 V X B. 
We cannot replace A so easily, since B = V X A cannot be reversed to give A in 
terms of B . Anyway, we can write 



(V X B) • A dV. 


(17.46) 
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The interesting thing is that — with some restrictions — this integral can be 
written as 

U = f B-{V X A) dv. (17.47) 

To see this, we write out in detail a typical term. Suppose that we take the term 
(V X B) Z A Z which occurs in the integral of Eq. (17.46). Writing out the com- 
ponents, we get 

f(^-^) A ‘ dxdrd2 - 

(There are, of course, two more integrals of the same kind.) We now integrate the 
first term with respect to x — integrating by parts. That is, we can say 

/ ik Azdx = ByAz ~ I By i!x dx ■ 

Now suppose that our system — meaning the sources and fields — is finite, so that 
as we go to large distances all fields go to zero. Then if the integrals are carried out 
over all space, evaluating the term B y A z at the limits will give zero. We have left 
only the term with B y (dA z /dx), which is evidently one part of B y (V X A) y and, 
therefore, of B * (V X A). If you work out the other five terms, you will see that 
Eq. (17.47) is indeed equivalent to Eq. (17.46). 

But now we can replace (V X A) by B, to get 

U = j B ■ B dV. (17.48) 

We have expressed the energy of a magnetostatic situation in terms of the magnetic 
field only. The expression corresponds closely to the formula we found for the 
electrostatic energy: 


U = J E-EdV. 


(17.49) 


One reason for emphasizing these two energy formulas is that sometimes they 
are more convenient to use. More important, it turns out that for dynamic fields 
(when E and B are changing with time) the two expressions (17.48) and (17.49) 
remain true, whereas the other formulas we have given for electric or magnetic 
energies are no longer correct — they hold only for static fields. 

If we know the magnetic field B of a single coil, we can find the self-inductance 
by equating the energy expression (17.48) to \£l 2 . Let’s see how this works by 
finding the self-inductance of a long solenoid. We have seen earlier that the mag- 
netic field inside a solenoid is uniform and B outside is zero. The magnitude of the 
field inside is B ~ nl/e 0 c 2 , where n is the number of turns per unit length in the 
winding and / is the current. If the radius of the coil is r and its length is L (we 
take L very long, so that we can neglect end effects, i.e., L r), the volume inside 
is 7 rr 2 L. The magnetic energy is therefore 


"=¥* 2 -(Vol) = S 


7T r 2 L, 


which is equal to j£/ 2 . Or, 


2 2 

7 rr n T 

£ = r L. 

e 0 c 2 


(17.50) 
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The Maxwell Equations 


18-1 Maxwell’s equations 

In this chapter we come back to the complete set of the four Maxwell equations 
that we took as our starting point in Chapter 1 . Until now, we have been studying 
Maxwell’s equations in bits and pieces; it is time to add one final piece, and to put 
them all together. We will then have the complete and correct story for electro- 
magnetic fields that may be changing with time in any way. Anything said in this 
chapter that contradicts something said earlier is true and what was said earlier is 
false — because what was said earlier applied to such special situations as, for 
instance, steady currents or fixed charges. Although we have been very careful to 
point out the restrictions whenever we wrote an equation, it is easy to forget all of 
the qualifications and to learn too well the wrong equations. Now we are ready 
to give the whole truth, with no qualifications (or almost none). 

The complete Maxwell equations are written in Table 18-1, in words as well 
as in mathematical symbols. The fact that the words are equivalent to the equations 
should by this time be familiar — you should be able to translate back and forth 
from one form to the other. 

The first equation — that the divergence of E is the charge density over e 0 — is 
true in general. In dynamic as well as in static fields, Gauss’ law is always valid. 
The flux of E through any closed surface is proportional to the charge inside. 
The third equation is the corresponding general law for magnetic fields. Since 
there are no magnetic charges, the flux of B through any closed surface is always 
zero. The second equation, that the curl of E is —dB/dt, is Faraday’s law and was 
discussed in the last two chapters. It also is generally true. The last equation has 
something new. We have seen before only the part of it which holds for steady 
currents. In that case we said that the curl of B is j/e 0 c 2 , but the correct general 
equation has a new part that was discovered by Maxwell. 

Until Maxwell’s work, the known laws of electricity and magnetism were 
those we have studied in Chapters 3 through 17. In particular, the equation for 
the magnetic field of steady currents was known only as 


V X B = 


_ J 


€ 0 C^ 


(18.1) 


Maxwell began by considering these known laws and expressing them as differ- 
ential equations, as we have done here. (Although the V notation was not yet 
invented, it is mainly due to Maxwell that the importance of the combinations of 
derivatives, which we today call the curl and the divergence, first became apparent.) 
He then noticed that there was something strange about Eq. (18.1). If one takes the 
divergence of this equation, the left-hand side will be zero, because the divergence 
of a curl is always zero. So this equation requires that the divergence of j also be 
zero. But if the divergence of j is zero, then the total flux of current out of any 
closed surface is also zero. 

The flux of current from a closed surface is the decrease of the charge inside 
the surface. This certainly cannot in general be zero because we know that the 
charges can be moved from one place to another. The equation 




dp 

dt 


(18.2) 


has, in fact, been almost our definition of / This equation expresses the very funda- 
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Table 18-1 Classical Physics 


Maxwell's equations 

I. V • E = — (Flux of E through a closed surface) = (Charge inside )/ eo 

€0 

dS d 

II. V X E = — (Line integral of E around a loop) — (Flux of B through the loop) 

dt at 

III. V * B = 0 (Flux of B through a closed surface) = 0 

2 J dE 2 

IV. c V X B — 1 — — c (Integralof B around a loop) = (Current through the loop )/eo 

eo dt 

+ — (Flux of E through the loop) 
dt 


Conservation of charge 

V - } — — ^ (Flux of current through a closed surface) = — — (Charge inside) 

dt dt 

Force law 

F - q(E + v X B) 


Law of motion 


d 

dt 


(p) = 


F, 


, mv 

where p = — - 

Vl — v 2 /c 2 


(Newton’s law, with Einstein’s modification) 


Gravitation 


n mim 2 

F = —G — — e r 


mental law that electric charge is conserved — any flow of charge must come from 
some supply. Maxwell appreciated this difficulty and proposed that it could be 
avoided by adding the term dE/dt to the right-hand side of Eq. (18.1); he then got 
the fourth equation in Table 18-1 : 

IV. c 2 V X B = j- + ?§ ■ 

€o dt 

It was not yet customary in Maxwell’s time to think in terms of abstract fields. 
Maxwell discussed his ideas in terms of a model in which the vacuum was like an 
elastic solid. He also tried to explain the meaning of his new equation in terms of 
the mechanical model. There was much reluctance to accept his theory, first be- 
cause of the model, and second because there was at first no experimental justi- 
fication. Today, we understand better that what counts are the equations themselves 
and not the model used to get them. We may only question whether the equations 
are true or false. This is answered by doing experiments, and untold numbers of 
experiments have confirmed Maxwell’s equations. If we take away the scaffolding 
he used to build it, we find that Maxwell’s beautiful edifice stands on its own. He 
brought together all of the laws of electricity and magnetism and made one complete 
and beautiful theory. 

Let us show that the extra term is just what is required to straighten out the 
difficulty Maxwell discovered. Taking the divergence of his equation (IV in Table 
18-1), we must have that the divergence of the right-hand side is zero: 

V-J-+V-~=0. (18.3) 

Co dt 
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In the second term, the order of the derivatives with respect to coordinates and 
time can be reversed, so the equation can be rewritten as 

v •/ + e° ^ V • £ = °. (18.4) 

But the first of Maxwell’s equations says that the divergence of E is p/e 0 . Inserting 
this equality in Eq. (18.4), we get back Eq. (18.2), which we know is true. Con- 
versely, if we accept Maxwell's equations — and we do because no one has ever 
found an experiment that disagrees with them — we must conclude that charge is 
always conserved. 

The laws of physics have no answer to the question: “What happens if a 
charge is suddenly created at this point — what electromagnetic effects are pro- 
duced?” No answer can be given because our equations say it doesn’t happen. 
If it were to happen, we would need new laws, but we cannot say what they would 
be. We have not had the chance to observe how a world without charge con- 
servation behaves. According to our equations, if you suddenly place a charge at 
some point, you had to carry it there from somewhere else. In that case, we can 
say what would happen. 

When we added a new term to the equation for the curl of E , we found that a 
whole new class of phenomena was described. We shall see that Maxwell’s little 
addition to the equation for V X B also has far-reaching consequences. We can 
touch on only a few of them in this chapter. 


18-2 How the new term works 

As our first example we consider what happens with a spherically symmetric 
radial distribution of current. Suppose we imagine a little sphere with radioactive 
material on it. This radioactive material is squirting out some charged particles. 
(Or we could imagine a large block of jello with a small hole in the center into 
which some charge had been injected with a hypodermic needle and from which 
the charge is slowly leaking out.) In either case we would have a current that is 
everywhere radially outward. We will assume that it has the same magnitude in 
all directions. 

Let the total charge inside any radius r be Q(r). If the radial current density 
at the same radius is j(r ), then Eq. (18.2) requires that Q decreases at the rate 

= - 4 rr 2 J^- < 18 - 5 ) 

We now ask about the magnetic field produced by the currents in this situation. 
Suppose we draw some loop T on a sphere of radius r , as shown in Fig. 18-1. 
There is some current through this loop, so we might expect to find a magnetic 
field circulating in the direction shown. 

But we are already in difficulty. How can the B have any particular direction 
on the sphere? A different choice of T would allow us to conclude that its direction 
is exactly opposite to that shown. So how can there be any circulation of B around 
the currents? 

We are saved by Maxwell’s equation. The circulation of B depends not only 
on the total current through T but also on the rate of change with time of the 
electric flux through it. It must be that these two parts just cancel. Let’s see if that 
works out. 

The electric field at the radius r must be g(r)/47re 0 r 2 — so long as the charge 
is symmetrically distributed, as we assume. It is radial, and its rate of change is then 



Fig. 18-1. What is the magnetic 
field of a spherically symmetric current? 


6E = 1 dQ 

dt 4i re 0 r 2 dt 


(18.6) 


Comparing this with Eq. (18.5), we see that at any radius 


dE _ _ J_ 
dt 6 0 


(18.7) 
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Fig. 1 8-2. The magnetic field near a charging capacitor. 


In Eq. IV the two source terms cancel and the curl of B is always zero. There is 
no magnetic field in our example. 

As our second example, we consider the magnetic field of a wire used to 
charge a parallel-plate condenser (see Fig. 18-2). If the charge Q on the plates is 
changing with time (but not too fast), the current in the wires is equal to dQ/dt . 
We would expect that this current will produce a magnetic field that encircles the 
wire. Surely, the current close to the wire must produce the normal magnetic 
field — it cannot depend on where the current is going. 

Suppose we take a loop r x which is a circle with radius r, as shown in part (a) 
of the figure. The line integral of the magnetic field should be equal to the current 
I divided by e 0 c 2 . We have 

2irrB = — ■ (18.8) 


This is what we would get for a steady current, but it is also correct with Maxwell’s 
addition, because if we consider the plane surface S inside the circle, there are no 
electric fields on it (assuming the wire to be a very good conductor). The surface 
integral of dE/dt is zero. 

Suppose, however, that we now slowly move the curve r downward. We get 
always the same result until we draw even with the plates of the condenser. Then 
the current I goes to zero. Does the magnetic field disappear? That would be 
quite strange. Let’s see what Maxwell’s equation says for the curve r 2 , which is a 
circle of radius r whose plane passes between the condenser plates [Fig. 18-2(b)]. 
The line integral of B around r 2 is 2irrB. This must equal the time derivative of 
the flux of E through the plane circular surface S 2 . This flux of E, we know from 
Gauss’ law, must be equal to l/e 0 times the charge Q on one of the condenser plates. 
We have , v 


That is very convenient. It is the same result we found in Eq. (18.8). Inte- 
grating over the changing electric field gives the same magnetic field as does inte- 
grating over the current in the wire. Of course, that is just what Maxwell’s equation 
says. It is easy to see that this must always be so by applying our same arguments 
to the two surfaces Si and S( that are bounded by the same circle T x in Fig. 
18-2(b). Through Si there is the current /, but no electric flux. Through S[ there 
is no current, but an electric flux changing at the rate //e 0 . The same B is obtained 
if we use Eq. IV with either surface. 

From our discussion so far of Maxwell’s new term, you may have the im- 
pression that it doesn’t add much — that it just fixes up the equations to agree with 
what we already expect. It is true that if we just consider Eq. IV by itself \ nothing 
particularly new comes out. The words “by itself 9 ’ are, however, all-important. 
Maxwell’s small change in Eq. IV, when combined with the other equations, does 
indeed produce much that is new and important. Before we take up these matters, 
however, we want to speak more about Table 18-1. 
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18-3 All of classical physics 

In Table 18-1 we have all that was known of fundamental classical physics, 
that is, the physics that was known by 1905. Here it all is, in one table. With these 
equations we can understand the complete realm of classical physics. 

First we have the Maxwell equations — written in both the expanded form and 
the short mathematical form. Then there is the conservation of charge, which is 
even written in parentheses, because the moment we have the complete Maxwell 
equations, we can deduce from them the conservation of charge. So the table is 
even a little redundant. Next, we have written the force law, because having all 
the electric and magnetic fields doesn’t tell us anything until we know what they 
do to charges. Knowing E and B, however, we can find the force on an object with 
the charge q moving with velocity v. Finally, having the force doesn’t tell us any- 
thing until we know what happens when a force pushes on something; we need the 
law of motion, which is that the force is equal to the rate of change of the mo- 
mentum. (Remember? We had that in Volume I.) We even include relativity 
effects by writing the momentum as p — — t> 2 /c 2 * 

If we really want to be complete, we should add one more law — Newton’s 
law of gravitation — so we put that at the end. 

Therefore in one small table we have all the fundamental laws of classical 
physics — even with room to write them out in words and with some redundancy. 
This is a great moment. We have climbed a great peak. We are on the top of 
K-2 — we are nearly ready for Mount Everest, which is quantum mechanics. We 
have climbed the peak of a “Great Divide,” and now we can go down the other 
side. 

We have mainly been trying to learn how to understand the equations. Now 
that we have the whole thing put together, we are going to study what the equations 
mean — what new things they say that we haven’t already seen. We’ve been working 
hard to get up to this point. It has been a great effort, but now we are going to have 
nice coasting downhill as we see all the consequences of our accomplishment. 


18-4 A travelling field 

Now for the new consequences. They come from putting together all of 
Maxwell’s equations. First, let’s see what would happen in a circumstance which 
we pick to be particularly simple. By assuming that all the quantities vary only in 
one coordinate, we will have a one-dimensional problem. The situation is shown 
in Fig. 18-3. We have a sheet of charge located on the yz-plane. The sheet is first 
at rest, then instantaneously given a velocity u in the y-direction, and kept moving 
with this constant velocity. You might worry about having such an “infinite” 
acceleration, but it doesn’t really matter ; just imagine that the velocity is brought to 
u very quickly. So we have suddenly a surface current J (J is the current per unit 



Fig. 1 8-3. An infinite sheet of charge 
is suddenly set into motion parallel to 
itself. There are magnetic and electric 
fields that propagate out from the sheet 
at a constant speed. 
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Fig. 18-4. (a) The magnitude of B 

(or E) as a function of x at the time f after 
the charge sheet is set in motion, (b) The 
fields for a charge sheet set in motion, 
toward negative y at t = 7*. (c) The sum 
of (a) and (b). 


width in the z-direction). To keep the problem simple, we suppose that there is 
also a stationary sheet of charge of opposite sign superposed on the yz- plane, so 
that there are no electrostatic effects. Also, although in the figure we show only 
what is happening in a finite region, we imagine that the sheet extends to infinity 
in and ±z. In other words, we have a situation where there is no current, and 
then suddenly there is a uniform sheet of current. What will happen? 

Well, when there is a sheet of current in the plus y-direction, there is, as we 
know, a magnetic field generated which will be in the minus z-direction for x > 0 
and in the opposite direction for x < 0. We could find the magnitude of B by 
using the fact that the line integral of the magnetic field will be equal to the current 
over e 0 c 2 . We would get that B = J/le^c 2 (since the current I in a strip of width 
w is Jw and the line integral of B is 2 Bw). 

This gives us the field next to the sheet — for small x — but since we are im- 
agining an infinite sheet, we would expect the same argument to give the magnetic 
field farther out for larger values of x. However, that would mean that the moment 
we turn on the current, the magnetic field is suddenly changed from zero to a 
finite value everywhere. But wait! If the magnetic field is suddenly changed, it 
will produce tremendous electrical effects. (If it changes in any way, there are 
electrical effects.) So because we moved the sheet of charge, we make a changing 
magnetic field, and therefore electric fields must be generated. If there are electric 
fields generated, they had to start from zero and change to something else. There 
will be some dE/dt that will make a contribution, together with the current /, to the 
production of the magnetic field. So through the various equations there is a big 
intermixing, and we have to try to solve for all the fields at once. 

By looking at the Maxwell equations alone, it is not easy to see directly how 
to get the solution. So we will first show you what the answer is and then verify 
that it does indeed satisfy the equations. The answer is the following: The field B 
that we computed is, in fact, generated right next to the current sheet (for small x). 
It must be so, because if we make a tiny loop around the sheet, there is no room 
for any electric flux to go through it. But the field B out farther — for larger x — is, 
at first, zero. It stays zero for awhile, and then suddenly turns on. In short, we 
turn on the current and the magnetic field immediately next to it turns on to a 
constant value B\ then the turning on of B spreads out from the source region. 
After a certain time, there is a uniform magnetic field everywhere out to some 
value x, and then zero beyond. Because of the symmetry, it spreads in both the 
plus and minus x-directions. 

The £-field does the same thing. Before t — 0 (when we turn on the current), 
the field is zero everywhere. Then after the time u both E and B are uniform out 
to the distance x = vt, and zero beyond. The fields make their way forward like 
a tidal wave, with a front moving at a uniform velocity which turns out to be c, 
but for a while we will just call it v . A graph of the magnitude of E or B versus x, 
as they appear at the time t, is shown in Fig. 18-4(a). Looking again at Fig. 18-3, 
at the time r, the region between x = ±vt is “filled” with the fields, but they have 
not yet reached beyond. We emphasize again that we are assuming that the current 
sheet and, therefore the fields E and B , extend infinitely far in both the y- and z-di- 
rections. (We cannot draw an infinite sheet, so we have shown only what happens 
in a finite area.) 

We want now to analyze quantitatively what is happening. To do that, we 
want to look at two cross-sectional views, a top view looking down along the y-axis, 
as shown in Fig, 18-5, and a side view looking back along the z-axis, as shown in 
Fig. 18-6. Suppose we start with the side view. We see the charged sheet moving 
up; the magnetic field points into the page for +x, and out of the page for — x, 
and the electric field is downward everywhere — out to x = =±vt. 

Let’s see if these fields are consistent with Maxwell’s equations. Let’s first 
draw one of those loops that we use to calculate a line integral, say the rectangle 
r 2 shown in Fig. 18-6. You notice that one side of the rectangle is in the region 
where there are fields, but one side is in the region the fields have still not reached. 
There is some magnetic flux through this loop. If it is changing, there should be 
an emf around it. If the wavefront is moving, we will have a changing magnetic 
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flux, because the area in which B exists is progressively increasing at the velocity v . 
The flux inside r 2 is B times the part of the area inside r 2 which has a magnetic 
field. The rate of change of the flux, since the magnitude of B is constant, is the 
magnitude times the rate of change of the area. The rate of change of the area is 
easy. If the width of the rectangle r 2 is L, the area in which B exists changes by 
Lv At in the time At . (See Fig. 18-6.) The rate of change of flux is then BLv. 
According to Faraday’s law, this should equal the line integral of E around r 2> 
which is just EL. We have the equation 

E = vB . (18.10) 

So if the ratio of E to B is v, the fields we have assumed will satisfy Faraday’s 
equation. 

But that is not the only equation; we have the other equation relating E and B: 


c 2 VXB = ^ + ~- (18.11) 

To apply this equation, we look at the top view in Fig. 18-5. We have seen that 
this equation will give us the value of B next to the current sheet. Also, for any 
loop drawn outside the sheet but behind the wavefront, there is no curl of B nor 
any j or changing E, so the equation is correct there. Now let’s look at what hap- 
pens for the curve V i that intersects the wavefront, as shown in Fig. 18-5. Here 
there are no currents, so Eq. (18.1 1) can be written — in integral form — as 



B * ds 



inside T 


E* nda. 


(18.12) 


The line integral of B is just B times L. The rate of change of the flux of E is due 
only to the advancing wavefront. The area inside T u where E is not zero, is in- 
creasing at the rate vL . The right-hand side of Eq. (18.12) is then vLE . That equa- 


tion becomes 


c 2 B = Ev. 


(18.13) 


We have a solution in which we have a constant B and a constant E behind 
the front, both at right angles to the direction in which the front is moving and at 
right angles to each other. Maxwell’s equations specify the ratio of E to B. From 
Eqs. (18.10) and (18.13), 

c 2 

E = vB , and E = — B. 

v 


But one moment! We have found two different conditions on the ratio E/B . Can 
such a field as we describe really exist? There is, of course, only one velocity v for 
which both of these equations can hold, namely v ~ c. The wavefront must 
travel with the velocity c. We have an example in which the electrical influence 
from a current propagates at a certain finite velocity c . 
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Now let’s ask what happens if we suddenly stop the motion of the charged 
sheet after it has been on for a short time T. We can see what will happen by the 
principle of superposition. We had a current that was zero and then was suddenly 
turned on. We know the solution for that case. Now we are going to add another 
set of fields. We take another charged sheet and suddenly start it moving, in the 
opposite direction with the same speed, only at the time T after we started the first 
current. The total current of the two added together is first zero, then on for a 
time T, then off again — because the two currents cancel. We have a square 
“pulse” of current. 

The new negative current produces the same fields as the positive one, only 
with all the signs reversed and, of course, delayed in time by T. A wavefront again 
travels out at the velocity c. At the time t it has reached the distance x = 
=*=c (t — T), as shown in Fig. 18-4(b). So we have two “blocks” of field marching 
out at the speed c, as in parts (a) and (b) of Fig. 18-4. The combined fields are as 
shown in part (c) of the figure. The fields are zero for x > ct , they are constant 
(with the values we found above) between x = c(t — T) and x = ct , and again 
zero for x < c(t — T). 

In short, we have a little piece of field — a block of thickness cT— which has 
left the current sheet and is travelling through space all by itself. The fields have 
“taken off”; they are propagating freely through space, no longer connected in any 
way with the source. The caterpillar has turned into a butterfly! 

How can this bundle of electric and magnetic fields maintain itself? The an- 
swer is: by the combined effects of the Faraday law, v X E = —dB/dt, and the 
new term of Maxwell, c 2 V X B = dE/dt. They cannot help maintaining them- 
selves. Suppose the magnetic field were to disappear. There would be a changing 
magnetic field which would produce an electric field. If this electric field tries to 
go away, the changing electric field would create a magnetic field back again. So 
by a perpetual interplay — by the swishing back and forth from one field to the 
other — they must go on forever. It is impossible for them to disappear.* They 
maintain themselves in a kind of a dance — one making the other, the second making 
the first — propagating onward through space. 

18-5 The speed of light 

We have a wave which leaves the material source and goes outward at the 
velocity c, which is the speed of light. But let’s go back a moment. From a his- 
torical point of view, lit wasn’t known that the coefficient c in Maxwell’s equations 
was also the speed of light propagation. There was just a constant in the equations. 
We have called it c from the beginning, because we knew what it would turn out 
to be. We didn’t think it would be sensible to make you learn the formulas with a 
different constant and then go back to substitute c wherever it belonged. From the 
point of view of electricity and magnetism, however, we just start out with two 
constants, € 0 and c 2 , that appear in the equations of electrostatics and magneto- 
statics: 

V • E = -£■ (18.14) 

€0 
and 

VXB = -^- (18.15) 

If we take any arbitrary definition of a unit of charge, we can determine experi- 
mentally the constant e 0 required in Eq. (18.14) — say by measuring the force 
between two unit charges at rest, using Coulomb’s law. We must also determine 
experimentally the constant € 0 c 2 that appears in Eq. (18.15), which we can do, say, 
by measuring the force between two unit currents. (A unit current means one unit 
of charge per second.) The ratio of these two experimental constants is c 2 — just 
another “electromagnetic constant.” 


* Well, not quite. They can be “absorbed” if they get to a region where there are charges. 
By which we mean that other fields can be produced somewhere which superpose on these 
fields and “cancel” them by destructive interference (see Chapter 31, Vol. I). 
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Notice now that this constant c 2 is the same no matter what we choose for 
our unit of charge. If we put twice as much “charge” — say twice as many proton 
charges — in our “unit” of charge, e 0 would need to be one-fourth as large. When 
we pass two of these “unit” currents through two wires, there will be twice as much 
“charge” per second in each wire, so the force between two wires is four times 
larger. The constant € 0 c 2 must be reduced by one-fourth. But the ratio € 0 c 2 /eo 
is unchanged. 

So just by experiments with charges and currents we find a number c 2 which 
turns out to be the square of the velocity of propagation of electromagnetic in- 
fluences. From static measurements — by measuring the forces between two unit 
charges and between two unit currents — we find that c = 3.00 X 10 8 meters/sec. 
When Maxwell first made this calculation with his equations, he said that bundles 
of electric and magnetic fields should be propagated at this speed. He also re- 
marked on the mysterious coincidence that this was the same as the speed of light. 
“We can scarcely avoid the inference,” said Maxwell, “that light consists in the 
transverse undulations of the same medium which is the cause of electric and 
magnetic phenomena.” 

Maxwell had made one of the great unifications of physics. Before his time, 
there was light, and there was electricity and magnetism. The latter two had been 
unified by the experimental work of Faraday, Oersted, and Ampere. Then, all 
of a sudden, light was no longer “something else,” but was only electricity and 
magnetism in this new form — little pieces of electric and magnetic fields which 
propagate through space on their own. 

We have called your attention to some characteristics of this special solution, 
which turn out to be true, however, for any electromagnetic wave: that the mag- 
netic field is perpendicular to the direction of motion of the wavefront; that the 
electric field is likewise perpendicular to the direction of motion of the wavefront; 
and that the two vectors E and B are perpendicular to each other. Furthermore, 
the magnitude of the electric field E is equal to c times the magnitude of the 
magnetic field B . These three facts — that the two fields are transverse to the direc- 
tion of propagation, that B is perpendicular to E, and that E = cB — are generally 
true for any electromagnetic wave. Our special case is a good one — it shows all 
the main features of electromagnetic waves. 

18-6 Solving Maxwell’s equations; the potentials and the wave equation 

Now we would like to do something mathematical; we want to write Maxwell’s 
equations in a simpler form. You may consider that we are complicating them, 
but if you will be patient a little bit, they will suddenly come out simpler. Although 
by this time you are thoroughly used to each of the Maxwell equations, there are 
many pieces that must all be put together. That’s what we want to do. 

We begin with V * B = 0 — the simplest of the equations. We know that it 
implies that B is the curl of something. So, if we write 

B = V X A, (18.16) 

we have already solved one of Maxwell’s equations. (Incidentally, you appreciate 
that it remains true that another vector A r would be just as good if A* = A + 

— where \p is any scalar field — because the curl of V\p is zero, and B is still the same. 
We have talked about that before.) 

We take next the Faraday law, V X E = —dB/dt, because it doesn’t involve 
any currents or charges. If we write B as V X A and differentiate with respect to 
t, we can write Faraday’s law in the form 

VXE.-lvXA. 

Since we can differentiate either with respect to time or to space first, we can also 
write this equation as 

(18.17) 


t x ( E + d 4) - °- 
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We see that E + dA/dt is a vector whose curl is equal to zero. Therefore that vec- 
tor is the gradient of something. When we worked on electrostatics, we had 
V X E = 0, and then we decided that E itself was the gradient of something. 
We took it to be the gradient of —0 (the minus for technical convenience). We 
do the same thing for E + dA/dt ; we set 

£+^=-V*. (18.18) 

We use the same symbol <t> so that, in the electrostatic case where nothing changes 
with time and the dA/dt term disappears, E will be our old So Faraday’s 

equation can be put in the form 

E = -V<f> - (18.19) 


We have solved two of Maxwell’s equations already, and we have found that 
to describe the electromagnetic fields E and B, we need four potential functions: 
a scalar potential <j> and a vector potential A, which is, of course, three functions. 

Now that A determines part of E, as well as B , what happens when we change 
A to A f = A + V^? In general, E would change if we didn’t take some special 
precaution. We can, however, still allow A to be changed in this way without 
affecting the fields E and B — that is, without changing the physics — if we always 
change A and <t> together by the rules 


A’ = A + V^, 


¥ = 0 


d$ 

It 


(18.20) 


Then neither B nor £, obtained from Eq. (18.19), is changed. 

Previously, we chose to make V • A — 0, to make the equations of statics 
somewhat simpler. We are not going to do that now; we are going to make a 
different choice. But we’ll wait a bit before saying what the choice is, because 
later it will be clear why the choice is made. 

Now we return to the two remaining Maxwell equations which will give us 
relations between the potentials and the sources p and / Once we can determine A 
and <f> from the currents and charges, we can always get E and B from Eqs. (18.16) 
and (18.19), so we will have another form of Maxwell’s equations. 

We begin by substituting Eq. (18.19) into V • E — p/e 0 ; we get 



which we can write also as 

-V 2 <t>~iv-A= f. (18.21) 

or Co 


This is one equation relating <f> and A to the sources. 

Our final equation will be the most complicated, 
fourth Maxwell equation as 


c 2 V 


X B dt « 0 


We start by rewriting the 


and then substitute for B and E in terms of the potentials, using Eqs. (18.16) 
and (18.19): 

.*TX(»X 

The first term can be rewritten using the algebraic identity: V X (V X A) = 
V(V * A) — V 2 A; we get 

~c 2 V 2 A + c 2 V(V ' 4) + ^ V* + ^ ~ ( 18 - 22 ) 


It’s not very simple! 
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Fortunately, we can now make use of our freedom to choose arbitrarily the 
divergence of A. What we are going to do is to use our choice to fix things so that 
the equations for A and for 0 are separated but have the same form. We can do 
this by taking* 


1 ^ 

2 dt 


(18.23) 


When we do that, the two middle terms in A and 0 in Eq. (18.22) cancel, and that 
equation becomes much simpler: 


V 2 A - 


JL *lA 

c 2 dt 2 


j 

€ 0 C 2 


(18.24) 


And our equation for 0 — Eq. (18.21) — takes on the same form: 


V 2 * - ± 

* c 2 dt 2 


P _ . 

Co 


(18.25) 


What a beautiful set of equations! They are beautiful, first, because they are 
nicely separated — with the charge density, goes <f > ; with the current, goes A . Further- 
more, although the left side looks a little funny — a Laplacian together with a 
{d/di) 2 — when we unfold it we see 


d 2 <t> . d 2 <j> . d 2 <t> i 

dx 2 dy 2 dz 2 c 2 dt 2 €o 


(18.26) 


It has a nice symmetry in x, y, z , t — the — 1/c 2 is necessary because, of course, 
time and space are different; they have different units. 

Maxwell’s equations have led us to a new kind of equation for the potentials 
<f> and A but to the same mathematical form for all four functions </>, A x , A y , and 
A z . Once we learn how to solve these equations, we can get B and E from 
V X A and *- V<f> — dA/dt. We have another form of the electromagnetic laws 
exactly equivalent to Maxwell’s equations, and in many situations they are much 
simpler to handle. 

We have, in fact, already solved an equation much like Eq. (18.26). When 
we studied sound in Chapter 47 of Vol. I, we had an equation of the form 


= J_ aV 

dx 2 c 2 dt 2 


and we saw that it described the propagation of waves in the x-direction at the 
speed c. Equation (18.26) is the corresponding wave equation for three dimensions. 
So in regions where there are no longer any charges and currents, the solution of 
these equations is not that <j> and A are zero. (Although that is indeed one possible 
solution.) There are solutions in which there is some set of <£ and A which are 
changing in time but always moving out at the speed c. The fields travel onward 
through free space, as in our example at the beginning of the chapter. 

With Maxwell’s new term in Eq. IV, we have been able to write the field equa- 
tions in terms of A and 0 in a form that is simple and that makes immediately 
apparent that there are electromagnetic waves. For many practical purposes, it 
will still be convenient to use the original equations in terms of E and B . But 
they are on the other side of the mountain we have already climbed. Now we are 
ready to cross over to the other side of the peak. Things will look different — we are 
ready for some new and beautiful views. 


* Choosing the V • A is called “choosing a gauge.” Changing A by adding V\p is called 
a “gauge transformation.” Equation (18.23) is called “the Lorentz gauge.” 
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19 


The Principle of Least Action 



A special lecture— almost verbatim* 


“When I was in high school, my physics teacher — whose name was Mr. Bader 
— called me down one day after physics class and said, ‘You look bored; I want to 
tell you something interesting.’ Then he told me something which I found ab- 
solutely fascinating, and have, since then, always found fascinating. Every time 
the subject comes up, I work on it. In fact, when I began to prepare this lecture 
I found myself making more analyses on the thing. Instead of worrying about the 
lecture, I got involved in a new problem. The subject is this — the principle of 
least action. 

“Mr. Bader told me the following: Suppose you have a particle (in a gravita- 
tional field, for instance) which starts somewhere and moves to some other point 
by free motion — you throw it, and it goes up and comes down, 

It goes from the original place to the final place in a certain amount of time. Now, 
you try a different motion. Suppose that to get from here to there, it went like this 

» 

but got there in just the same amount of time. Then he said this: If you calculate 
the kinetic energy at every moment on the path, take away the potential energy, 
and integrate it over the time during the whole path, you’ll find that the number 
you’ll get is bigger than that for the actual motion. 


* Later chapters do not depend on the material of this special lecture — which is in- 
tended to be for “entertainment.” 
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“In other words, the laws of Newton could be stated not in the form F = ma 
but in the form: the average kinetic energy less the average potential energy is as 
little as possible for the path of an object going from one point to another. 

“Let me illustrate a little bit better what it means. If you take the case of the 
gravitational field, then if the particle has the path x(t) (let’s just take one dimension 
for a moment; we take a trajectory that goes up and down and not sideways), 
where x is the height above the ground, the kinetic energy is \m (dx/dt) 2 , and the 
potential energy at any time is mgx. Now I take the kinetic energy minus the 
potential energy at every moment along the path and integrate that with respect 
to time from the initial time to the final time. Let’s suppose that at the original 
time t\ we started at some height and at the end of the time t 2 we are definitely 
ending at some other place. 

“Then the integral is 

/, [i m ($y - mex ] d ‘- 

The actual motion is some kind of a curve — it’s a parabola if we plot against the 
time— and gives a certain value for the integral. But we could imagine some other 
motion that went very high and came up and down in some peculiar way. 

■■ ■ » 

We can calculate the kinetic energy minus the potential energy and integrate for 
such a path ... or for any other path we want. The miracle is that the true path is 
the one for which that integral is least. 

“Let’s try it out. First, suppose we take the case of a free particle for which 
there is no potential energy at all. Then the rule says that in going from one point 
to another in a given amount of time, the kinetic energy integral is least, so it must 
go at a uniform speed. (We know that’s the right answer — to go at a uniform speed.) 
Why is that? Because if the particle were to go any other way, the velocities would 
be sometimes higher and sometimes lower than the average. The average velocity 
is the same for every case because it has to get from ‘here’ to ‘there’ in a given 
amount of time. 

“As an example, say your job is to start from home and get to school in a given 
length of time with the car. You can do it several ways: You can accelerate like 
mad at the beginning and slow down with the brakes near the end, or you can go 
at a uniform speed, or you can go backwards for a while and then go forward, 
and so on. The thing is that the average speed has got to be, of course, the total 
distance that you have gone over the time. But if you do anything but go at a uni- 
form speed, then sometimes you are going too fast and sometimes you are going 
too slow. Now the mean square of something that deviates around an average, as 
you know, is always greater than the square of the mean; so the kinetic energy 
integral would always be higher if you wobbled your velocity than if you went at a 
uniform velocity. So we see that the integral is a minimum if the velocity is a 
constant (when there are no forces). The correct path is like this. 

“Now, an object thrown up in a gravitational field does rise faster first and 
then slow down. That is because there is also the potential energy, and we must 
have the least difference of kinetic and potential energy on the average. Because 
the potential energy rises as we go up in space, we will get a lower difference if we 
can get as soon as possible up to where there is a high potential energy. Then we 
can take that potential away from the kinetic energy and get a lower average. So 
it is better to take a path which goes up and gets a lot of negative stuff from the 
potential energy. 

“On the other hand, you can’t go up too fast, or too far, because you will then 
have too much kinetic energy involved — you have to go very fast to get way 
up and come down again in the fixed amount of time available. So you don’t want 
to go too far up, but you want to go up some. So it turns out that the solution is 
some kind of balance between trying to get more potential energy with the least 
amount of extra kinetic energy — trying to get the difference, kinetic minus the 
potential, as small as possible. 
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“That is all my teacher told me, because he was a very good teacher and knew 
when to stop talking. But I don’t know when to stop talking. So instead of leaving 
it as an interesting remark, I am going to horrify and disgust you with the complexi- 
ties of life by proving that it is so. The kind of mathematical problem we will 
have is very difficult and a new kind. We have a certain quantity which is called 
the action , 5. It is the kinetic energy, minus the potential energy, integrated over 
time. 

Action = S = f h (KE - PE) dt. 

Remember that the PE and KE are both functions of time. For each different 
possible path you get a different number for this action. Our mathematical problem 
is to find out for what curve that number is the least. 

“You say — Oh, that’s just the ordinary calculus of maxima and minima. 
You calculate the action and just differentiate to find the minimum. 

“But watch out. Ordinarily we just have a function of some variable, and we 
have to find the value of that variable where the function is least or most. For 
instance, we have a rod which has been heated in the middle and the heat is spread 
around. For each point on the rod we have a temperature, and we must find the 
point at which that temperature is largest. But now for each path in space we have 
a number — quite a different thing — and we have to find the path in space for which 
the number is the minimum. That is a completely different branch of mathematics. 
It is not the ordinary calculus. In fact, it is called the calculus of variations. 

“There are many problems in this kind of mathematics. For example, the 
circle is usually defined as the locus of all points at a constant distance from a 
fixed point, but another way of defining a circle is this: a circle is that curve of 
given length which encloses the biggest area. Any other curve encloses less area for 
a given perimeter than the circle does. So if we give the problem: find that curve 
which encloses the greatest area for a given perimeter, we would have a problem 
of the calculus of variations — a different kind of calculus than you’re used to. 

“So we make the calculation for the path of an object. Here is the way we 
are going to do it. The idea is that we imagine that there is a true path and that 
any other curve we draw is a false path, so that if we calculate the action for the 
false path we will get a value that is bigger than if we calculate the action for the 
true path. 

“Problem: Find the true path. Where is it? One way, of course, is to calculate 
the action for millions and millions of paths and look at which one is lowest. 
When you find the lowest one, that’s the true path. 

“That’s a possible way, But we can do it better than that. When we have a 
quantity which has a minimum — for instance, in an ordinary function like the 
temperature — one of the properties of the minimum is that if we go away from the 
minimum in the first order, the deviation of the function from its minimum value 
is only second order. At any place else on the curve, if we move a small distance 
the value of the function changes also in the first order. But at a minimum, a tiny 
motion away makes, in the first approximation, no difference. 

“That is what we are going to use to calculate the true path. If we have the 
true path, a curve which differs only a little bit from it will, in the first approxima- 
tion, make no difference in the action. Any difference will be in the second 
approximation, if we really have a minimum. 

“That is easy to prove. If there is a change in the first order when I deviate 
the curve a certain way, there is a change in the action that is proportional to the 
deviation. The change presumably makes the action greater; otherwise we haven’t 
got a minimum. But then if the change is proportional to the deviation, reversing 
the sign of the deviation will make the action less. We would get the action to 
increase one way and to decrease the other way. The only way that it could really 
be a minimum is that in the first approximation it doesn’t make any change, that 
the changes are proportional to the square of the deviations from the true path. 
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“So we work it this way: We call x(/) (with an underline) the true path — the 
one we are trying to find. We take some trial path x(f) that differs from the true 
path by a small amount which we will call rj( i ) (eta of i ). 

“Now the idea is that if we calculate the action S for the path x(f), then the 
difference between that S and the action that we calculated for the path x(t) — to 
simplify the writing we can call it S — the difference of S and S must be zero in 
the first-order approximation of small 77. It can differ in the second order, but 
in the first order the difference must be zero. 

“And that must be true for any 77 at all. Well, not quite. The method doesn’t 
mean anything unless you consider paths which all begin and end at the same two 
points — each path begins at a certain point at t\ and ends at a certain other point 
at t 2 , and those points and times are kept fixed. So the deviations in our t? have to 
be zero at each end, t;(/i) = 0 arid rj(t 2 ) = 0. With that condition, we have speci- 
fied our mathematical problem. 

“If you didn’t know any calculus, you might do the same kind of thing to 
find the minimum of an ordinary function fix). You could discuss what happens 
if you take fix) and add a small amount h to x and argue that the correction to fix) 
in the first order in h must be zero at the minimum. You would substitute x + h 
for x and expand out to the first order in h . . . just as we are going to do with 77. 

“The idea is then that we substitute xit) = x(f) + rjit) in the formula for 
the action: 



where I call the potential energy K(x). The derivative dx/dt is, of course, the 
derivative of x(0 plus the derivative of rj(0, so for the action I get this expression: 



“Now I must write this out in more detail. 

/*V , ~ d£ dy , 

\dt) dt dt^ 


Vix + 77) 


dt 


For the squared term I get 



But wait. I’m not worrying about higher than the first order, so I will take all the 
terms which involve r/ 2 and higher powers and put them in a little box called 
‘second and higher order.’ From this term I get only second order, but there will 
be more from something else. So the kinetic energy part is 


~ + m "fit Ifi ^ secon< ^ an< * higher order). 

“Now we need the potential V at x + 77. I consider 77 small, so I can write 
Vix) as a Taylor series. It is approximately Vix); in the next approximation 
(from the ordinary nature of derivatives) the correction is 77 times the rate of change 
of V with respect to x, and so on: 

V(x + „) = V(x) + V V'(x) + V"(x) + ■■■ 


I have written V 9 for the derivative of V with respect to x in order to save writing. 
The term in rj 2 and the ones beyond fall into the ‘second and higher order’ category 
and we don’t have to worry about them. Putting it all together, 


— f * 2 m 

” Jt x L2 \di) 


jss x . dx dr) 

v ^ +m din 


— 77 K'(x) + (second and higher order) dt , 
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Now if we look carefully at the thing, we see that the first two terms which I have 
arranged here correspond to the action S that I would have calculated with the 
true path x. The thing I want to concentrate on is the change in S— the difference 
between the S and the S that we would get for the right path. This difference we 
will write as SS, called the variation in S. Leaving out the ‘second and higher 
order’ terms, I have for SS 


“Now the problem is this: Here is a certain integral. I don’t know what the 
x is yet, but I do know that no matter what ij is, this integral must be zero. Well, 
you think, the only way that that can happen is that what multiplies i? must be 
zero. But what about the first term with dn/dt’l Well, after all, if ij can be anything 
at all, its derivative is anything also, so you conclude that the coefficient of dy/dt 
must also be zero. That isn’t quite right. It isn’t quite right because there is a 
connection between n and its derivative; they are not absolutely independent, 
because jj( 0 must be zero at both ti and r 2 . 

“The method of solving all problems in the calculus of variations always uses 
the same general principle. You make the shift in the thing you want to vary 
(as we did by adding ij); you look at the first-order terms; then you always arrange 
things in such a form that you get an integral of the form ‘some kind of stuff times 
the shift (ij),’ but with no other derivatives (no dy/dt). It must be rearranged so it 
is always ‘something’ times tj. You will see the great value of that in a minute. 
(There are formulas that tell you how to do this in some cases without actually 
calculating, but they are not general enough to be worth bothering about; the best 
way is to calculate it out this way.) 

“How can I rearrange the term in drj/dt to make it have an jj? I can do that 
by integrating by parts. It turns out that the whole trick of the calculus of variations 
consists of writing down the variation of S and then integrating by parts so that 
the derivatives of ij disappear. It is always the same in every problem in which 
derivatives appear. 

“You remember the general principle for integrating by parts. If you have 
any function f times dr\/dt integrated with respect to t, you write down the derivative 
of iif: 

d / f\ _„<*/, f dl > 

dt ivf) ~ n di + f di' 


The integral you want is over the last term, so 

“In our formula for SS, the function /is m times dx/dt ; therefore, I have the 
following formula for SS. 


SS = 


dx 

"a”*') 




K'(*) ,(!)<*• 


The first term must be evaluated at the two limits t x and t 2 . Then I must have the 
integral from the rest of the integration by parts. The last term is brought down 
without change. 

“Now comes something which always happens — the integrated part disappears. 
(In fact, if the integrated part does not disappear, you restate the principle, adding 
conditions to make sure it does!) We have already said that ti must be zero at both 
ends of the path, because the principle is that the action is a minimum provided 
that the varied curve begins and ends at the chosen points. The condition is that 
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7j(/i) = 0, and rj(/ 2 ) = 0. So the integrated term is zero. We collect the other 
terms together and obtain this: 

8S = f - F'(*)]i,(0*. 

The variation in S is now the way we wanted it — there is the stuff in brackets, say 
F, all multiplied by 77(f) and integrated from 1 1 to t 2 . 

“We have that an integral of something or other times 77(f) is always zero: 

J F(t) v(0 dt = 0. 

I have some function of /; I multiply it by 77(f); and I integrate it from one end to 
the other. And no matter what the 77 is, I get zero. That means that the function 
F(t ) is zero. That’s obvious, but anyway I’ll show you one kind of proof. 

“Suppose that for 77(f) I took something which was zero for all f except right 
near one particular value. It stays zero until it gets to this f, 

then it blips up for a moment and blips right back down. When we do the integral 
of this 77 times any function F, the only place that you get anything other than zero 
was where 77(f) was blipping, and then you get the value of F at that place times the 
integral over the blip. The integral over the blip alone isn’t zero, but when multi- 
plied by F it has to be; so the function F has to be zero where the blip was. But 
the blip was anywhere I wanted to put it, so F must be zero everywhere. 

“We see that if our integral is zero for any 77, then the coefficient of tj must be 
zero. The action integral will be a minimum for the path that satisfies this compli- 
cated differential equation: 

[-»$-*"<*>] -0. 

It’s not really so complicated ; you have seen it before. It is just F = ma. The first 
term is the mass times acceleration, and the second is the derivative of the potential 
energy, which is the force. 

“So, for a conservative system at least, we have demonstrated that the principle 
of least action gives the right answer; it says that the path that has the minimum 
action is the one satisfying Newton’s law. 

“One remark: I did not prove it was a minimum — maybe it’s a maximum. In 
fact, it doesn’t really have to be a minimum. It is quite analogous to what we found 
for the ‘principle of least time’ which we discussed in optics. There also, we said 
at first it was ‘least’ time. It turned out, however, that there were situations in which 
it wasn’t the least time. The fundamental principle was that for any first-order 
variation away from the optical path, the change in time was zero; it is the same 
story. What we really mean by ‘least’ is that the first-order change in the value 
of S, when you change the path, is zero. It is not necessarily a ‘minimum.’ 

“Next, I remark on some generalizations. In the first place, the thing can be 
done in three dimensions. Instead of just x, I would have x, y, and z as functions 
of t; the action is more complicated. For three-dimensional motion, you have to 
use the complete kinetic energy — (m/2) times the whole velocity squared. That is, 

" -?[($)■ +@)‘+(S)T 

Also, the potential energy is a function of x, y, and z. And what about the path? 
The path is some general curve in space, which is not so easily drawn, but the idea 
is the same. And what about the 77? Well, 77 can have three components. You 
could shift the paths in x, or in y, or in z — or you could shift in all three directions 
simultaneously. So 77 would be a vector. This doesn’t really complicate things too 
much, though. Since only the first-order variation has to be zero, we can do the 
calculation by three successive shifts. We can shift tj only in the x-direction and 
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say that coefficient must be zero. We get one equation. Then we shift it in the 
y-direction and get another. And in the z-direction and get another. Or, of course, 
in any order that you want. Anyway, you get three equations. And, of course, 
Newton’s law is really three equations in the three dimensions — one for each com- 
ponent. I think that you can practically see that it is bound to work, but we will 
leave you to show for yourself that it will work for three dimensions. Incidentally, 
you could use any coordinate system you want, polar or otherwise, and get Newton’s 
laws appropriate to that system right off by seeing what happens if you have the 
shift in radius, or in angle, etc. 

“Similarly, the method can be generalized to any number of particles. If you 
have, say, two particles with a force between them, so that there is a mutual 
potential energy, then you just add the kinetic energy of both particles and take 
the potential energy of the mutual interaction. And what do you vary? You 
vary the paths of both particles. Then, for two particles moving in three dimensions, 
there are six equations. You can vary the position of particle 1 in the x-direction, 
in the y-direction, and in the z-direction, and similarly for particle 2; so there are 
six equations. And that’s as it should be. There are the three equations that deter- 
mine the acceleration of particle 1 in terms of the force on it and three for the ac- 
celeration of particle 2, from the force on it. You follow the same game through, 
and you get Newton’s law in three dimensions for any number of particles. 

“I have been saying that we get Newton’s law. That is not quite true, because 
Newton’s law includes nonconservative forces like friction. Newton said that ma 
is equal to any F. But the principle of least action only works for conservative 
systems — where all forces can be gotten from a potential function. You know, 
however, that on a microscopic level — on the deepest level of physics — there are 
no nonconservative forces. Nonconservative forces, like friction, appear only be- 
cause we neglect microscopic complications — there are just too many particles to 
analyze. But the fundamental laws can be put in the form of a principle of least 
action. 

“Let me generalize still further. Suppose we ask what happens if the particle 
moves relativistically. We did not get the right relativistic equation of motion; 
F = ma is only right nonrelativistically. The question is: Is there a corresponding 
principle of least action for the relativistic case? There is. The formula in the case 
of relativity is the following: 

S = —m 0 c 2 [ H y/\ — c 2 dt — q f h [« <t>(x , y, z, t) — v • A(x , y, z, *)] dt. 

Jt t Jt x 

The first part of the action integral is the rest mass m 0 times c 2 times the integral 
of a function of velocity, \/l — v%/c 2 . Then instead of just the potential energy, 
we have an integral over the scalar potential <f> and over v times the vector potential 
A . Of course, we are then including only electromagnetic forces. All electric and 
magnetic fields are given in terms of <p and A. This action function gives the com- 
plete theory of relativistic motion of a single particle in an electromagnetic field. 

“Of course, wherever I have written v, you understand that before you try to 
figure anything out, you must substitute dx/dt for v x and so on for the other com- 
ponents. Also, you put the point along the path at time t, x(t) 9 y(t ) , z(t) where I wrote 
simply x, y, z. Properly, it is only after you have made those replacements for the 
i/’s that you have the formula for the action for a relativistic particle. I will leave 
to the more ingenious of you the problem to demonstrate that this action formula 
does, in fact, give the correct equations of motion for relativity. May I suggest 
you do it first without the A, that is, for no magnetic field? Then you should get 
the components of the equation of motion, dp/dt = —q v<j>, where, you remember, 
p — mv/yf 1 — v%/c 2 . 

“It is much more difficult to include also the case with a vector potential. 
The variations get much more complicated. But in the end, the force term does 
come out equal to q(E + v X B), as it should. But I will leave that for you to 
play with. 

“I would like to emphasize that in the general case, for instance in the rela- 
tivistic formula, the action integrand no longer has the form of the kinetic energy 
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minus the potential energy. That’s only true in the nonrelativistic approximation. 
For example, the term m 0 c 2 \/ 1- — v 2 /c 2 is not what we have called the kinetic 
energy. The question of what the action should be for any particular case must 
be determined by some kind of trial and error. It is just the same problem as deter- 
mining what are the laws of motion in the first place. You just have to fiddle around 
with the equations that you know and see if you can get them into the form of the 
principle of least action. 

“One other point on terminology. The function that is integrated over time 
to get the action S is called the Lagrangian , <£, which is a function only of the 
velocities and positions of particles. So the principle of least action is also written 

S = ( h £(Xi, Vi) dt, 

Jh 

where by and v t are meant all the components of the positions and velocities. 
So if you hear someone talking about the ‘Lagrangian,’ you know they are talking 
about the function that is used to find 5. For relativistic motion in an electro- 
magnetic field 

£ = — m 0 c 2 V I — v*Tc 2 ~ q(<t> + v A). 

“Also, I should say that S is not really called the ‘action’ by the most precise 
and pedantic people. It is called ‘Hamilton’s first principal function.’ Now I hate 
to give a lecture on ‘the-principle-of-least-Hamilton’s-first-principal-fimction.’ 
So I call it ‘the action.’ Also, more and more people are calling it the action. You 
see, historically something else which is not quite as useful was called the action, 
but I think it’s more sensible to change to a newer definition. So now you too 
will call the new function the action, and pretty soon everybody will call it by that 
simple name. 

“Now I want to say some things on this subject which are similar to the dis- 
cussions I gave about the principle of least time. There is quite a difference in the 
characteristic of a law which says a certain integral from one place to another is a 
minimum — which tells something about the whole path — and of a law which says 
that as you go along, there is a force that makes it accelerate. The second way tells 
how you inch your way along the path, and the other is a grand statement about the 
whole path. In the case of light, we talked about the connection of these two. 
Now, I would like to explain why it is true that there are differential laws when 
there is a least action principle of this kind. The reason is the following: Consider 
the actual path in space and time. As before, let’s take only one dimension, so 
we can plot the graph of x as a function of /. Along the true path, S is a minimum. 
Let’s suppose that we have the true path and that it goes through some point a 
in space and time, and also through another nearby point b . 

Now if the entire integral from ti to / 2 is a minimum, it is also necessary that the 
integral along the little section from a to b is also a minimum. It can’t be that the 
part from a to b is a little bit more. Otherwise you could just fiddle with just that 
piece of the path and make the whole integral a little lower. 

“So every subsection of the path must also be a minimum. And this is true 
no matter how short the subsection. Therefore, the principle that the whole path 
gives a minimum can be stated also by saying that an infinitesimal section of path 
also has a curve such that it has a minimum action. Now if we take a short enough 
section of path — between two points a and b very close together — how the potential 
varies from one place to another far away is not the important thing, because you 
are staying almost in the same place over the whole little piece of the path. The 
only thing that you have to discuss is the first-order change in the potential. The 
answer can only depend on the derivative of the potential and not on the potential 
everywhere. So the statement about the gross property of the whole path becomes 
a statement of what happens for a short section of the path — a differential statement. 
And this differential statement only involves the derivatives of the potential, that 
is, the force at a point. That’s the qualitative explanation of the relation between 
the gross law and the differential law. 
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“In the case of light we also discussed the question: How does the particle 
find the right path? From the differential point of view, it is easy to understand. 
Every moment it gets an acceleration and knows only what to do at that instant. 
But all your instincts on cause and effect go haywire when you say that the particle 
decides to take the path that is going to givethe minimum action. Does it ‘smell’ 
the neighboring paths to find out whether or not they have more action? In the 
case of light, when we put blocks in the way so that the photons could not test all 
the paths, we found that they couldn’t figure out which way to go, and we had the 
phenomenon of diffraction. 

“Is the same thing true in mechanics? Is it true that the particle doesn’t just 
‘take the right path’ but that it looks at all the other possible trajectories? And if 
by having things in the way, we don’t let it look, that we will get an analog of 
diffraction? The miracle of it all is, of course, that it does just that. That’s what 
the laws of quantum mechanics say. So our principle of least action is incom- 
pletely stated. It isn’t that a particle takes the path of least action but that it 
smells all the paths in the neighborhood and chooses the one that has the least 
action by a method analogous to the one by which light chose the shortest time. 
You remember that the way light chose the shortest time was this: If it went on a 
path that took a different amount of time, it would arrive at a different phase. And 
the total amplitude at some point is the sum of contributions of amplitude for all 
the different ways the light can arrive. All the paths that give wildly different 
phases don’t add up to anything. But if you can find a whole sequence of paths 
which have phases almost all the same, then the little contributions will add up and 
you get a reasonable total amplitude to arrive. The important path becomes the 
one for which there are many nearby paths which give the same phase. 

“It is just exactly the same thing for quantum mechanics. The complete 
quantum mechanics (for the nonrelativistic case and neglecting electron spin) 
works as follows: The probability that a particle starting at point 1 at the time 1 1 
will arrive at point 2 at the time t 2 is the square of a probability amplitude. The 
total amplitude can be written as the sum of the amplitudes for each possible path — 
fcft each way of arrival. For every x(t) that we could have — for every possible 
imaginary trajectory — we have to calculate an amplitude. Then we add them all 
together. Whatdo we take for the amplitude for each path? Our action integral 
tells us what the amplitude for a single path ought to be. The amplitude is pro- 
portional to some constant times e“ s/ *, where S is the action for that path. That 
is, if we represent the phase of the amplitude by a complex number, the phase angle 
is S/h. The action S has dimensions of energy times time, and Planck’s constant h 
has the same dimensions. It is the constant that determines when quantum me- 
chanics is important. 

“Here is how it works: Suppose that for all paths, S is very large compared to 
h. One path contributes a certain amplitude. For a nearby path, the phase is quite 
different, because with an enormous S even a small change in 5 means a completely 
different phase — because h is so tiny. So nearby paths will normally cancel their 
effects out in taking the sum — except for one region, and that is when a path and 
a nearby path all give the same phase in the first approximation (more precisely, 
the same action within h). Only those paths will be the important ones. So in the 
limiting case in which Planck’s constant h goes to zero, the correct quantum- 
mechanical laws. can be summarized by simply saying: ‘Forget about all these 
probability amplitudes. The particle does go on a special path, namely, that one for 
which S does not vary in the first approximation.’ That’s the relation between the 
principle of least action and quantum mechanics. The fact that quantum mechanics 
can be formulated in this way was discovered in 1942 by a student of that same 
teacher, Bader, I spoke of at the beginning of this lecture. [Quantum mechanics 
was originally formulated by giving a differential equation for the amplitude 
(Schrodinger) and also by some other matrix mathematics (Heisenberg).] 

“Now I want to talk about other minimum principles in physics. There are 
many very interesting ones. I will not try to list them all now but will only describe 
one more. Later on, when we come to a physical phenomenon which has a nice 
minimum principle, I will tell about it then. I want now to show that we can de- 
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scribe electrostatics, not by giving a differential equation for the field, but by saying 
that a certain integral is a maximum or a minimum. First, let’s take the case where 
the charge density is known everywhere, and the problem is to find the potential 0 
everywhere in space. You know that the answer should be 

V 2 0 = -p/e 0 . 

But another way of stating the same thing is this: Calculate the integral £/*, where 
u* = y / (V*) 2 dV - j p<t> dV , 

which is a volume integral to be taken over all space. This thing is a minimum 
for the correct potential distribution </>(x, y, z). 

“We can show that the two statements about electrostatics are equivalent. 
Let’s suppose that we pick any function <t>. We want to show that when we take 
for ^ the correct potential <f>, plus a small deviation /, then in the first order, the 
change in U* is zero. So we write 


<t> = 0 + /• 

The <l> is what we are looking for, but we are making a variation of it to find what 
it has to be so that the variation of U* is zero to first order. For the first part of 
U* 9 we need 

(V<*>) 2 = (V^) 2 + 2 V<£ • V/ + (v/) 2 . 

The only first-order term that will vary is 

2 V/. 

In the second term of the quantity (/*, the integrand is 


p<*> = p</> + p/, 

whose variable part is p/. So, keeping only the variable parts, we need the integral 
A U* = f (€ 0 V« • V/ - pf ) dV. 

“Now, following the old general rule, we have to get the darn thing all clear 
of derivatives of /. Let’s look at what the derivatives are. The dot product is 

d± + d A d JL + 

dx dx dy dy dz dz 9 

which we have to integrate with respect to x, to y, and to z. Now here is the trick: 
to get rid of df/dx we integrate by parts with respect to x. That will carry the 
derivative over onto the <£. It’s the same general idea we used to get rid of derivatives 
with respect to t . We use the equality 


/ 


d/ , _ ,50 

dx dx aX ~ J dx 


l 




The integrated term is zero, since we have to make / zero at infinity. (That corre- 
sponds to making v) zero at and / 2 - So our principle should be more accurately 
stated: U* is less for the true <j> than for any other <f>(x 9 y> z) having the same values 
at infinity.) Then we do4he same thing for y and z. So our integral A U* is 

A U* = f (~e 0 V 2 ± - p)fdV. 
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In order for this variation to be zero for any /, no matter what, the coefficient of 
/ must be zero and, therefore, 

V 2 « = -p/6 0 . 

We get back our old equation. So our ‘minimum’ proposition is correct. 

“We can generalize our proposition if we do our algebra in a little different 
way. Let’s go back and do our integration by parts without taking components. 
We start by looking at the following equality: 

V • (/V«) = Vf • V* + /TV 

If I differentiate out the left-hand side, I can show that it is just equal to the right- 
hand side. Now we can use this equation to integrate by parts. In our integral At/*, 
we replace — • Vfby fV 2 <)> — V * (/V0), which gets integrated over volume. 
The divergence term integrated over volume can be replaced by a surface integral: 

y V • (/ V4>) dV = JfVt-nda. 

Since we are integrating over all space, the surface over which we are integrating is 
at infinity. There, /is zero and we get the same answer as before. 

“Only now we see how to solve a problem when we don't know where all the 
charges are. Suppose that we have conductors with charges spread out on them in 
some way. We can still use our minimum principle if the potentials of all the 
conductors are fixed. We carry out the integral for U* only in the space outside 
of all conductors. Then, since we can’t vary <t> on the conductor, / is zero on all 
those surfaces, and the surface integral 

/ fV$-nda 

is still zero. The remaining volume integral 

At/* = j (-€ O V 2 0 - p±)fdV 

is only to be carried out in the spaces between conductors. Of course, we get 
Poisson’s equation again, 

V 2 * = — p/e 0 . 


So we have shown that our original integral U* is also a minimum if we evaluate 
it over the space outside of conductors all at fixed potentials (that is, such that any 
trial <t>(x , y, z) must equal the given potential of the conductors when x, y f z is a 
point on the surface of a conductor). 

“There is an interesting case when the only charges are on conductors. Then 

V* = J / (VtfO 2 dV. 


Our minimum principle says that in the case where there are conductors set at 
certain given potentials, the potential between them adjusts itself so that integral 
U* is least. What is this integral? The term V<f> is the electric field, so the integral 
is the electrostatic energy. The true field is the one, of all those coming from the 
gradient of a potential, with the minimum total energy. 

“I would like to use this result to calculate something particular to show you 
that these things are really quite practical. Suppose I take two conductors in the 
form of a cylindrical condenser. 

The inside conductor has the potential V, and the outside is at the potential zero. 
Let the radius of the inside conductor be a and that of the outside, b. Now we can 
suppose any distribution of potential between the two. If we use the correct <£, 
and calculate c 0 /2/ (V±) 2 dV, it should be the energy of the system, \CV 2 , 
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So we can also calculate C by our principle. But if we use a wrong distribution of 
potential and try to calculate the capacity C by this method, we will get a capacity 
that is too big, since V is specified. Any assumed potential <j> that is not the exactly 
correct one will give a fake C that is larger than the correct value. But if my false 
<t> is any rough approximation, the C will be a good approximation, because the 
error in C is second order in the error in <t>. 

“Suppose I don’t know the capacity of a cylindrical condenser. I can use this 
principle to find it. I just guess at the potential function <j> until I get the lowest C. 
Suppose, for instance, I pick a potential that corresponds to a constant field. (You 
know, of course, that the field isn’t really constant here; it varies as 1/r.) A field 
which is constant means a potential which goes linearly with distance. To fit the 
conditions at the two conductors, it must be 



This function is V at r = a, zero at r = ft, and in between has a constant slope 
equal to — V/(b — a). So what one does to find the integral U* is multiply the 
square of this gradient by e 0 /2 and integrate overall volume. Let’s do this cal- 
culation for a cylinder of unit length. A volume element at the radius r is 2wr dr . 
Doing the integral, I find that my first try at the capacity gives 

I CV°<Sm try) 2„ dr. 

The integral is easy; it is just 

So I have a formula for the capacity which is not the true one but is an approximate 
job: 

C ft a 

27T€ 0 2(ft — a) 

It is, naturally, different from the correct answer C — 27 r€ 0 /ln(ft/a), but it’s not 
too bad. Let’s compare it with the right answer for several values of ft/a. I have 
computed out the answers in this table: 


ft 

a 

Qrue 

2reo 

C (first approx.) 
2 tT€0 

2 

1.4423 

1.500 

4 

0.721 

0.833 

10 

0.434 

0.612 

100 

0.267 

0.51 

1.5 

2.4662 

2.50 

1.1 

10.492070 

10.500000 


Even when ft/a is as big as 2 — which gives a pretty big variation in the field com- 
pared with a linearly varying field — I get a pretty fair approximation. The answer 
is, of course, a little too high, as expected. The thing gets much worse if you have 
a tiny wire inside a big cylinder. Then the field has enormous variations and if you 
represent it by a constant, you’re not doing very well. With ft/a = 100, we’re off 
by nearly a factor of two. Things are much better for small ft/a. To take the op- 
posite extreme, when the conductors are not very far apart — say ft/a = 1.1 — then 
the constant field is a pretty good approximation, and we get the correct value for 
C to within a tenth of a percent. 

“Now I would like to tell you how to improve such a calculation. (Of course, 
you know the right answer for the cylinder, but the method is the same for some 
other odd shapes, where you may not know the right answer.) The next step is to 
try a better approximation to the unknown true <t>. For example, we might try a 
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constant plus an exponential etc. But how do you know when you have a better 
approximation unless you know the true <j>7 Answer: You calculate C; the lowest 
Cis the value nearest the truth. Let us try this idea out. Suppose that the potential 
is not linear but say quadratic in r — that the electric field is not constant but linear. 
The most general quadratic form that fits <t> = 0 at r = b and <t> = V at r = a is 

where a is any constant number. This formula is a little more complicated. It 
involves a quadratic term in the potential as well as a linear term. It is very easy 
to get the field out of it. The field is just 


E 


d<f> 

Tr 


aV 

b — a 


+ 2(1 + a) 


(r - a)V 
(b - a) 2 


Now we have to square this and integrate over volume. But wait a moment. What 
should I take for a? I can take a parabola for the <t>; but what parabola? Here’s 
what I do: Calculate the capacity with an arbitrary a. What I get is 


C 

2ire 0 



I 1 2 i 

+ 6 “ + 


r 

3. ' 


It looks a little complicated, but it comes out of integrating the square of the field. 
Now I can pick my a. I know that the truth lies lower than anything that I am 
going to calculate, so whatever I put in for a is going to give me an answer too big. 
But if I keep playing with a and get the lowest possible value I can, that lowest 
value is nearer to the truth than any other value. So what I do next is to pick the 
a that gives the minimum value for C. Working it out by ordinary calculus, I get 
that the minimum C occurs for a = —2 b/(b + a). Substituting that value into 
the formula, I obtain for the minimum capacity 


C _ b 2 + 4 ab + a 2 
2t€o 3 {b 2 — a 2 ) 

‘Tve worked out what this formula gives for C for various values of b/a. I 
call these numbers C(quadratic). Here is a table that compares C(quadratic) 
with the true C. 


b 

a 

Ctrue 

2x€0 

! C (quadratic) 

2x£0 

2 

1.4423 

1.444 

4 

0.721 

0.733 

10 

0.434 

0.475 

100 

0.267 

0.346 

1.5 

2.4662 

2.4667 

1.1 

10.492070 

10.492065 


“For example, when the ratio of the radii is 2 to 1, 1 have 1.444, which is a 
very good approximation to the true answer, 1.4423. Even for larger b/a, it stays 
pretty good — it is much, much better than the first approximation. It is even fairly 
good — only off by 10 percent — when b/a is 10 to 1 . But when it gets to be 100 to 1 — 
well, things begin to go wild. I get that C is 0.346 instead of 0.267. On the other 
hand, for a ratio of radii of 1.5, the answer is excellent; and for a b/a of 1.1, the 
answer comes out 10.492065 instead of 10.492070. Where the answer should be 
good, it is very, very good. 

“I have given these examples, first, to show the theoretical value of the princi- 
ples of minimum action and minimum principles in general and, second, to show 
their practical utility — not just to calculate a capacity when we already know the 
answer. For any other shape, you can guess an approximate field with some 
unknown parameters like a and adjust them to get a minimum. You will get ex- 
cellent numerical results for otherwise intractable problems.” 
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A note added after the lecture 

should like to add something that I didn’t have time for in the lecture. 
(I always seem to prepare more than 1 have time to tell about.) As I mentioned 
earlier, I got interested in a problem while working on this lecture. I want to tell 
you what that problem is. Among the minimum principles that I could mention, 
I noticed that most of them sprang in one way or another from the least action 
principle of mechanics and electrodynamics. But there is also a class that does not. 
As an example, if currents are made to go through a piece of material obeying 
Ohm’s law, the currents distribute themselves inside the piece so that the rate at 
which heat is generated is as little as possible. Also we can say (if things are kept 
isothermal) that the rate at which energy is generated is a minimum. Now, this 
principle also holds, according to classical theory, in determining even the dis- 
tribution of velocities of the electrons inside a metal which is carrying a current. 
The distribution of velocities is not exactly the equilibrium distribution [Chapter 
40, Vol. I; Eq. (40.6)] because they are drifting sideways The new distribution 
can be found from the principle that it is the distribution for a given current for 
which the entropy developed per second by collisions is as small as possible. The 
true description of the electrons’ behavior ought to be by quantum mechanics, 
however. The question is: Does the same principle of minimum entropy generation 
also hold when the situation is described quantum-mechanically? I haven’t found 
out yet. 

“The question is interesting academically, of course. Such principles are 
fascinating, and it is always worth while to try to see how general they are. But 
also from a more practical point of view, I want to know. I, with some colleagues, 
have published a paper in which we calculated by quantum mechanics approxi- 
mately the electrical resistance felt by an electron moving through an ionic crystal 
like NaCl. [Feynman, Hellworth, Iddings, and Platzman, “Mobility of Slow 
Electrons in a Polar Crystal,” Phys Rev . 127, 1004 (1962).] But if a minimum 
principle existed, we could use it to make the results much more accurate, just as 
the minimum principle for the capacity of a condenser permitted us to get such 
accuracy for that capacity even though we had only a rough knowledge of the elec- 
tric field.” 
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20 


Solutions of Maxwell 9 * Equations in 
Free Space 


20-1 Waves in free space; plane waves 


In Chapter 18 we had reached the point where we had the Maxwell equations 
in complete form. All there is to know about the classical theory of the electric 
and magnetic fields can be found in the four equations: 


I. V ■ E = ■£ 
Co 


III. V • B = 0 


II. 


V X E = 


SB 

dt 


IV. c 2 V X B = i- + ~ 
e 0 dt 


( 20 . 1 ) 


When we put all these equations together, a remarkable new phenomenon occurs: 
fields generated by moving charges can leave the sources and travel alone through 
space. We considered a special example in which an infinite current sheet is 
suddenly turned on. After the current has been on for the time r, there are uniform 
electric and magnetic fields extending out the distance ct from the source. Suppose 
that the current sheet lies in the jz-plane with a surface current density J going 
toward positive y. The electric field will have only a ^-component, and the mag- 
netic field, only a z-component. The magnitude of the field components is given by 

E v = cB z = - (20.2) 

for positive values of x less than ct. For larger x the fields are zero. There are, 
of course, similar fields extending the same distance from the current sheet in the 
negative x-direction. In Fig. 20-1 we show a graph of the magnitude of the fields 
as a function of x at the instant t . As time goes on, the “wavefront” at ct moves 
outward in x at the constant velocity c. 

Now consider the following sequence of events. We turn on a current of unit 
strength for a while, then suddenly increase the current strength to three units, 
and hold it constant at this value. What do the fields look like then? We can see 
what the fields will look like in the following way. First, we imagine a current of 
unit strength that is turned on at t = 0 and left constant forever. The fields for 
positive x are then given by the graph in part (a) of Fig. 20-2. Next, we ask what 
would happen if we turn on a steady current of two units at the time t\. 

The fields in this case will be twice as high as before, but will extend out in 
x only the distance c(t — fj), as shown in part (b) of the figure. When we add 
these two solutions, using the principle of superposition, we find that the sum of 
the two sources is a current of one unit for the time from zero to /i and a current 
of three units for times greater than t\. At the time t the fields will vary with x 
as shown in part (c) of Fig. 20-2. 

Now let’s take a more complicated problem. Consider a current which is 
turned on to one unit for a while, then turned up to three units, and later turned 
off to zero. What are the fields for such a current? We can find the solution in 
the same way — by adding the solutions of three separate problems. First, we find 
the fields for a step current of unit strength. (We have solved that problem already.) 
Next, we find the fields produced by a step current of two units. Finally, we solve 
for the fields of a step current of minus three units. When we add the three solutions, 
we will have a current which is one unit strong from / = 0 to some later time, 
say t u then three units strong until a still later time / 2 , and then turned off — that 
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20-1 Waves in free space; plane 
waves 

20-2 Three-dimensional waves 
20-3 Scientific imagination 
20-4 Spherical waves 


References: Chapter 47, Vol. I: Sound: 
The Wave Equation 
Chapter 28, Vol. 1: Electro- 
magnetic Radiation 
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Fig. 20-1. The electric and mag- 
netic field as a function of x at the time t 
after the current sheet is turned on. 



current sheet, (a) One unit of current 
turned on at t = 0; (b) Two units of 
current turned on at f = fi; (c) Super- 
position of (a) and (b). 





Fig. 20-3. If the current source strength varies as shown in (a), then at the 
time t shown by the arrow the electric field as a function of x is as shown in (b). 


is, to zero. A graph of the current as a function of time is shown in Fig. 20-3(a). 
When we add the three solutions for the electric field, we find that its variation 
with x, at a given instant /, is as shown in Fig. 20-3 (b). The field is an exact 
representation of the current. The field distribution in space is a nice graph of 
the current variation with time — only drawn backwards. As time goes on the whole 
picture moves outward at the speed c, so there is a little blob of field, travelling 
toward positive x, which contains a completely detailed memory of the history of 
all the current variations. If we were to stand miles away, we could tell from the 
variation of the electric or magnetic field exactly how the current had varied 
at the source. 

You will also notice that long after all activity at the source has completely 
stopped and all charges and currents are zero, the block of field continues to travel 
through space. We have a distribution of electric and magnetic fields that exist 
independently of any charges or currents. That is the new effect that comes from 
the complete set of Maxwell’s equations. If we want, we can give a complete 
mathematical representation of the analysis we have just done by writing that the 
electric field at a given place and a given time is proportional to the current at the 
source, only not at the same time, but at the earlier time t — x/c. We can write 

E y (t ) = - ■ (20.3) 

We have, believe it or not, already derived this same equation from another 
point of view in Vol. I, when we were dealing with the theory of the index of re- 
fraction. Then, we had to figure out what fields were produced by a thin layer of 
oscillating dipoles in a sheet of dielectric material with the dipoles set in motion 
by the electric field of an incoming electromagnetic wave. Our problem was to 
calculate the combined fields of the original wave and the waves radiated by the 
oscillating dipoles. How could we have calculated the fields generated by moving 
charges when we didn’t have Maxwell’s equations? At that time we took as our 
starting point (without any derivation) a formula for the radiation fields produced 
at large distances from an accelerating point charge. If you will look in Chapter 
31 of Vol. I, you will see that Eq. (31.10) there is just the same as the Eq. (20.3) 
that we have just written down. Although our earlier derivation was correct only 
at large distances from the source, we see now that the same result continues to 
be correct even right up to the source. 

We want now to look in a general way at the behavior of electric and magnetic 
fields in empty space far away from the sources, i.e., from the currents and charges. 
Very near the sources — near enough so that during the delay in transmission, the 
source has not had time to change much — the fields are very much the same as we 
have found in what we called the electrostatic or magnetostatic cases. If we go out 
to distances large enough so that the delays become important, however, the 
nature of the fields can be radically different from the solutions we have found. 
In a sense, the fields begin to take on a character of their own when they have 
gone a long way from all the sources. So we can begin by discussing the behavior 
of the fields in a region where there are no currents or charges. 
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Suppose we ask: What kind of fields can there be in regions where p and j are 
both zero? In Chapter 18 we saw that the physics of Maxwell’s equations 
could also be expressed in terms of differential equations for the scalar and vector 
potentials: 


VU - -L 


dt 2 




(20.4) 


V*A = 


1 


d 2 A 


c 2 dt 2 


Co C l 


(20.5) 


If p and j are zero, these equations take on the simpler form 


<2a6 > 

(2 °' 7> 


Thus in free space the scalar potential <£ and each component of the vector potential 
A all satisfy the same mathematical equation. Suppose we let ^ (psi) stand for 
any one of the four quantities <t > , A x , A y , A z ; then we want to investigate the general 
solutions of the following equation: 

= ( 20 . 8 ) 


This equation is called the three-dimensional wave equation — three-dimensional, 
because the function \f/ may depend in general on x , y f and z, and we need to worry 
about variations in all three coordinates. This is made clear if we write out ex- 
plicitly the three terms of the Laplacian operator: 


sV sV aV_j_sV = 0 

dx 2 dy 2 dz 2 c 2 dt 2 


(20.9) 


In free space, the electric fields E and B also satisfy the wave equation. For 
example, since B = V X we can get a differential equation for B by taking 
the curl of Eq. (20.7). Since the Laplacian is a scalar operator, the order of the 
Laplacian and curl operations can be interchanged: 


V X (V 2 A) = V 2 (V X A) = v 2 b. 


Similarly, the order of the operations curl and d/dt can be interchanged: 


V X 


d 2 A 

dt 2 


1 n2 1 

14?^ x = 7s 


d 2 B 
2 dt 2 ' 


Using these results, we get the following differential equation for B: 


V 2 B - 


1 ^ = 0. 
c 2 dt 2 


(20.10) 


So each component of the magnetic field B satisfies the three-dimensional wave 
equation. Similarly, using the fact that E = —V<t> — dA/dt , it follows that the 
electric field E in free space also satisfies the three-dimensional wave equation: 

V2£ -^0 = °- (20H) 


All of our electromagnetic fields satisfy the same wave equation, Eq. (20.8). 
We might well ask: What is the most general solution to this equation? However, 
rather than tackling that difficult question right away, we will look first at what 
can be said in general about those solutions in which nothing varies in y and z. 
(Always do an easy case first so that you can see what is going to happen, and 
then you can go to the more complicated cases.) Let’s suppose that the magnitudes 
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of the fields depend only upon x — that there are no variations of the fields with 
y and z. We are, of course, considering plane waves again. We should expect to 
get results something like those in the previous section. In fact, we will find 
precisely the same answers. You may ask: “Why do it all over again?” It is im- 
portant to do it again, first, because we did not show that the waves we found were 
the most general solutions for plane waves, and second, because we found the fields 
only from a very particular kind of current source. We would like to ask now: 
What is the most general kind of one-dimensional wave there can be in free space? 
We cannot find that by seeing what happens for this or that particular source, but 
must work with greater generality. Also we are going to work this time with differ- 
ential equations instead of with integral forms. Although we will get the same re- 
sults, it is a way of practicing back and forth to show that it doesn’t make any 
difference which way you go. You should know how to do things every which 
way, because when you get a hard problem, you will often find that only one of 
the various ways is tractable. 

We could consider directly the solution of the wave equation for some elec- 
tromagnetic quantity. Instead, we want to start right from the beginning with 
Maxwell’s equations in free space so that you can see their close relationship to 
the electromagnetic waves. So we start with the equations in (20.1), setting the 
charges and currents equal to zero. They become 


I. V E = 0 

II. V X E = 

III. V B = 0 


dB 

dt 


( 20 . 12 ) 


IV. c 2 v X B = ~ 
dt 


We write the first equation out in components: 


V E = 


dE x dE y dE z = 
fix ^ fiy ~ h dz 


(20.13) 


We are assuming that there are no variations with y and z, so the last two terms are 
zero. This equation then tells us that 


dE x 

dx 


= 0 . 


(20.14) 


Its solution is that E x , the component of the electric field in the x-direction. is a 
constant in space. If you look at IV in (20.12), supposing no ^-variation in y and 
z either, you can see that E x is also constant in time. Such a field could be the 
steady dc field from some charged condenser plates a long distance away. We are 
not interested now m such an uninteresting static field; we are at the moment 
interested only in dynamically varying fields. For dynamic fields, E x — 0. 

We have then the important result that for the propagation of plane waves 
in any direction, the electric field must be at right angles to the direction of propaga- 
tion. It can, of course, still vary in a complicated way with the coordinate *. 

The transverse ZT-field can always be resolved into two components, say the 
^-component and the z-c omponent. So let’s first work out a case in which the elec- 
tric field has only one transverse component. We’ll take first an electric field that 
is always in the ^-direction, with zero z-component. Evidently, if we solve this 
problem we can also solve for the case where the electric field is always in the 
z-direction. The general solution can always be expressed as the superposition of 
two such fields. 

How easy our equations now get. The only component of the electric field 
that is not zero is E y , and all derivatives — except those with respect to a; — are zero 
The rest of Maxwell’s equations then become quite simple. 
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Let’s look next at the second of Maxwell’s equations [II of Eq. (20.12)]. 
Writing out the components of the curl E , we have 

X - a -§ - «• 

= f - 0 , 

(xr v P\ — dE x __ 

( X )z dx dy dx * 

The x-component of V X E is zero because the derivatives with respect to y and 
2 are zero. The ^-component is also zero ; the first term is zero because the derivative 
with respect to z is zero, and the second term is zero because E z is zero. The only 
components of the curl of E that is not zero is the z-component, which is equal to 
dEy/dx . Setting the three components of V X E equal to the corresponding 
components of — dB/dt , we can conclude the following: 


dB x 

dt 


0 , 


dt 


= 0 . 


(20.15) 


dB z __ __ dEy 
dt dx 


(20.16) 


Since the x-component of the magnetic field and the y-component of the magnetic 
field both have zero time derivatives, these two components are just constant fields 
and correspond to the magnetostatic solutions we found earlier. Somebody may 
have left some permanent magnets near where the waves are propagating. We will 
ignore these constant fields and set B x and B y equal to zero. 

Incidentally, we would already have concluded that the x-component of B 
should be zero for a different reason. Since the divergence of B is zero (from the 
third Maxwell equation), applying the same arguments we used above for the 
electric field, we would conclude that the longitudinal component of the magnetic 
field can have no variation with x. Since we are ignoring such uniform fields in 
our wave solutions, we would have set B x equal to zero. In plane electromagnetic 
waves the 5-field, as well as the 5-field, must be directed at right angles to the 
direction of propagation. 

Equation (20.16) gives us the additional proposition that if the electric field 
has only a ^-component, the magnetic field will have only a z-component. So 
E and B are at right angles to each other. This is exactly what happened in the 
special wave we have already considered. 

We are now ready to use the last of Maxwell’s equations for free space [IV 
of Eq. (20.12)]. Writing out the components, we have 


c*(V X 8), - c* 
C*(T X 8), - c 2 ^ 


2 dB y _ dE x ^ 
C dz dt 

2 dB z _ dE y ^ 
C dx dt 


c 2 (V X B) z 



2 dB x __ dE z 
C dy dt 


(20.17) 


Of the six derivatives of the components of B, only the term dB z /dx is not equal 
to zero. So the three equations give us simply 


2 dB z __ dEy 
C dx dt 


(20.18) 


The result of all our work is that only one component each of the electric and 
magnetic fields is not zero, and that these components must satisfy Eqs. (20.16) 
and (20.18). The two equations can be combined into one if we differentiate the 
first with respect to x and the second with respect to t\ the left-hand sides of the 
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Fig. 20-4. The function f(x — ct ) 
represents a constant "shape” that travels 
toward positive x with the speed c. 


two equations will then be the same (except for the factor c 2 ). 
E y satisfies the equation 

d 2 Ey l d 2 Ey __ 

dx 2 c 2 dt 2 


So we find that 


(20.19) 


We have seen the same differential equation before, when we studied the propaga- 
tion of sound. It is the wave equation for one-dimensional waves. 

Y ou should note that in the process of our derivation we have found something 
more than is contained in Eq. (20 11). Maxwell’s equations have given us the 
further information that electromagnetic waves have field components only at 
right angles to the direction of the wave propagation. 

Let’s review what we know about the solutions of the one-dimensional wave 
equation. If any quantity \p satisfies the one-dimensional wave equation 


0 

dx 2 C 2 dt 2 


( 20 . 20 ) 


then one possible solution is a function \p(x, t ) of the form 


iK*, 0 = Ax - ct), 


( 20 . 21 ) 


that is, some function of the single variable (x — ct ). The function f(x — ct) 
represents a “rigid” pattern in x which travels toward positive x at the speed c 
(see Fig. 20-4). For example, if the function / has a maximum when its argument 
is zero, then for t = 0 the maximum of if/ will occur at x = 0. At some later time, 
say t — 10, ^ will have its maximum at x = 10c. As time goes on, the maximum 
moves toward positive x at the speed c. 

Sometimes it is more convenient to say that a solution of the one-dimensional 
wave equation is a function of (t ~ x/c), However, this is saying the same thing, 
because any function of (/ — x/c ) is also a function of (x — ct): 


F(t - x/c) = F 



= fix - ct). 


Let’s show that fix — ct) is indeed a solution of the wave equation. Since 
it is a function of only one variable — the variable ix — ct) — we will let /' represent 
the derivative of / with respect to its variable and /" represent the second derivative 
of /. Differentiating Eq. (20.21) with respect to x , we have 


fx = f,(x - Ctl 


since the derivative of (x — ct) with respect to x is L 
\p with respect to x is clearly 


dV 

dx 2 


= /"(* - Ct). 


The second derivative of 


( 20 . 22 ) 


Taking derivatives of ^ with respect to /, we find 

% = / '(* - ctx-c), 

2 

H = +C 2 f"(x - ct) (20.23) 


We see that if/ does indeed satisfy the one-dimensional wave equation. 

You may be wondering: “If I have the wave equation, how do I know that 
I should take /(x — ct) as a solution? I don’t like this backward method. Isn’t 
there some forward way to find the solution?” Well, one good forward way is 
to know the solution. It is possible to “cook up” an apparently forward mathe- 
matical argument, expecially because we know what the solution is supposed to 
be, but with an equation as simple as this we don’t have to play games. Soon 
you will get so that when you see Eq. (20.20), you nearly simultaneously see 
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\p = f(x — ct) as a solution. (Just as now when you see the integral of x 2 dx , you 
know right away that the answer is x 3 /3.) 

Actually you should also see a little more. Not only is any function of (x — ct) 
a solution, hut any function of ( x + ct) is also a solution. Since the wave equation 
contains only c 2 , changing the sign of c makes no difference. In fact, the most 
general solution of the one-dimensional wave equation is the sum of two arbitrary 
functions, one of (x — ct) and the other of (x + ct): 

t = f(x - ct) + g(x + ct). (20.24) 

The first term represents a wave travelling toward positive jc, and the second term 
an arbitrary wave travelling toward negative jc. The general solution is the super- 
position of two such waves both existing at the same time. 


We will leave the following amusing question for you to think about. Take a function 
\p of the following form : 


\p = cos kx cos kct . 


This equation isn’t in the form of a function of (jc — ct) or of (jc + ct). Yet you can 
easily show that this function is a solution of the wave equation by direct substitution into 
Eq. (20.20). How can we then say that the general solution is of the form of Eq. (20.24)? 


Applying our conclusions about the solution of the wave equation to the 
y-component of the electric field, E y , we conclude that E y can vary with jc in any 
arbitrary fashion. However, the fields which do exist can always be considered as 
the sum of two patterns. One wave is sailing through space in one direction with 
speed c, with an associated magnetic field perpendicular to the electric field; 
another wave is travelling in the opposite direction with the same speed. Such 
waves correspond to the electromagnetic waves that we know about — light, radio- 
waves, infrared radiation, ultraviolet radiation, x-rays, and so on. We have already 
discussed the radiation of light in great detail in Vol. I. Since everything we learned 
there applies to any electromagnetic wave, we don’t need to consider in great detail 
here the behavior of these waves. 

We should perhaps make a few further remarks on the question of the polariza- 
tion of the electromagnetic waves. In our solution we chose to consider the special 
case in which the electric field has only a y-component. There is clearly another 
solution for waves travelling in the plus or minus x-direction, with an electric 
field which has only a z-component. Since Maxwell’s equations are linear, the 
general solution for one-dimensional waves propagating in the x-direction is the 
sum of waves of E y and waves of E z . This general solution is summarized in the 
following equations: 


E 

r * 

cT 

II 

E.) 

Ey 

= /(* - 

ct) + g(x + ct) 

E z 

= F(x - 

ct) + G(x + ct) 

B 

= (0, By, 

B.) 

cB z 

= fix-ct) -g(x 4- ct) 

CBy 

= -F(x 

— ct) + G(x + ct). 


Such electromagnetic waves have an ^-vector whose direction is not constant but 
which gyrates around in some arbitrary way in the yz-plane. At every point 
the magnetic field is always perpendicular to the electric field and to the direction 
of propagation. 
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If there are only waves travelling in one direction, say the positive ^-direction, 
there is a simple rule which tells the relative orientation of the electric and mag- 
netic fields. The rule is that the cross product E X B — which is, of course, a 
vector at right angles to both E and B — points in the direction in which the wave is 
travelling. If E is rotated into B by a right-hand screw, the screw points in the 
direction of the wave velocity. (We shall see later that the vector E X B has a 
special physical significance: it is a vector which describes the flow of energy in an 
electromagnetic field.) 


20-2 Three-dimensional waves 

We want now to turn to the subject of three-dimensional waves. We have 
already seen that the vector E satisfies the wave equation. It is also easy to arrive 
at the same conclusion by arguing directly from Maxwell’s equations. Suppose we 
start with the equation 


v X 


dB 

E ~ ~ Tt 


and take the curl of both sides : 


V X (V X E) = - ~ (V X B). 

dt 


(20.26) 


You will remember that the curl of the curl of any vector can be written as the sum 
of two terms, one involving the divergence and the other the Laplacian, 

V X (V X E) = V(V * E) - V 2 E. 


In free space, however, the divergence of E is zero, so only the Laplacian term 
remains. Also, from the fourth of Maxwell’s equations in free space [Eq. (20.12)] 
the time derivative of c 2 V X B is the second derivative of E with respect to /: 


^(VX*) = 


d*E 
~dt 2 ' 


Equation (20.26) then becomes 



d^E 
dt 2 ’ 


which is the three-dimensional wave equation. Written out in all its glory, this 
equation is, of course, 


d^E,^E,d^E_^d^E 
dx 2 dy 2 dz 2 c 2 dt 2 


(20.27) 


How shall we find the general wave solution 7 The answer is that all the solu- 
tions of the three-dimensional wave equation can be represented as a superposition 
of the one-dimensional solutions we have already found. We obtained the equation 
for waves which move in the x-direction by supposing that the field did not depend 
on y and z. Obviously, there are other solutions in which the fields do not depend 
on x and z, representing waves going in the y-direction. Then there are solutions 
which do not depend on x and y, representing waves travelling in the z-direction. 
Or in general, since we have written our equations in vector form, the three- 
dimensional wave equation can have solutions which are plane waves moving in 
any direction at all. Again, since the equations are linear, we may have simultane- 
ously as many plane waves as we wish, travelling in as many different directions. 
Thus the most general solution of the three-dimensional wave equation is a 
superposition of all sorts of plane waves moving in all sorts of directions. 

Try to imagine what the electric and magnetic fields look like at present in 
the space in this lecture room. First of all, there is a steady magnetic field ; it comes 
from the currents in the interior of the earth — that is, the earth’s steady magnetic 
field. Then there are some irregular, nearly static electric fields produced perhaps 
by electric charges generated by friction as various people move about in their 
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chairs and rub their coat sleeves against the chair arms. Then there are other 
magnetic fields produced by oscillating currents in the electrical wiring — fields 
which vary at a frequency of 60 cycles per second, in synchronism with the genera- 
tor at Boulder Dam. But more interesting are the electric and magnetic fields vary- 
ing at much higher frequencies. For instance, as light travels from window to 
floor and wall to wall, there are little wiggles of the electric and magnetic fields 
moving along at 186,000 miles per second. Then there are also infrared waves 
travelling from the warm foreheads to the cold blackboard. And we have forgotten 
the ultraviolet light, the x-rays, and the radiowaves travelling through the room. 

Flying across the room are electromagnetic waves which carry music of a jazz 
band. There are waves modulated by a series of impulses representing pictures of 
events going on in other parts of the world, or of imaginary aspirins dissolving in 
imaginary stomachs. To demonstrate the reality of these waves it is only necessary 
to turn on electronic equipment that converts these waves into pictures and sounds. 

If we go into further detail to analyze even the smallest wiggles, there are 
tiny electromagnetic waves that have come into the room from enormous distances. 
There are now tiny oscillations of the electric field, whose crests are separated by 
a distance of one foot, that have come from millions of miles away, transmitted 
to the earth from the Mariner II space craft which has just passed Venus. Its 
signals carry summaries of information it has picked up about the planets (infor- 
mation obtained from electromagnetic waves that travelled from the planet to 
the space craft). 

There are very tiny wiggles of the electric and magnetic fields that are waves 
which originated billions of light years away — from galaxies in the remotest corners 
of the universe. That this is true has been found by “filling the room with wires” — 
by building antennas as large as this room. Such radiowaves have been detected 
from places in space beyond the range of the greatest optical telescopes. Even they, 
the optical telescopes, are simply gatherers of electromagnetic waves. What we 
call the stars are only inferences, inferences drawn from the only physical reality 
we have yet gotten from them — from a careful study of the unendingly complex 
undulations of the electric and magnetic fields reaching us on earth. 

There is, of course, more: the fields produced by lightning miles away, the 
fields of the charged cosmic ray particles as they zip through the room, and more, 
and more. What a complicated thing is the electric field in the space around you! 
Yet it always satisfies the three-dimensional wave equation. 


20-3 Scientific imagination 

I have asked you to imagine these electric and magnetic fields. What do you 
do? Do you know how? How do / imagine the electric and magnetic field? What 
do / actually see? What are the demands of scientific imagination? Is it any 
different from trying to imagine that the room is full of invisible angels? No, it is 
not like imagining invisible angels. It requires a much higher degree of imagination 
to understand the electromagnetic field than to understand invisible angels. Why? 
Because to make invisible angels understandable, all I have to do is to alter their 
properties a little bit — I make them slightly visible, and then I can see the shapes 
of their wings, and bodies, and halos. Once I succeed in imagining a visible angel, 
the abstraction required — which is to take almost invisible angels and imagine 
them completely invisible — is relatively easy. So you say, “Professor, please give 
me an approximate description of the electromagnetic waves, even though it may 
be slightly inaccurate, so that I too can see them as well as I can see almost invisible 
angels. Then I will modify the picture to the necessary abstraction.” 

I’m sorry I can’t do that for you. I don’t know how. I have no picture of this 
electromagnetic field that is in any sense accurate. I have known about the electro- 
magnetic field a long time — I was in the same position 25 years ago that you are 
now, and I have had 25 years more of experience thinking about these wiggling 
waves. When I start describing the magnetic field moving through space, I speak 
of the £- and B fields and wave my arms and you may imagine that I can see them. 
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I’ll tell you what I see. I see some kind of vague shadowy, wiggling lines — here 
and there is an E and B written on them somehow, and perhaps some of the lines 
have arrows on them — an arrow here or there which disappears when I look too 
closely at it. When I talk about the fields swishing through space, I have a terrible 
confusion between the symbols I use to describe the objects and the objects them- 
selves. I cannot really make a picture that is even nearly like the true waves. So 
if you have some difficulty in making such a picture, you should not be worried 
that your difficulty is unusual. 

Our science makes terrific demands on the imagination. The degree of 
imagination that is required is much more extreme than that required for some of 
the ancient ideas. The modern ideas are much harder to imagine. We use a lot 
of tools, though. We use mathematical equations and rules, and make a lot of 
pictures. What I realize now is that when I talk about the electromagnetic field in 
space, I see some kind of a superposition of all of the diagrams which I’ve ever 
seen drawn about them. I don’t see little bundles of field lines running about be- 
cause it worries me that if I ran at a different speed the bundles would disappear. 
I don’t even always see the electric and magnetic fields because sometimes I think 
I should have made a picture with the vector potential and the scalar potential, 
for those were perhaps the more physically significant things that were wiggling. 

Perhaps the only hope, you say, is to take a mathematical view. Now what is 
a mathematical view? From a mathematical view, there is an electric field vector 
and a magnetic field vector at every point in space; that is, there are six numbers 
associated with every point. Can you imagine six numbers associated with each 
point in space? That’s too hard. Can you imagine even one number associated 
with every point? I cannot! I can imagine such a thing as the temperature at every 
point in space. That seems to be understandable. There is a hotness and coldness 
that varies from place to place. But I honestly do not understand the idea of a 
number at every point. 

So perhaps we should put the question: Can we represent the electric field by 
something more like a temperature, say like the displacement of a piece of jello? 
Suppose that we were to begin by imagining that the world was filled with thin 
jello and that the fields represented some distortion — say a stretching or twisting — 
of the jello. Then we could visualize the field. After we “see” what it is like we 
could abstract the jello away. For many years that’s what people tried to do. 
Maxwell, Ampere, Faraday, and others tried to understand electromagnetism 
this way. (Sometimes they called the abstract jello “ether.”) But it turned out that 
the attempt to imagine the electromagnetic field in that way was really standing in 
the way of progress. We are unfortunately limited to abstractions, to using in- 
struments to detect the field, to using mathematical symbols to describe the field, 
etc. But nevertheless, in some sense the fields are real, because after we are all 
finished fiddling around with mathematical equations — with or without making 
pictures and drawings or trying to visualize the thing — we can still make the instru- 
ments detect the signals from Mariner II and find out about galaxies a billion miles 
away, and so on. 

The whole question of imagination in science is often misunderstood by people 
in other disciplines. They try to test our imagination in the following way. They 
say, “Here is a picture of some people in a situation. What do you imagine will 
happen next?” When we say, “I can’t imagine,” they may think we have a weak 
imagination. They overlook the fact that whatever we are allowed to imagine in 
science must be consistent with everything else we know: that the electric fields and 
the waves we talk about are not just some happy thoughts which we are free to 
make as we wish, but ideas which must be consistent with all the laws of physics 
we know. We can’t allow ourselves to seriously imagine things which are obviously 
in contradiction to the known laws of nature. And so our kind of imagination is 
quite a difficult game. One has to have the imagination to think of something that 
has never been seen before, never been heard of before. At the same time the 
thoughts are restricted in a strait jacket, so to speak, limited by the conditions that 
come from our knowledge of the way nature really is. The problem of creating 
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something which is new, but which is consistent with everything which has been 
seen before, is one of extreme difficulty. 

While I’m on this subject I want to talk about whether it will ever be possible 
to imagine beauty that we can’t see It is an interesting question. When we look 
at a rainbow, it looks beautiful to us. Everybody says, “Ooh, a rainbow/’ (You 
see how scientific I am. I am afraid to say something is beautiful unless I have an 
experimental way of defining it.) But how would we describe a rainbow if we were 
blind? We are blind when we measure the infrared reflection coefficient of sodium 
chloride, or when we talk about the frequency of the waves that are coming from 
some galaxy that we can*t see — we make a diagram, *we make a plot. For instance, 
for the rainbow, such a plot would be the intensity of radiation vs. wavelength 
measured with a spectrophotometer for each direction in the sky. Generally, such 
measurements would give a curve that was rather flat. Then some day, someone 
would discover that for certain conditions of the weather, and at certain angles in 
the sky, the spectrum of intensity as a function of wavelength would behave 
strangely; it would have a bump. As the angle of the instrument was varied only a 
little bit, the maximum of the bump would move from one wavelength to another. 
Then one day the physical review of the blind men might publish a technical article 
with the title “The Intensity of Radiation as a Function of Angle under Certain 
Conditions of the Weather.” In this article there might appear a graph such as 
the one in Fig. 20-5 The author would perhaps remark that at the larger angles 
there was more radiation at long wavelengths, whereas for the smaller angles the 
maximum in the radiation came at shorter wavelengths. (From our point of view, 
we would say that the light at 40° is predominantly green and the light at 42° is 
predominantly red.) 


Fig. 20-5. The intensity of electro- 
magnetic waves as a function of wave- 
length for three angles (measured from 
the direction opposite the sun), observed 
only with certain meteorological con- 
ditions. 

Now do we find the graph of Fig. 20-5 beautiful? It contains much more de- 
tail than we apprehend when we look at a rainbow, because our eyes cannot see 
the exact details in the shape of a spectrum. The eye, however, finds the rainbow 
beautiful. Do we have enough imagination to see in the spectral curves the same 
beauty we see when we look directly at the rainbow? I don’t know. 

But suppose I have a graph of the reflection coefficient of a sodium chloride 
crystal as a function of wavelength in the infrared, and also as a function of angle. 

I would have a representation of how it would look to my eyes if they could see 
m the infrared — perhaps some glowing, shiny “green,” mixed with reflections from 
the surface in a “metallic red.” That would be a beautiful thing, but I don’t know 
whether I can ever look at a graph of the reflection coefficient of NaCl measured 
with some instrument and say that it has the same beauty. 

On the other hand, even if we cannot see beauty in particular measured results, 
we can already claim to see a certain beauty in the equations which describe general 
physical laws. For example, in the wave equation (20.9), there’s something nice 
about the regularity of the appearance of the x, the y s the z, and the t. And this 
nice symmetry in appearance of the x. y, z, and t suggests to the mind still a greater 
beauty which has to do with the four dimensions, the possibility that space has 
four-dimensional symmetry, the possibility of analyzing that and the developments 
of the special theory of relativity. So there is plenty of intellectual beauty asso- 
ciated with the equations. 
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20-4 Spherical waves 

We have seen that there are solutions of the wave equation which corre- 
spond to plane waves, and that any electromagnetic wave can be described as a 
superposition of many plane waves. In certain special cases, however, it is more 
convenient to describe the wave field in a different mathematical form. We would 
like to discuss now the theory of spherical waves — waves which correspond to 
spherical surfaces that are spreading out from some center. When you drop a 
stone into a lake, the ripples spread out in circular waves on the surface — they are 
two-dimensional waves. A spherical wave is a similar thing except that it spreads 
out in three dimensions. 

Before we start describing spherical waves, we need a little mathematics. 
Suppose we have a function that depends only on the radial distance r from a 
certain origin — in other words, a function that is spherically symmetric. Let’s 
call the function ^(r), where by r we mean 


r = s/x 2 +~ y 2 + z 2 , 

the radial distance from the origin. In order to find out what functions ^(r) satisfy 
the wave equation, we will need an expression for the Laplacian of So we want 
to find the sum of the second derivatives of $ with respect to x 3 y, and z. We will 
use the notation that yp f (r) represents the derivative of \f/ with respect to r and ^"(r) 
represents the second derivative of \p with respect to r. 

First, we find the derivatives with respect to x. The first derivative is 


dx 




d L 

~dx' 


The second derivative of ^ with respect to x is 


= j," M 2 

dx 2 W \dx/ 


+ V 


d 2 r 

dx 2 


We can evaluate the partial derivatives of r with respect to x from 


dr 

dx 


X 

— D 

r 


dx 2 r V r2 J 


So the second derivative of \p with respect to x is 


Likewise, 


i± = ?L r + 

dx 2 r^ W ^ 




+ 7 (‘ - $ *'• 


tl 

dy 


d 2 & z 2 
dz 2 r 2 * ^ 




(20.28) 

(20.29) 

(20.30) 


The Laplacian is the sum of these three derivatives. Remembering that 
x 2 + y 2 + z 2 = r 2 , we get 

?V(r) = V>(r) + - r V(r). (20.31) 

It is often more convenient to write this equation in the following form: 

= 7 £ W). (20.32) 


If you carry out the differentiation indicated in Eq. (20.32), you will see that the 
right-hand side is the same as in Eq. (20.31). 

If we wn>h to consider spherically symmetric fields which can propagate as 
spherical waves, our field quantity must be a function of both r and t. Suppose 
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we ask, then, what functions \p(r, t) are solutions of the three-dimensional wave 
equation 


vV</, t) 


i <r 

C 2 dt 2 


' Hr, t) = 0. 


(20.33) 


Since \p(r , t) depends only on the spatial coordinates through /*, we can use the equa- 
tion for the Laplacian we found above, Eq. (20.32). To be precise, however, since 
^ is also a function of r, we should write the derivatives with respect to r as partial 
derivatives. Then the wave equation becomes 


! il (f *) _ 1 il * 

r dr 2 yy} c 2 dt 2 W 


0 . 


We must now solve this equation, which appears to be much more complicated 
than the plane wave case. But notice that if we multiply this equation by r, we get 

£ «> - 3 W> = »■ < 20 - 34 > 

This equation tells us that the function r\p satisfies the one-dimensional wave equa- 
tion in the variable r. Using the general principle which we have emphasized so 
often, that the same equations always have the same solutions, we know that if 
is a function only of (r — ct) then it will be a solution of Eq. (20.34). So we 
know that spherical waves must have the form 


rf(r, t) = f(r - ct). 

Or, as we have seen before, we can equally well say that r\p can have the form 


rty = At - r/c). 


Dividing by r, we find that the field quantity ^ (whatever it may be) has the follow- 
ing form: 


r 


(20.35) 


Such a function represents a general spherical wave travelling outward from the 
origin at the speed c. If we forget about the r in the denominator for a moment, 
the amplitude of the wave as a function of the distance from the origin at a given 
time has a certain shape that travels outward at the speed c. The factor r in the 
denominator, however, says that the amplitude of the wave decreases in proportion 
to l/r as the wave propagates. In other words, unlike a plane wave in which the 
amplitude remains constant as the wave runs along, in a spherical wave the ampli- 
tude steadily decreases, as shown in Fig. 20-6. This effect is easy to understand 
from a simple physical argument. 




Fig. 20—6. A spherical wave ~ f(t ~ r/c)/r. (a) | as a function of r for t — h and the 
same wave for the later time 1 2 . (b) ^ as a function of f for r = r^ and the same wave seen at r 2 . 
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We know that the energy density in a wave depends on the square of the wave 
amplitude. As the wave spreads, its energy is spread over larger and larger areas 
proportional to the radial distance squared. If the total energy is conserved, the 
energy density must fall as 1/r 2 , and the amplitude of the wave must decrease as 
1 /r. So Eq. (20.35) is the “reasonable” form for a spherical wave. 

We have disregarded the second possible solution to the one-dimensional 
wave equation: 

= g(l + r/c), 
or 

= g(t + r/c) 


This also represents a spherical wave, but one which travels inward from large r 
toward the origin. 

We are now going to make a special assumption. We say, without any demon- 
stration whatever, that the waves generated by a source are only the waves which 
go outward. Since we know that waves are caused by the motion of charges, we 
want to think that the waves proceed outward from the charges. It would be 
rather strange to imagine that before charges were set in motion, a spherical wave 
started out from infinity and arrived at the charges just at the time they began to 
move. That is a possible solution, but experience shows that when charges are 
accelerated the waves travel outward from the charges. Although Maxwell’s 
equations would allow either possibility, we will put in an additional fact — based 
on experience — that only the outgoing wave solution makes “physical sense.” 

We should remark, however, that there is an interesting consequence to this 
additional assumption: we are removing the symmetry with respect to time that 
exists in Maxwell’s equations. The original equations for E and B, and also the 
wave equations we derived from them, have the property that if we change the sign 
of t , the equation is unchanged. These equations say that for every solution 
corresponding to a wave going in one direction there is an equally valid solution 
for a wave travelling in the opposite direction. Our statement that we will consider 
only the outgoing spherical waves is an important additional assumption. (A 
formulation of electrodynamics in which this additional assumption is avoided has 
been carefully studied. Surprisingly, in many circumstances it does not lead to 
physically absurd conclusions, but it would take us too far astray to discuss these 
ideas just now. We will talk about them a little more in Chapter 28.) 

We must mention another important point. In our solution for an outgoing 
wave, Eq. (20.35), the function \j/ is infinite at the origin. That is somewhat peculiar. 
We would like to have a wave solution which is smooth everywhere. Our solution 
must represent physically a situation in which there is some source at the origin. 
In other words, we have inadvertently made a mistake. We have not solved the 
free wave equation (20.33) everywhere; we have solved Eq. (20.33) with zero on 
the right everywhere, except at the origin. Our mistake crept in because some of 
the steps in our derivation are not “legal” when r — 0. 

Let’s show that it is easy to make the same kind of mistake in an electrostatic 
problem. Suppose we want a solution of the equation for an electrostatic potential 
in free space, V 2 <p = 0. The Laplacian is equal to zero, because we are assuming 
that there are no charges anywhere. But what about a spherically symmetric 
solution to this equation — that is, some function <j> that depends only on r. Using 
the formula of Eq. (20.32) for the Laplacian, we have 


\jP_ 

r dr 2 


(r<t>) = 0. 


Multiplying this equation by r , we have an equation which is readily integrated: 


d 2 
dr 2 


0 4 >) = o. 


If we integrate once with respect to r, we find that the first derivative of r<f> is a 
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constant, which we may call a : 


d , . 

dr ™ = fl - 


Integrating again, we find that r<j> is of the form 

r<j> = ar + b , 

where b is another constant of integration. So we have found that the following <t> 
is a solution for the electrostatic potential in free space: 



Something is evidently wrong. In the region where there are no electric 
charges, we know the solution for the electrostatic potential: the potential is 
everywhere a constant. That corresponds to the first term in our solution. But we 
also have the second term, which says that there is a contribution to the potential 
that varies as one over the distance from the origin. We know, however, that such 
a potential corresponds to a point charge at the origin. So, although we thought 
we were solving for the potential in free space, our solution also gives the field 
for a point source at the origin. Do you see the similarity between what happened 
now and what happened when we solved for a spherically symmetric solution to 
the wave equation? If there were really no charges or currents at the origin, there 
would not be spherical outgoing waves. The spherical waves must, of course, be 
produced by sources at the origin. In the next chapter we will investigate the con- 
nection between the outgoing electromagnetic waves and the currents and voltages 
which produce them. 
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2 / 


Solutions of MaxwvelVs Equations with 
Currents and Charges 


21-1 Light and electromagnetic waves 


We saw in the last chapter that among their solutions, Maxwell’s equations 
have waves of electricity and magnetism. These waves correspond to the phe- 
nomena of radio, light, x-rays, and so on, depending on the wavelength. We have 
already studied light in great detail in Vol. I. In this chapter we want to tie together 
the two subjects — we want to show that Maxwell’s equations can indeed form the 
base for our earlier treatment of the phenomena of light. 

When we studied light, we began by writing down an equation for the electric 
field produced by a charge which moves in any arbitrary way. That equation was 


.j-\ e JL + ^ l ( e x) + 1 i- 

47re 0 [ r ' 2 c dt \r f2 / c 2 dt 2 


(21 1 ) 


cB = e r * X E . 

[See Eq. (28.3), Vol. I.] 

If a charge moves in an arbitrary way, the electric field we would find now at 
some point depends only on the position and motion of the charge not now, but 
at an earlier time — at an instant which is earlier by the time it would take light, 
going at the speed c, to travel the distance r f from the charge to the field point. 
In other words, if we want the electric field at point (1) at the time r, we must cal- 
culate the location (2') of the charge and its motion at the time (t — r'/c), where 
r f is the distance to the point (l) from the position of the charge (20 at the time 
(/ — r'/c). The prime is to remind you that r f is the so-called “retarded distance” 
from the point (20 to the point (1), and not the actual distance between point (2), the 
position of the charge at the time /, and the field point (1) (see Fig. 21-1). Note 
that we are using a different convention now for the direction of the unit vector 
e r . In Chapters 28 and 36 of Vol. I it was convenient to take r (and hence e r ) 
pointing toward the source. Now we are following the definition we took for Cou- 
lomb’s law, in which r is directed from the charge, at (2), toward the field point at (1). 
The only difference, of course, is that our new r (and e r ) are the negatives of the 
old ones. 

We have also seen that if the velocity v of a charge is always much less than 
c, and if we consider only points at large distances from the charge, so that only the 
last term of Eq. (21.1) is important, the fields can also be written as 


and 


E = 


<1 

47 r€ 0 c 2 r' 


acceleration of the charge at (t — r f /c) 
projected at right angles to r 


(2i. r) 


cB = e/ X E. 


Let’s look at what the complete equation, Eq. (21.1), says in a little more 
detail. The vector e r > is the unit vector to point (1) from the retarded position (2'). 
The first term, then, is what we would expect for the Coulomb field of the charge 
at its retarded position — we may call this “the retarded Coulomb field.” The 
electric field depends inversely on the square of the distance and is directed away 
from the retarded position of the charge (that is, in the direction of e r ). 

But that is only the first term. The other terms tell us that the laws of electricity 
do not say that all the fields are the same as the static ones, but just retarded (which 
is what people sometimes like to say). To the “retarded Coulomb field” we must 
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Fig. 21-1. The fields at (1) at the 
time t depend on the position (2') occupied 
by the charge q at the time (f — r'/c). 



add the other two terms. The second term says that there is a “correction” to the 
retarded Coulomb field which is the rate of change of the retarded Coulomb field 
multiplied by r'/c, the retardation delay. In a way of speaking, this term tends to 
compensate for the retardation in the first term. The first two terms correspond to 
computing the “retarded Coulomb field” and then extrapolating it toward the 
future by the amount r'/c, that is, right up to the time Z 1 The extrapolation is linear, 
as if we were to assume that the “retarded Coulomb field” would continue to change 
at the rate computed for the charge at the point (2'). If the field is changing slowly, 
the effect of the retardation is almost completely removed by the correction term, 
and the two terms together give us an electric field that is the “instantaneous Cou- 
lomb field” — that is, the Coulomb field of the charge at the point (2) — to a very 
good approximation. 

Finally, there is a third term in Eq. (21.1) which is the second derivative of the 
unit vector e r *. For our study of the phenomena of light, we made use of the fact 
that far away from the charge the first two terms went inversely as the square of 
the distance and, for large distances, became very weak in comparison to the last 
term, which decreases as 1 fr. So we concentrated entirely on the last term, and we 
showed that it is (again, for large distances) proportional to the component of the 
acceleration of the charge at right angles to the line of sight. (Also, for most of our 
work in Vol. 1, we took the case in which the charges were moving nonrelativistic- 
ally. We considered the relativistic effects in only one chapter, Chapter 36.) 

Now we should try to connect the two things together. We have the Maxwell 
equations, and we have Eq. (21.1) for the field of a point charge. We should cer- 
tainly ask whether they are equivalent. If we can deduce Eq. (21.1) from Maxwell’s 
equations, we will really understand the connection between light and electro- 
magnetism. To make this connection is the main purpose of this chapter. 

It turns out that we won’t quite make it — that the mathematical details get^ 
too complicated for us to carry through in all their gory details. But we will come 
close enough so that you should easily see how the connection could be made. 
The missing pieces will only be in the mathematical details Some of you may 
find the mathematics in this chapter rather complicated, and you may not wish to 
follow the argument very closely. We think it is important, however, to make the 
connection between what you have learned earlier and what you are learning now, 
or at least to indicate how such a connection can be made. You will notice, if 
you look over the earlier chapters, that whenever we have taken a statement as a 
starting point for a discussion, we have carefully explained whether it is a new 
“assumption” that is a “basic law,” or whether it can ultimately be deduced from 
some other laws. We owe it to you in the spirit of these lectures to make the con- 
nection between light and Maxwell’s equations. If it gets difficult in places, well, 
that’s life — there is no other way. 


21-2 Spherical waves from a point source 

In Chapter 18 we found that Maxwell’s equations could be solved by letting 


17 — , hA 

E- ~V t ~ - 

and 

(21.2) 

B V X A, 

where </> and A must then be solutions of the equations 

(21.3) 

v 2, 1_ = _ P 

c- dt 2 e 0 

and 

(21.4) 

V 2j _ 1 f 2 A = _J_ 

C 2 dt 2 € { )C 2 ’ 

(21.5) 


and must also satisfy the condition that 

i:i -' 
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Now we will find the solution of Eqs. (21.4) and (21,5). To do that we have 
to find the solution of the equation 


V 2 t 


JS aV 

C 1 dt 2 


— S, 


(21.7) 


where v, which we call the source, is known. Of course, v corresponds to p/e 0 and 
\p to <f> for Eq (21.4), or s is j x /€ 0 c 2 if ^ is A x , etc , but we want to solve Eq (21 7) 
as a mathematical problem no matter what \f/ and s are physically. 

In places where p and j are zero — in what we have called ‘Tree” space — the 
potentials <j> and A , and the fields E and B , all satisfy the three-dimensional wave 
equation without sources, whose mathematical form is 

= < 2L8) 

In Chapter 20 we saw that solutions of this equation can represent waves of various 
kinds: plane waves in the ^-direction, \p = f (t — x/c ); plane waves in the y- or 
z-direction, or in any other direction; or spherical waves of the form 

Mx,y,z, t ) = f(t r/c) ~ • (21.9) 

(The solutions can be written in still other ways, for example cylindrical waves 
that spread out from an axis.) 

We also remarked that, physically, Eq. (21.9) does not represent a wave in 
free space — that there must be charges at the origin to get the outgoing wave started. 
In other words, Eq. (21.9) is a solution of Eq. (21.8) everywhere except right near 
r = 0, where it must be a solution of the complete equation (21.7), including some 
sources. Let’s see how that works. What kind of a source s in Eq. (21.7) would 
give rise to a wave like Eq. (21.9)? 

Suppose we have the spherical wave of Eq. (21.9) and look at what is happen- 
ing for very small r. Then the retardation —r/cmf(t — r/c) can be neglected — 
provided /is a smooth function — and ^ becomes 

t ^ (r-> 0). (21.10) 


So is just like a Coulomb field for a charge at the origin that varies with time. 
That is, if we had a little lump of charge, limited to a very small region near the 
origin, with a density p, we know that 

Q/4ire 0 
</> = — JT- ’ 

where Q = Jp dV. Now we know that such a 0 satisfies the equation 

v 2 </> = - 2 - . 

Co 


Following the same mathematics, we would say that the ^ of Eq. (21.10) 

)), (21.11) 


satisfies 

< 

to 

1! 

1 

where s is related to /by 

S 


II 

with 

5 = fsdV. 


The only difference is that in the general case, 5 , and therefore 5, can be a function 
of time. 

Now the important thing is that if \p satisfies Eq. (21.11) for small r, it also 
satisfies Eq. (21.7). As we go very close to the origin, the l/r dependence of ^ 
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causes the space derivatives to become very large. But the time derivatives keep 
their same values [They are just the time derivatives of j (/).] So as r goes to zero, 
the term d 2 \p/dt 2 in Eq. (21.7) can be neglected in comparison with V 2 ^, and Eq. 
(21.7) becomes equivalent to Eq. (21.11). 

To summarize, then, if the source function ,v(/) of Eq. (21 7) is localized at 
the origin and has the total strength 


S(t) = fs(t) dV, 

the solution of Eq. (21.7) is 


y> 0 


1 S (t - r/c) 
4ir r 


( 21 . 12 ) 

(21.13) 


The only effect of the term 0“V/d/ 2 in Eq. (21.7) is to introduce the retardation 
(/ — r/c) in the Coulomb-like potential. 


21-3 The general solution of Maxwell’s equations 

We have found the solution of Eq. (21.7) for a “■point'” source. The next 
question is: What is the solution for a spread-out source 9 That’s easy; we can 
think of any source s(x,y, z, t) as made up of the sum of many “point” sources, 
one for each volume element dV , and each with the source sti ength s(x . y, z, /) dV . 
Since Eq (21.7) is linear, the resultant field is the superposition of the fields from 
all of such source elements. 

Using the results of the preceding section [Eq. (21.13)] we know that the 
field d\f/ at the point (xi,yi, z x )— or (1) for short — at the time /, from a source 
element s dV at the point (a 2 , y 2 * z 2 )—or (2) for short — is given by 

j t s i a s (2, t r\2‘ c) dV 2 

cm,t) = 4 — > 

where r 12 is the distance from (2) to (1). Adding the contributions from all the 
pieces of the source means, of course, doing an integral over all regions where 
s ^ 0; so we have 

(2U4) 

That is, the field at (1) at the time t is the sum of all the spherical waves which 
leave the source elements at (2) at the times (/ — r i2 /c). This is the solution of 
our wave equation for any set of sources. 

We see now how to obtain a general solution for Maxwell’s equations. If 
for \(/ we mean the scalar potential <£, the source function s becomes p/e n . Or we 
can let \f/ represent any one of the three components of the vector potential A, 
replacing v by the corresponding component of j/e 0 c 2 . Thus, if we know the 
charge density p(x, y, z, t) and the current density j(x, y } z, t) everywhere, we can 
immediately write down the solutions of Eqs (21.4) and (21.5). They are 

(2I,5) 

and 

A(l, 0 = [ dV 2 . (21.16) 

J 47reot-/*i2 

The fields E and B can then be found by differentiating the potentials, using Eqs. 
(21.2) and (21.3). [Incidentally, it is possible to verify that the <j> and A obtained 
from Eqs. (21.15) and (21.16) do satisfy the equality (21.6) ] 

We have solved Maxwell’s equations. Given the currents and charges in any 
circumstance, we can find the potentials directly from these integrals and then 
differentiate and get the fields. So we have finished with the Maxwell theory 
Also this permits us to close the ring back to our theory of light, because to connect 
with our earlier work on light, we need only calculate the electric field from a 
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moving charge. All that remains is to take a moving charge, calculate the po- 
tentials from these integrals, and then differentiate to find E from — V<£ — dA/dt . 
We should get Eq. (21.1). It turns out to be lots of work, but that’s the principle. 

So here is the center of the universe of electromagnetism — the complete theory 
of electricity and magnetism, and of light; a complete description of the fields 
produced by any moving charges; and more. It is all here. Here is the structure 
built by Maxwell, complete in all its power and beauty. It is probably one of the 
greatest accomplishments of physics. To remind you of its importance, we will 
put it all together in a nice frame. 


Maxwell’s equations: 


v E = 

e ( , 

v • B = 0 

„ ,, SB 

VXE = -aT 

c 2 v X B = i- + ~ 

Co ot 

Their solutions: 

E = 

B 

dA 

- v * - w 

= V x A 

HU 0 = j 

? (2 / - 1-,,/c) 

47re 0 r 12 

A( 1, t) = 

f j{2 ’ ‘ “ Wc) dv t 

J 4t re 0 C 2 r 12 


21-4 The fields of an oscillating dipole 

We have still not lived up to our promise to derive Eq. (21.1) for the electric 
field of a point charge in motion. Even with the results we already have, it is a 
relatively complicated thing to derive. We have not found Eq. (21.1) anywhere in 
the published literature except in Vol. I of these lectures.* So you can see that it is 
not easy to derive. (The fields of a moving charge have been written in many othei 
forms that are equivalent, of course.) We will have to limit ourselves here just to 
showing that, in a few examples, Eqs. (21.15) and (21.16) give the same results as 
Eq. (21.1). First, we will show that Eq (21.1) gives the correct fields with only the 
restriction that the motion of the charged particle is nonrelativistic. (Just this 
special case will take care of 90 percent, or more, of what we said about light.) 

We consider a situation in which we have a blob of charge that is moving 
about in some way, in a small region, and we will find the fields far away. To put it 
another way, we are finding the field at any distance from a point charge that is 
shaking up and down in very small motion. Since light is usually emitted from 
neutral objects such as atoms, we will consider that our wiggling charge q is located 
near an equal and opposite charge at rest. If the separation between the centers of 
the charges is d , the charges will have a dipole moment p = qd , which we take to 
be a function of time. Now we should expect that if we look at the fields close to 
the chaiges, we won’t have to worry about the delay; the electric field will be 
exactly the same as the one we have calculated earlier for an electrostatic dipole 


* The formula was worked out by R. P. Feynman, in about 1950, and given in some 
lectures as a good way of thinking about synchrotron radiation 
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Fig. 21-2. The potentials at (1) are 
given by integrals over the charge 
density p. 


— using, of course, the instantaneous dipole moment pit). But if we go very far 
out, we ought to find a term in the field that goes as \/r and depends on the ac- 
celeration of the charge perpendicular to the line of sight. Let’s see if we get such 
a result. 

We begin by calculating the vector potential A, using Eq. (21.16). Suppose 
that our moving charge is in a small blob whose charge density is given by p(v, y, z), 
and the whole thing is moving at any instant with the velocity v. Then the current 
density j(x, y, z) will be equal to vp(x, y, z). It will be convenient to take our 
coordinate system so that the z-axis is in the direction of v; then the geometry of 
our problem is as shown in Fig. 21-2. We want the integral 


j(2, 1 - r l2 /c) 


rv. 


(IV,. 


(21.17) 


Now if the size of the charge-blob is really very small compared with r 12 , we 
can set the r 12 term in the denominator equal to r , the distance to the center of the 
blob, and take r outside the integral. Next, we are also going to set r 12 = r in 
the numerator, although that is not really quite right. It is not right because we 
should take j at, say, the top of the blob at a slightly different time than we used 
for j at the bottom of the blob. When we set r 12 = r in j(t — r 12 /c ), we are 
taking the current density for the whole blob at the same time (/ — r/c). That is 
an approximation that will be good only if the velocity v of the charge is much 
less than c. So we are making a nonrelativistic calculation. Replacing j by py, 
the integral (21.17) becomes 


~ J yp( 2, t — r/c) dV 2 . 


Since all the charge has the same velocity, this integral is just u/r times the total 
charge q. But qv is just dp/dt, the rate of change of the dipole moment — which is, 
of course, to be evaluated at the retarded time (/ — r/c). We will write it as 
pit — r/c). So we get for the vector potential 


A(l,t) 


1 p(t - r/c) 
Aircoc 2 r 


(21.18) 


Our result says that the current in a varying dipole produces a vector potential 
in the form of spherical waves whose source strength is p/Aire^c 2 . 

We can now get the magnetic field from B — V X A. Since/? is totally in the 
z-direction, A has only a z-component; there are only two nonzero derivatives in 
the curl So B x = dA z /dy and B y = —dA z /dx. Let’s first look at B x : 


dA z _ 1 d p(t — r/c) 

dy 47t e {) c 2 dy r 


(21.19) 


To carry out the differentiation, we must remember that r — yx 2 + y 2 + z 2 , so 


B x 


1 


47re 0 c 2 


p(t 


C) 


dy 


+ 


1 1 d 


4i re 0 c 2 r dy 


p(t - r/c). (21.20) 


Remembering that dr/dy = y/r , the first term gives 


I yp(t - r/c) 

47T€ 0 c 2 r 1 


( 21 . 21 ) 


which drops off as l/r 2 like the fields of a static dipole (because y/r is constant for 
a given direction). 

The second term in Eq. (21.20) gives us the new effects. Carrying out the 
differentiation, we get 


47T€qC 2 cr 1 


; pit - r/c). 


( 21 . 22 ) 


where p means, of course, the second derivative of p with respect to t. This term. 
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which comes from differentiating the numerator, is responsible for radiation. 
First, it describes a field which decreases with distance only as 1 /r. Second, it 
depends on the acceleration of the charge. You can begin to see how we are going 
to get a result like Eq. (21 1'), which describes the radiation of light. 

Let’s examine in a little more detail how this radiation term comes about — it 
is such an interesting and important result. We start with the expression (21.18), 
which has a \/r dependence and is therefore like a Coulomb potential, except for 
the delay term in the numerator. Why is it then that when we differentiate with 
respect to space coordinates to get the fields, we don’t just get a 1 jr 1 field — with, 
of course, the corresponding time delays? 

We can see why in the following way: Suppose that we let our dipole oscillate 
up and down in a sinusoidal motion. Then we would have 


and 


P = Pz = Pq sin a;/ 

1 co/>o cos — r/c) 
47 re 0 c 2 r 


If we plot a graph of A z as a function of r at a given instant, we get the curve shown 
in Fig 21-3. The peak amplitude decreases as \/r, but there is, in addition, an 
oscillation in space, bounded by the \/r envelope. When we take the spatial de- 
rivatives, they will be proportional to the slope of the curve. From the figure we 
see that there are slopes much steeper than the slope of the \/r curve itself. It is, 
in fact, evident that for a given frequency the peak slopes are proportional to the 
amplitude of the wave, which varies as \/r. So that explains the drop-off rate of 
the radiation term. 

It all comes about because the variations with time at the source are translated 
into variations in space as the waves are propagated outward, and the magnetic 
fields depend on the spatial derivatives of the potential. 

Let's go back and finish our calculation of the magnetic field. We have for 
B x the two terms (21.21) and (21.22), so 

ypjt - r/c) _ yp(t - r/c) 
r 4 cr 2 


B x = 


1 


47 re 0 c J 



Fig. 21-3. The magnitdue of A as a 
function of r at the instant t for the 
spherical wave from an oscillating dipole. 


With the same kind of mathematics, we get 


47 T€qC 2 


xp(t — r/c) xp(t — r/c) 

..•I i 


cr 2 


Or we can put it all together in a nice vector formula: 

B = 1 [ p + (r/c)p]t-ric X r 

47T€()C 2 r s 


(21 23) 


Now let’s look at this formula. First of all, if we go very far out in r, only the 
p term counts. The direction of Z? is given by p X #% which is at right angles to the 
radius r and also at right angles to the acceleration, as in Fig. 21-4. Everything is 
coming out right; that is also the result we get from Eq. (21. 1'). 

Now let’s look at what we are not used to — at what happens closer in. In 
Section 14-9 we worked out the law of Biot and Savart for the magnetic field of an 
element of current. We found that a current element j dV contributes to the mag- 
netic field the amount 


dB = 


1 jXr 

47 re 0 c 2 r‘ s 


dV. 


(21.24) 



Fig. 21-4. The radiation fields B and 
E of an oscillating dipole. 


You see that this formula looks very much like the first term of Eq (21.23), if we 
remember that p is the current. But there is one difference. In Eq. (21.23), the 
current is to be evaluated at the time (/ — r/c), which doesn’t appear in Eq. (21.24). 
Actually, however, Eq. (21.24) is still very good for small r , because the second 
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term of Eq. (21 23) tends to cancel out the effect of the retardation in the first term. 
The two together give a result very near to Eq. (21 24) when r is small. 

We can see that this way . When r is small, (/ — r/c) is not very different from 
/, so we can expand the bracket in Eq. (2 1 .23) in a Taylor series. For the first term, 

p(t - r/c) = p(t) - ~ p(t) + etc., 

and to the same order in r/c, 

p(t - r/c) = p{t). 


When we take the sum, the two terms in p cancel, and we are left with the un- 
retarded current p : that is ,/>(/) — plus terms of order (r/c) 2 or higher [e.g., \(r/c) 2 p"\ 
which will be very small for r small enough that p does not alter markedly in the 
time r/c. 

So Eq. (21 23) gives fields very much like the instantaneous theory — much 
closer than the instantaneous theory with a delay; the first-order effects of the delay 
are taken out by the second term. The static formulas are very accurate, much 
more accurate than you might think Of course, the compensation only works for 
points close in. For points far out the correction becomes very bad, because the 
time delays produce a very large effect, and we get the important \/r term of the 
radiation. 

We still have the problem of computing the electric field and demonstrating 
that it is the same as Eq. (21. T)- For large distances we can see that the answer 
is going to come out all right. We know that far from the sources, where we have 
a propagating wave, E is perpendicular to B (and also to r), as in Fig. 21-4, and 
that cB = E So E is proportional to the acceleration p, as expected from Eq. 

(21. T). 

To get the electric field completely for all distances, we need to solve for the 
electrostatic potential. When we computed the current integral for A to get 
Eq. (21.18), we made an approximation by disregarding the slight variation of r 
in the delay terms. This will not work for the electrostatic potential, because we 
would then get 1 /r times the integral of the charge density, which is a constant. 
This approximation is too rough. We need to go to one higher order. Instead of 
getting involved in that higher-order computation directly, we can do something 
else — we can determine the scalar potential from Eq. (21.6), using the vector po- 
tential we have already found. The divergence of A , in our case, is just dA z /dz 
— since A x and A y are identically zero. Differentiating in the same way that we 
did above to find B , 


1 

p{t - rc) z Cv 

I+7S*' 

- r/c) 

47re 0 c 2 

1 

! 

1 

! 1 
■"S 

w 

zp(t - r/cf 


47r€ ( )C 2 

L 

cr 2 



Or, in vector notation, 

v . A = _ J [ p + (r/ c) p]t_r/c ■ r 

47T6 0 c 2 r* 

Using Eq. (21.6), we have an equation for (j > : 

<*4> = 1 [p + (r/c) p] t_ric - r 

dt 47T€ 0 r' s 

Integrating with respect to t just removes one dot from each of the p* s, so 


4>{r, t) = 


1 1 P + ( r/c)p] t _ r fc 

47 re () 


(21 25) 


(The constant of integration would correspond to some superposed static field 
which could, of course, exist. For the oscillating dipole we have taken, there is 
no static field ) 
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We are now able to find the electric field E from 


E = -T(t> 


dA 

dt 


Since the steps are tedious but straightforward [providing you remember that 
p(t — r/c ) and its time derivatives depend on x, y, and z through the retardation 
r/c], we will just give the result: 


E{r, t) 


-1 

47 T€ 0 ^ 3 


with 


3 iP r 2 ) r + iP(f ~ r / c ) X f} X r | (21.26) 


p*=p(t- r/c) + ~p(t - r/c). (21.27) 


Although it looks rather complicated, the result is easily interpreted. The 
vector />* is the dipole moment retarded and then “corrected” for the retardation, 
so the two terms with p* give just the static dipole field when r is small. [See 
Chapter 6, Eq. (6.14).] When r is large, the term in p dominates, and the electric 
field is proportional to the acceleration of the charges, at right angles to #\ and, m 
fact, directed along the projection of p m a plane perpendicular to r. 

This result agrees with what we would have gotten using Eq. (21.1). Of 
course, Eq. (21.1) is more general; it works with any motion, while Eq. (21.26) is 
valid only for small motions for which we can take the retardation r/c as constant 
over the source. At any rate, we have now provided the underpinnings for our 
entire previous discussion of light (excepting some matters discussed in Chapter 
36 of Vol. I), for it all hinged on the last term of Eq. (21.26). We will discuss next 
how the fields can be obtained for more rapidly moving charges (leading to the 
relativistic effects of Chapter 36 of Vol. I). 


21-5 The potentials of a moving charge; the general solution of Lienard and 
Wiechert 

In the last section we made a simplification in calculating our integral for A 
by considering only low velocities. But in doing so we missed an important point 
and also one where it is easy to go wrong. We will therefore take up now a calcula- 
tion of the potentials for a point charge moving in any way whatever — even with 
a relativistic velocity. Once we have this result, we will have the complete electro- 
magnetism of electric charges. Even Eq. (21.1) can then be derived by taking 
derivatives. The story will be complete. So bear with us. 

Let’s try to calculate the scalar potential <j>( 1) at the point (xj, y x , z{) produced 
by a point charge, such as an electron, moving in any manner whatsoever. By a 
“point” charge we mean a very small ball of charge, shrunk down as small as you 
like, with a charge density p(x, y 9 z). We can find <j> from Eq. (21.15): 

</>(!, t) = -j-L f ^ 2 ’ t r /i-g/O dv 2 . (21.28) 

47TC0 J T 12 

The answer would seem to be — and almost everyone would, at first, think — that 
the integral of p over such a “point” charge is just the total charge q , so that 


*0,0 


q_ 

47T60 ri 2 


(wrong). 


By r[ 2 we mean the radius vector from the charge at point (2) to point (1) at the 
retarded time ( t — r X2 /c). It is wrong. 

The correct answer is 


*0. 0 = ^-^- 


1 


4 tt € 0 r’i2 1 — Vr'/c 


(21.29) 


where v r > is the component of the velocity of the charge parallel to r{ 2 — namely, 
toward point (1). We will now show you why. To make the argument easier to 
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Fig. 21-5. (a) A “point" charge — considered as a small cubical distribution of 

charge — moving with the speed v toward point (1) (b) The volume element AV, 

used for calculating the potentials. 


follow, we will make the calculation first for a “point” charge which is in the form 
of a little cube of charge moving toward the point (1) with the speed v, as shown 
in Fig. 21 -5(a). Let the length of a side of the cube be a , which we take to be 
much, much less than r X2 , the distance from the center of the charge to the 
point (1). 

Now to evaluate the integral of Eq. (21.28), we will return to basic principles; 
we will write it as the sum 

(21.30) 

F i 



Fig. 21-6. Integrating p(f — r f / c) dV 
for a moving charge. 


where r t is the distance from point (1) to the /th volume element AV, and p t is the 
charge density at AV % at the time t l = t — rjc. Since r t » a , always, it will be 
convenient to take our A V L in the form of thin, rectangular slices perpendicular to 
r i2 , as shown in Fig. 21-5(b). 

Suppose we start by taking the volume elements AK* with some thickness w 
much less than a. The individual elements will appear as shown in Fig. 21-6(a), 
where we have put in more than enough to cover the charge. But we have not 
shown the charge, and for a good reason. Where should we draw it 9 For each 
volume element AV ti we are to take p at the time t L — (t — r t /c), but since the 
charge is moving , it is in a different place for each volume element AV t \ 

Let’s say that we begin with the volume element labeled “1” in Fig. 21-6(a), 
chosen so that at the time t x = (t — r x /c) the “back” edge of the charge occupies 
AV X , as shown in Fig. 21 -6(b). Then when we evalute p 2 AF 2 , we must use the 
position of the charge at the slightly later time t 2 = 0 — r 2 /c), when the charge 
will be in the position shown in Fig. 21-6(c). And so on, for AK 3 , AF 4 , etc. Now 
we can evaluate the sum. 

Since the thickness of each AV L is w, its volume is wa 2 . Then each volume 
element that overlaps the charge distribution contains the amount of charge 
uvrp, where p is the density of charge within the cube — which we take to be 
uniform. When the distance from the charge to point (1) is large, we will make a 
negligible error by setting all the r" s in the denominators equal to some average 
value, say the retarded position r f of the center of the charge. Then the sum (21.30) 

pwa 2 

5 

where AF,\ is the last AV t that overlaps the charge distributions, as shown in Fig. 
21-6(e). The sum is, clearly, 

^ pwa 2 _ pa 3 / 

r f r f \ a ) 


Now pa 3 is just the total charge q and Nw is the length b shown in part (e) of the 
figure. So we have 


* = T. 


47 re {) r f \a 


(21.31) 
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What is bl It is the length of the cube of charge increased by the distance 
moved by the charge between t x = (/ — r x /c) and ts = (t — r N /c ) — which is 
the distance the charge moves in the time 

A t = t N - 1 1 = (r x - r N )/ c = b/c. 


Since the speed of the charge is v, the distance moved is v At = vb/c. But the 
length b is this distance added to a: 

b = a + - b. 
c 

Solving for b , we get 

h = a 

1 - (v/c) ‘ 

Of course by v we mean the velocity at the retarded time /'=(/ — //c), which 
we can indicate by writing [1 — r/c] rvt , and Eq. (21.31) for the potential becomes 

* (1, {) = 4^7 [1 - (y/c)] rc t ' 


This result agrees with our assertion, Eq. (21.29). There is a correction term which 
comes about because the charge is moving as our integral “sweeps over the charge.” 
When the charge is moving toward the point (1), its contribution to the integral is 
increased by the ratio b/a. Therefore the correct integral is q/r f multiplied by 
b/a, which is 1/[1 — r/c] rei . 

If the velocity of the charge is not directed toward the observation point (1), 
you can see that what matters is the component of its velocity toward point (1). 
Calling this velocity component v r , the correction factor is 1/[1 — r r /c] Tvt . Also, 
the analysis we have made goes exactly the same way for a charge distribution of 
any shape — it doesn’t have to be a cube. Finally, since the “size” of the charge a 
doesn’t enter into the final result, the same result holds when we let the charge 
shrink to any size — even to a point. The general result is that the scalar potential 
for a point charge moving with any velocity is 


4>(t) - 


g 

47r€ 0 r'[l - (v r /c )\ ret 


This equation is often written in the equivalent form 

^ l ’ ^ ~ 4?reo[r - (v * r/c)] Tet ’ 


(21.32) 


(21.33) 


where r is the vector from the charge to the point (1), where <j> is being evaluated, 
and all the quantities in the bracket are to have their values at the retarded time 
t f = t - r’/c. 

The same thing happens when we compute A for a point charge, from Eq. 
(21.16). The current density is pv and the integral over p is the same as we found 
for </>. The vector potential is 


A (\ , /) 


gv 

47 re 0 c 2 [> - (v • r/c)] Tf . t 


(21.34) 


The potentials for a point charge were first deduced m this form by Lienard 
and Wiechert and are called the Lienard-Wiechert potentials. 

To close the ring back to Eq. (21.1) it is only necessary to compute E and B 
from these potentials (using B — V X A and E = —V<}> — dA/dt). It is now 
only arithmetic. The arithmetic, however, is fairly involved, so we will not write 
out the details. Perhaps you will take our word for it that Eq. (21 .1) is equivalent 
to the Lienard-Wiechert potentials we have derived.* 


* If you have a lot of paper and time you can try to work it through yourself. We 
would, then, make two suggestions' First, don’t forget that the derivatives of r f are 
complicated, since it is a function of /' Second, don’t try to derive (21 1), but carry out 
all of the derivatives in it, and then compare what you get with the E obtained from the 
potentials (21.33) and (21.34). 
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21-6 The potentials for a charge moving with constant velocity; the Lorentz 
formula 

We want next to use the Lienard-Wiechert potentials for a special case — to 
find the fields of a charge moving with uniform velocity in a straight line. We will 
do it again later, using the principle of relativity. We already know what the po- 
tentials are when we are standing in the rest frame of a charge. When the charge 
is moving, we can figure everything out by a relativistic transformation from one 
system to the other. But relativity had its origin in the theory of electricity and 
magnetism. The formulas of the Lorentz transformation (Chapter 15, Vol. 1) 
were discoveries made by Lorentz when he was studying the equations of electricity 
and magnetism. So that you can appreciate where things have come from, we 
would like to show that the Maxwell equations do lead to the Lorentz transforma- 
tion. We begin by calculating the potentials of a charge moving with uniform 
velocity, directly from the electrodynamics of Maxwell’s equations. We have 
shown that Maxwell’s equations lead to the potentials for a moving charge that we 
got in the last section. So when we use these potentials, we are using Maxwell’s 
theory. 


Fig. 21-7. Finding the potential at 
P of a charge moving with uniform 
velocity along the x-axis. 



Suppose we have a charge moving along the x-axis with the speed v. We want 
the potentials at the point P(x, y, z), as shown in Fig. 21-7. If / -= 0 is the moment 
when the charge is at the origin, at the time t the charge is at x = »/, y = z = 0 
What we need to know, however, is its position at the retarded time 

t' = / - C (21.35) 

where r f is the distance to the point P from the charge at the retarded time. At the 
earlier time the charge was at jc = />/', so 

r f = V C* — y/') 2_ + y l + (21.36) 

To find r f or f we have to combine this equation with Eq. (21.35). First, we 
eliminate r f by solving Eq. (21.35) for r f and substituting in Eq. (21.36). Then, 
squaring both sides, we get 

c\t - /') 2 = {x - Vf) 2 + y 2 + z 2 , 

which is a quadratic equation in t f . Expanding the squared binomials and collecting 
like terms in t\ we get 

(r 2 — c 2 )t f2 — 2(xv — c 2 t)t f + x z + y 2 + z 2 — (ct) 2 = 0. 

Solving for 

0 - &) f = 1 - ^ ~ C yj {x ~ v,)2 + ( 1 “ c*y + z2) (2L37) 
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To get r f we have to substitute this expression for t f into 


r* — c(t — /'). 


Now we are ready to find <t> from Eq. (21.33), which, since v is constant, 
becomes 

1 


y, Z, t) = -r~- -J 


47 re 0 r* — (v ’ r'/c) 


(21.38) 


The component of v in the direction of r' is v X (x — y/)/r', so v * #■' is just 
r X (x — vf')» and the whole denominator is 


c(t — /') (x — ?;/') — c 




Substituting for (1 — v 2 /c 2 )t f from Eq. (21.37), we get for 0 


* (W ’ ,) = 4 4 


1 


I (x - vtf + ^1 - + Z 2 ) 

This equation is more understandable if we rewrite it as 




1 




r X ~ vt 

K \/l~— v 2 /c 


+ y 2 4* z 2 


1/2 


(21.39) 


The vector potential ^ is the same expression with an additional factor of v/c 2 * 

. v 

In Eq. (21.39) you can clearly see the beginning of the Lorentz transformation. 
If the charge were at the origin in its own rest frame, its potential would be 


0(x, y, z) 4 7r€o [^2 _j_ j;2 _|_ Z 2Jl/2 

We are seeing it in a moving coordinate system, and it appears that the coordinates 
should be transformed by 

x — vt 

X -> -- — Z > 

\/l — v 2 /c 2 

y 

z — > z. 


That is just the Lorentz transformation, and what we have done is essentially the 
way Lorentz discovered it. 

But what about that extra factor 1 /y/l — v 2 /c% that appears at the front of 
Eq. (21.39)? Also, how does the vector potential A appear, when it is everywhere 
zero in the rest frame of the particle? We will soon show that A and <t> together 
constitute a four- vector, like the momentum p and the total energy U of a particle. 
The extra \/y/\ — v^/c 2 in Eq. (21.39) is the same factor that always comes in 
when one transforms the components of a four-vector — just as the charge density p 
transforms to p/V 1 “ v 2 /c 2 . In fact, it is almost apparent from Eqs. (21 4) 
and (21.5) that A and 0 are components of a four-vector, because we have already 
shown in Chapter 13 that j and p are the components of a four-vector. 

Later we will take up in more detail the relativity of electrodynamics; here we 
only wished to show how naturally the Maxwell equations lead to the Lorentz 
transformation. You will not, then, be surprised to find that the laws of electricity 
and magnetism are already correct for Einstein’s relativity. We will not have to 
“fix them up,” as we had to do for Newton’s laws of mechanics. 
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22 


AC Circuits 


22-1 Impedances 


Most of our work in this course has been aimed at reaching the complete 
equations of Maxwell. In the last two chapters we have been discussing the con- 
sequences of these equations. We have found that the equations contain all the 
static phenomena we had worked out earlier, as well as the phenomena of electro- 
magnetic waves and light that we had gone over in some detail in Volume I. The 
Maxwell equations give both phenomena, depending upon whether one computes 
the fields close to the currents and charges, or very far from them There is not 
much interesting to say about the intermediate region; no special phenomena 
appear there. 

There still remain, however, several subjects in electromagnetism that we 
want to take up. We want to discuss the question of relativity and the Maxwell 
equations — what happens when one looks at the Maxwell equations with respect 
to moving coordinate systems. There is also the question of the conservation of 
energy in electromagnetic systems. Then there is the broad subject of the electro- 
magnetic properties of materials; so far, except for the study of the properties 
of dielectrics, we have considered only the electromagnetic fields in free space And 
although we covered the subject of light in some detail in Volume I, there are 
still a few things we would like to do again from the point of view of the field 
equations. 

In particular, we want to take up again the subject of the index of re- 
fraction, particularly for dense materials. Finally, there are the phenomena 
associated with waves confined in a limited region of space. We touched on this 
kind of problem briefly when we were studying sound waves. Maxwell’s equations 
lead also to solutions which represent confined waves of the electric and magnetic 
fields. We will take up this subject, which has important technical applications, 
in some of the following chapters. In order to lead up to that subject, we will 
begin by considering the properties of electrical circuits at low frequencies. We 
will then be able to make a comparison between those situations in which the 
almost static approximations of Maxwell’s equations are applicable and those 
situations in which high-frequency effects are dominant. 

So we descend from the great and esoteric heights of the last few chapters 
and turn to the relatively low-level subject of electrical circuits. We will see, how- 
ever, that even such a mundane subject, when looked at in sufficient detail, can 
contain great complications 

We have already discussed some of the properties of electrical circuits in 
Chapters 23 and 25 of Vol. I. Now we will cover some of the same material again, 
but in greater detail. Again we are going to deal only with linear systems and with 
voltages and currents which all vary sinusoidally; we can then represent all voltages 
and currents by complex numbers, using the exponential notation described in 
Chapter 22 of Vol. I. Thus a time-varying voltage V{t) will be written 
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Review: 


V(t) = (22.1) 

where V represents a Complex number that is independent of /. It is, of course, 
understood that the actual time-varying voltage V(t ) is given by the real part of 
the complex function on the right-hand side of the equation. 
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Similarly, all of our other time-varying quantities will be taken to vary 
sinusoidally at the same frequency go. So we write 


II 

*-"(> 

c 

(current), 


S = S e 1 "* 

(emf), 

(22.2) 

E = £ e wt 

(electric field), 




and so on. 

Most of the time we will write our equations in terms of V, /, 8 , . . . (instead of 
in terms of V, /, 8, . . .), remembering, though, that the time variations are as 
given in (22.2). 

In our earlier discussion of circuits we assumed that such things as inductances, 
capacitances, and resistances were familiar to you. We want now to look in a little 
more detail at what is meant by these idealized circuit elements. We begin with 
the inductance. 

An inductance is made by winding many turns of wire in the form of a coil 
and bringing the two ends out to terminals at some distance from the coil, as shown 
in Fig. 22-1. We want to assume that the magnetic field produced by currents in 
the coil does not spread out strongly all over space and interact with other parts of 
the circuit. This is usually arranged by winding the coil in a doughnut-shaped 
form, or by confining the magnetic field by winding the coil on a suitable iron core, 
or by placing the coil in some suitable metal box, as indicated schematically in 
Fig. 22-1. In any case, we assume that there is a negligible magnetic field in the 
external region near the terminals a and h. We are also going to assume that we 
can neglect any electrical resistance in the wire of the coil. Finally, we will assume 
that we can neglect the amount of electrical charge that appears on the surface of 
a wire in building up the electric fields. 

With all these approximations we have what we call an “ideal" inductance. 
(We will come back later and discuss what happens in a real inductance.) For an 
ideal inductance we say that the voltage across the terminals is equal to L(dl/dt). 
Let’s see why that is so. When there is a current through the inductance, a magnetic 
field proportional to the current is built up inside the coil. If the current changes 
with time, the magnetic field also changes. In general, the curl of E is equal to 
— dB/dt\ or, put differently, the line integral of E all the way around any closed 
path is equal to the negative of the rate of change of the flux of B through the loop 
Now suppose we consider the following path: Begin at terminal a and go along 
the coil (staying always inside the wire) to terminal b \ then return from terminal b 
to terminal a through the air in the space outside the inductance. The line integral 
of E around this closed path can be written as the sum of two parts: 


Je * ds = J E ■ ds + J E 


ds . 


via 

coil 


b 

outside 


(22.3) 


As we have seen before, there can be no electric fields inside a perfect conductor. 
(The smallest fields would produce infinite currents.) Therefore the integral from 
a to b via the coil is zero. The whole contribution to the line integral of E comes 
from the path outside the inductance from terminal b to terminal a . Since we have 
assumed that there are no magnetic fields in the space outside of the “box," this 
part of the integral is independent of the path chosen and we can define the po- 
tentials of the two terminals. The difference of these two potentials is what we 
call the voltage difference, or simply the voltage V, so we have 

V = - f“ E-ds = -§E-ds. 

The complete line integral is what we have before called the electromotive 
force £ and is, of course, equal to the rate of change of the magnetic flux in the 
coil. We have seen earlier that this emf is equal to the negative rate of change of 
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the current, so we have 


V = 



where L is the inductance of the coil. Since dl/dt = /W, we have 


V = icoLJ. 


(22.4) 


The way we have described the ideal inductance illustrates the general approach 
to other ideal circuit elements — usually called “lumped” elements. The properties 
of the element are described completely in terms of currents and voltages that 
appear at the terminals. By making suitable approximations, it is possible to 
ignore the great complexities of the fields that appear inside the object. A separation 
is made between what happens inside and what happens outside. 

For all the circuit elements we will find a relation like the one in Eq. (22.4), in 
which the voltage is proportional to the current with a proportionality constant 
that is, in general, a complex number. This complex coefficient of proportionality 
is called the impedance and is usually written as z (not to be confused with the 
z-coordinate). It is, in general, a function of the frequency co. So for any lumped 
element we write 


V ^ V 
I ~ I 

For an inductance, we have 


(22.5) 


z (inductance) = z L = z‘coL. 


( 22 . 6 ) 


Now let’s look at a capacitor from the same point of view.* A capacitor con- 
sists of a pair of conducting plates from which two wires are brought out to suitable 
terminals. The plates may be of any shape whatsoever, and are often separated 
by some dielectric material. We illustrate such a situation schematically m Fig. 
22-2. Again we make several simplifying assumptions. We assume that the 
plates and the wires are perfect conductors. We also assume that the insulation 
between the plates is perfect, so that no charges can flow across the insulation 
from one plate to the other. Next, we assume that the two conductors are close 
to each other but far from all others, so that all field lines which leave one plate 
end up on the other. Then there are always equal and opposite charges on the two 
plates and the charges on the plates are much larger than the charges on the sur- 
faces of the lead-in wires. Finally, we assume that there are no magnetic fields 
close to the capacitor. 

Suppose now we consider the line integral of E around a closed loop which 
starts at terminal a, goes along inside the wire to the top plate of the capacitor, 
jumps across the space between the plates, passes from the lower plate to terminal 
b through the wire, and returns to terminal a in the space outside the capacitor. 
Since there is no magnetic field, the line integral of E around this closed path is 
zero. The integral can be broken down into three parts: 



Fig. 22-2. A capacitor (or con- 
denser). 


<^E • ds 


J E * ds 

along 

wires 


j E ' ds d - j" E ' ds, 

between outside 

plates 


(22.7) 


The integral along the wires is zero, because there are no electric fields inside per- 
fect conductors. The integral from b to a outside the capacitor is equal to the nega- 
tive of the potential difference between the terminals. Since we imagined that the 
two plates are in some way isolated from the rest of the world, the total charge on 


* There are people who say we should call the objects by the names “inductor” and 
“capacitor” and call their properties “inductance” and “capacitance” (by analogy with 
“resistor” and “resistance”). We would rather use the words you will hear in the labora- 
tory. Most people still say “inductance” for both the physical coil and its inductance L. 
The word “capacitor” seems to have caught on — although you will still hear “condenser” 
fairly often — and most people still prefer the sound of “capacity" to "capacitance.” 
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the two plates must be zero; if there is a charge Q on the upper plate, there is an 
equal, opposite charge — Q on the lower plate. We have seen earlier that if two 
conductors have equal and opposite charges, plus and minus Q , the potential 
difference between the plates is equal to Q/C , where C is called the capacity of the 
two conductors. From Eq. (22.7) the potential difference between the terminals 
a and b is equal to the potential difference between the plates. We have, therefore, 
that 


V 


Q 
c * 


The electric current I entering the capacitor through terminal a (and leaving 
through terminal b) is equal to dQ/dt , the rate of change of the electric charge on 
the plates. Writing dV/dt as 100 V, we can put the voltage current relationship for 
a capacitor in the following way: 

. T/ / ' 

mV =c’ 

or 


V = 


I 

icoC 


( 22 . 8 ) 


The impedance z of a capacitor, is then 

z (capacitor) = zc 


1 

iwC 


(22.9) 


The third element we want to consider is a resistor. However, since we have 
not yet discussed the electrical properties of real materials, we are not yet ready 
to talk about what happens inside a real conductor. We will just have to accept 
as fact that electric fields can exist inside real materials, that these electric fields 
give rise to a flow of electric charge — that is, to a current — and that this current 
is proportional to the integral of the electric field from one end of the conductor 
to the other. We then imagine an ideal resistor constructed as in the diagram of 
Fig. 22-3. Two wires which we take to be perfect conductors go from the terminals 
a and b to the two ends of a bar of resistive material. Following our usual line of 
argument, the potential difference between the terminals a and b is equal to the 
line integral of the external electric field, which is also equal to the line integral of 
the electric field through the bar of resistive material. It then follows that the cur- 
rent / through the resistor is proportional to the terminal voltage V: 


1°) (b) (c) (d ) 


/ = 


V 


R 


a 




o 

==C 

o 


Z = 


I 


ituL 


i cuC 



where R is called the resistance. We will see later that the relation between the 
current and the voltage for real conducting materials is only approximately linear. 
We will also see that this approximate proportionality is expected to be independent 
of the frequency of variation of the current and voltage only if the frequency is 
not too high. For alternating currents then, the voltage across a resistor is in phase 
with the current, which means that the impedance is a real number. 


R 


z (resistance) = z H — R. 


( 22 . 10 ) 


Fig. 22-4. The ideal lumped circuir Our results for the three lumped circuit elements — the inductor, the capacitor, 

elements (passive). and the resistor — are summarized in Fig. 22-4. In this figure, as well as in the 

preceding ones, we have indicated the voltage by an arrow that is directed from one 
terminal to another. If the voltage is “positive” — that is, if the terminal a is at a 
higher potential than the terminal b — the arrow indicates the direction of a positive 
“voltage drop.” 

Although we are talking about alternating currents, we can of course include 
the special case of circuits with steady currents by taking the limit as the frequency 
co goes to zero. For zero frequency — that is, for DC — the impedance of an induc- 
tance goes to zero ; it becomes a short circuit. For DC, the impedance of a condenser 
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goes to infinity; it becomes an open circuit. Since the impedance of a resistor is 
independent of frequency, it is the only element left when we analyze a circuit 
for dc. 

In the circuit elements we have described so far, the current and voltage are 
proportional to each other. If one is zero, so also is the other. We usually think in 
terms like these: An applied voltage is “responsible” for the current, or a current 
“gives rise to” a voltage across the terminals; so in a sense the elements “respond” 
to the “applied” external conditions. For this reason these elements are called 
passive elements. They can thus be contrasted with the active elements, such as 
the generators we will consider in the next section, which are the sources of the 
oscillating currents or voltages in a circuit. 


22-2 Generators 

Now we want to talk about an active circuit element — one that is a source of 
the currents and voltages in a circuit — namely, a generator. 

Suppose that we have a coil like an inductance except that it has very few 
turns, so that we may neglect the magnetic field of its own current. This coil, 
however, sits in a changing magnetic field such as might be produced by a rotating 
magnet, as sketched in Fig. 22-5. (We have seen earlier that such a rotating mag- 
netic field can also be produced by a suitable set of coils with alternating currents.) 
Again we must make several simplifying assumptions. The assumptions we will 
make are all the ones that we described for the case of the inductance. In particular, 
we assume that the varying magnetic field is restricted to a definite region in the 
vicinity of the coil and does not appear outside the generator in the space between 
the terminals. 

Following closely the analysis we made for the inductance, we consider the 
line integral of E around a complete loop that starts at terminal a , goes through the 
coil to terminal b and returns to its starting point in the space between the two 
terminals. Again we conclude that the potential difference between the terminals 
is equal to the total line integral of E around the loop: 

V = - j>E-ds. 

This line integral is equal to the emf in the circuit, so the potential difference V 
across the terminals of the generator is also equal to the rate of change of the mag- 
netic flux linking the coil: 

V = -S = J (flux). (22.11) 

For an ideal generator we assume that the magnetic flux linking the coil is deter- 
mined by external conditions — such as the angular velocity of a rotating magnetic 
field — and is not influenced in any way by the currents through the generator. 
Thus a generator — at least the ideal generator we are considering — is not an 
impedance. The potential difference across its terminals is determined by the 
arbitrarily assigned electromotive force £(/). Such an ideal generator is represented 
by the symbol shown in Fig. 22-6. The little arrow represents the direction of the 
emf when it is positive. A positive emf in the generator of Fig. 22-6 will produce 
a voltage V = 8, with the terminal a at a higher potential than the terminal b. 

There is another way to make a generator which is quite different on the 
inside but which is indistinguishable from the one we have just described insofar 
as what happens beyond its terminals. Suppose we have a coil of wire which 
is rotated in a fixed magnetic field, as indicated in Fig. 22-7. We show a bar 
magnet to indicate the presence of a magnetic field; it could, of course, be replaced 
by any other source of a steady magnetic field, such as an additional coil carrying 
a steady current. As shown in the figure, connections from the rotating coil are 
made to the outside world by means of sliding contacts or “slip rings.” Again, 
we are interested in the potential difference that appears across the two terminals 
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Fig. 22-5. A generator consisting of 
a fixed coil and a rotating magnetic field. 


a 



b 

Fig. 22-6. Symbol for an ideal gen- 
erator. 



V 


Fig. 22-7. A generator consisting of 
a coil rotating in a fixed magnetic field. 



a and b, which is of course the integral of the electric field from terminal a to ter- 
minal b along a path outside the generator. 

Now in the system of Fig. 22-7 there are no changing magnetic fields, so we 
might at first wonder how any voltage could appear at the generator terminals 
In fact, there are no electric fields anywhere inside the generator. We are, as usual, 
assuming for our ideal elements that the wires inside are made of a perfectly con- 
ducting material, and as we have said many times, the electric field inside a perfect 
conductor is equal to zero. But that is not true. It is not true when a conductor 
is moving in a magnetic field. The true statement is that the total force on any 
charge inside a perfect conductor must be zero. Otherwise there would be an 
infinite flow of the free charges. So what is always true is that the sum of the electric 
field E and the cross product of the velocity of the conductor and the magnetic 
field B — which is the total force on a unit charge — must have the value zero 
inside the conductor: 

F=E-\-vXB = 0 (in a perfect conductor), (22.12) 

where v represents the velocity of the conductor. Our earlier statement that there 
is no electric field inside a perfect conductor is all right if the velocity v of the 
conductor is zero; otherwise the correct statement is given by Eq. (22.12). 

Returning to our generator of Fig. 22-7, we now see that the line integral of 
the electric field E from terminal a to terminal b through the conducting path of 
the generator must be equal to the line integral of v X B on the same path, 

[ b E'ds = - f b (v X B)'ds. (22.13) 

Ja Ja 

inside inside 

conductor conductor 

It is still true, however, that the line integral of E around a complete loop, including 
the return from b to a outside the generator, must be zero, because there are no 
changing magnetic fields. So the first integral in Eq. (22.13) is also equal to K, 
the voltage between the two terminals. It turns out that the right-hand integral 
of Eq. (22 13) is just the rate of change of the flux linkage through the coil and is 
therefore — by the flux rule — equal to the emf in the coil. So we have again that 
the potential difference across the terminals is equal to the electromotive force in 
the circuit, in agreement with Eq. (22. 11). So whether we have a generator in which 
a magnetic field changes near a fixed coil, or one in which a coil moves in a fixed 
magnetic field, the external properties of the generators are the same. There is a 
voltage difference V across the terminals, which is independent of the current in 
the circuit but depends only on the arbitrarily assigned conditions inside the 
generator. 

So long as we are trying to understand the operation of generators from the 
point of view of Maxwell’s equations, we might also ask about the ordinary chemi- 
cal cell, like a flashlight battery It is also a generator, i.e., a voltage source, al- 
though it will of course only appear in dc circuits. The simplest kind of cell to 
understand is shown in Fig. 22-8. We imagine two metal plates immersed in some 
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chemical solution. We suppose that the solution contains positive and negative 
ions. We suppose also that one kind of ion, say the negative, is much heavier than 
the one of opposite polarity, so that its motion through the solution by the process 
of diffusion is much slower. We suppose next that by some means or other it is 
arranged that the concentration of the solution is made to vary from one part of 
the liquid to the other, so that the number of ions of both polarities near, say, the 
lower plate is much larger than the concentration of ions near the upper plate. 
Because of their rapid mobility the positive ions will drift more readily into the 
region of lower concentration, so that there will be a slight excess of positive charge 
arriving at the upper plate. The upper plate will become positively charged and 
the lower plate will have a net negative charge. 

As more and more charges diffuse to the upper plate, the potential of this plate 
will rise until the resulting electric field between the plates produces forces on the 
ions which just compensate for their excess mobility, so the two plates of the cell 
quickly reach a potential difference which is characteristic of the internal con- 
struction. 

Arguing just as we did for the ideal capacitor, we see that the potential differ- 
ence between the terminals a and b is just equal to the line integral of the electric 
field between the two plates when there is no longer any net diffusion of the ions. 
There is, of course, an essential difference between a capacitor and such a chemical 
cell. If we short-circuit the terminals of a condenser for a moment, the capacitor 
is discharged and there is no longer any potential difference across the terminals. 
In the case of the chemical cell a current can be drawn from the terminals con- 
tinuously without any change in the emf — until, of course, the chemicals inside 
the cell have been used up. In a real cell it is found that the potential difference 
across the terminals decreases as the current drawn from the cell increases. In 
keeping with the abstractions we have been making, however, we may imagine an 
ideal cell in which the voltage across the terminals is independent of the current. 
A real cell can then be looked at as an ideal cell in series with a resistor. 
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Fig. 22-8. A chemical cell. 


22-3 Networks of ideal elements; Kirchhoff’s rules 


As we have seen in the last section, the description of an ideal circuit element 
in terms of what happens outside the element is quite simple. The current and 
the voltage are linearly related. But what is actually happening inside the element 
is quite complicated, and it is quite difficult to give a precise description in terms of 
Maxwell’s equations. Imagine trying to give a precise description of the electric 
and magnetic fields of the inside of a radio which contains hundreds of resistors, 
capacitors, and inductors. It would be an impossible task to analyze such a thing 
by using Maxwell’s equations. But by making the many approximations we have 
described in Section 22-2 and summarizing the essential features of the real 
circuit elements in terms of idealizations, it becomes possible to analyze an elec- 
trical circuit in a relatively straightforward way. We will now show how that 
is done. 

Suppose we have a circuit consisting of a generator and several impedances 
connected together, as shown in Fig. 22-9. According to our approximations there 
is no magnetic field in the region outside the individual circuit elements. Therefore 
the line integral of E around any curve which does not pass through any of the 
elements is zero. Consider then the curve r shown by the broken line which goes 
all the way around the circuit in Fig. 22-9. The line integral of E around this curve 
is made up of several pieces. Each piece is the line integral from one terminal of a 
circuit element to the other. This line integral we have called the voltage drop 
across the circuit element. The complete line integral is then just the sum of the 
voltage drops across all of the elements in the circuit: 



jE-ds = 


Fig. 22—9. The sum of the voltage 
drops around any closed path is zero. 


Since the line integral is zero, we have that the sum of the potential differences 
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(22.14) 


around a complete loop of a circuit is equal to zero: 


E r» = 0 - 

around 
any loop 



Fig. 22-10. The sum of the currents 
Into any node is zero. 


This result follows from one of Maxwell’s equations — that in a region where there 
are no magnetic fields the line integral of E around any complete loop is zero. 

Suppose we consider now a circuit like that shown in Fig. 22-10. The hori- 
zontal line joining the terminals a, b, c , and d is intended to show that these ter- 
minals are all connected, or that they are joined by wires of negligible resistance. 
In any case, the drawing means that terminals a, b , c, and d are all at the same 
potential and, similarly, that the terminals e,/, g, and h are also at one common 
potential. Then the voltage drop V across each of the four elements is the same. 

Now one of our idealizations has been that negligible electrical charges ac- 
cumulate on the terminals of the impedances. We now assume further that any 
electrical charges on the wires joining terminals can also be neglected. Then the 
conservation of charge requires that any charge which leaves one circuit element 
immediately enters some other circuit element. Or, what is the same thing, we 
require that the algebraic sum of the currents which enter any given junction must 
be zero. By a junction, of course, we mean any set of terminals such as a , b, c, 
and d which are connected. Such a set of connected terminals is usually called a 
“node.” The conservation of charge then requires that for the circuit of Fig. 22-10, 


h-i 2 -h-h = o. 


(22.15) 


The sum of the currents entering the node which consists of the four terminals 
e,f, g, and h must also be zero: 


— I\ + 1 2 + ^3 + I A ~~ 0 . 


(22.16) 



Fig. 22-11, 
Kirchhoff's rules. 


Analyzing a circuit with 


This is, of course, the same as Eq. (22.15). The two equations are not independent. 
The general rule is that the sum of the currents into any node must be zero . 

E = °- ( 22 . 17 ) 

into 
a node 

Our earlier conclusion that the sum of the voltage drops around a closed loop 
is zero must apply to any loop in a complicated circuit. Also, our result that the 
sum of the currents into a node is zero must be true for any node. These two equa- 
tions are known as Kirchhoff’s rules . With these two rules it is possible to solve for 
the currents and voltages in any network whatever. 

Suppose we consider the more complicated circuit of Fig. 22-11. How shall 
we find the currents and voltages in this circuit? We can find them in the following 
straightforward way. We consider separately each of the four subsidiary closed 
loops which appear in the circuit. (For instance, one loop goes from terminal a to 
terminal b to terminal e to terminal d and back to terminal a.) For each of the loops 
we write the equation for the first of Kirchhoff’s rules — that the sum of the voltages 
around each loop is equal to zero. We must remember to count the voltage drop 
as positive if we are going in the direction of the current and negative if we are 
going across an element in the direction opposite to the current; and we must 
remember that the voltage drop across a generator is the negative of the emf in 
that direction. Thus if we consider the small loop that starts and ends at terminal 
a we have the equation 


+ z 3^3 + z 4^4 *— Si = 0. 


Applying the same rule to the remaining loops, we would get three more equations 
of the same kind. 

Next, we must write the current equation for each of the nodes in the circuit. 
For example, summing the currents into the node at terminal b gives the equation 
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It - h - i% = 0 . 



Similarly, for the node labeled e we would have the current equation 


1$ — I 4 I s — I 5 ~ 0 . 


For the circuit shown there are five such current equations. It turns out, however, 
that any one of these equations can be derived from the other four; there are, 
therefore, only four independent current equations. We thus have a total of eight 
independent, linear equations: the four voltage equations and the four current 
equations. With these eight equations we can solve for the eight unknown currents. 
Once the currents are known the circuit is solved. The voltage drop across any 
element is given by the current through that element times its impedance (or, in 
the case of the voltage sources, it is already known). 

We have seen that when we write the current equations, we get one equation 
which is not independent of the others. Generally it is also possible to write down 
too many voltage equations. For example, in the circuit of Fig. 22-11, although 
we have considered only the four small loops, there are a large number of other 
loops for which we could write the voltage equation. There is, for example, the 
loop along the path abcfeda. There is another loop which follows the path 
abcfehgda. You can see that there are many loops. In analyzing complicated cir- 
cuits it is very easy to get too many equations. There are rules which tell us how to 
proceed so that only the minimum number of equations is written down, but 
usually with a little thought it is possible to see how to get the right number of 
equations in the simplest form. Besides, writing an extra equation or two doesn’t 
do any harm. They will not lead to any wrong answers, only perhaps a little 
unnecessary algebra. 

In Chapter 25 of Vol. I we showed that if the two impedances z\ and z 2 are 
in series , they are equivalent to a single impedance z s given by 

z s = z 1 + z 2 * (22.18) 

We also showed that if the two impedances are connected in parallel , they are 
equivalent to the single impedance z p given by 

*- (■/..) I (■/..) - 5T V <2219 > 

If you look back you will see that in deriving these results we were in effect making 
use of Kirchhoff’s rules. It is often possible to analyze a complicated circuit by 
repeated application of the formulas for series and parallel impedances. For in- 
stance, the circuit of Fig. 22-12 can be analyzed that way. First, the impedances 
z 4 and z 5 can be replaced by their parallel equivalent, and so also can z 0 and z 7 . 
Then the impedance z 2 can be combined with the parallel equivalent of z 6 and z 7 
by the series rule. Proceeding in this way, the whole circuit can be reduced to a 
generator in series with a single impedance Z. The current through the generator 
is then just S/Z. Then by working backward one can solve for the currents in 
each of the impedances. 

There are, however, quite simple circuits which cannot be analyzed by this 
method, as for example the circuit of Fig. 22-13. To analyze this circuit we must 



Fig. 22—1 2. A circuit which can be 
analyzed in terms of series and parallel 
combinations. 



Fig. 22-1 3. A circuit that cannot be 
analyzed in terms of series and parallel 
combinations. 
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write down the current and voltage equations from KirchhofTs rules. Let’s do it. 
There is just one current equation: 



b 


Fig. 22-14. A bridge circuit. 



b 


Fig. 22-15. Any two-terminal net- 
work of passive elements is equivalent to 
an effective impedance. 


h + 1 2 + ^3 ~ 0 , 

so we know immediately that 

h - ~(h + hi 


We can save ourselves some algebra if we immediately make use of this result in 
writing the voltage equations. For this circuit there are two independent voltage 
equations; they are 

— Si / 2 z 2 — h z i = 0 


and 


S 2 — (h + 2 2 )z 3 ~ h %2 — 0. 


There are two equations and two unknown currents. Solving these equations for 
1 1 and / 2 , we get 


and 


h - 

1 2 = 


z 2&2 ~ ( z 2 + z 3)£l 
z l( z 2 + z 3) + Z 2 Z 3 

(22.20) 

ZiS 2 + Z3S1 

Z 1 ( z 2 + z 3) + Z 2 Z 3 

(22.21) 


The third current is obtained from the sum of these two. 

Another example of a circuit that cannot be analyzed by using the rules for 
series and parallel impedance is shown in Fig. 22-14. Such a circuit is called a 
“bridge.” It appears in many instruments used for measuring impedances. With 
such a circuit one is usually interested in the question: How must the various 
impedances be related if the current through the impedance z 3 is to be zero? We 
leave it for you to find the conditions for which this is so. 


22-4 Equivalent circuits 

Suppose we connect a generator 8 to a circuit containing some complicated 
interconnection of impedances, as indicated schematically in Fig. 22-1 5(a). All 
of the equations we get from KirchhofT’s rules are linear, so when we solve them 
for the current / through the generator, we will get that / is proportional to 8. 
We can write 



where now z eff is some complex number, an algebraic function of all the elements 
in the circuit. (If the circuit contains no generators other than the one shown, there 
is no additional term independent of 8.) But this equation is just what we would 
write for the circuit of Fig. 22-1 5(b). So long as we are interested only in what 
happens to the left of the two terminals a and b , the two circuits of Fig. 22-15 are 
equivalent. We can, therefore, make the general statement that any two-terminal 
network of passive elements can be replaced by a single impedance z eff without 
changing the currents and voltages in the rest of the circuit. This statement is, of 
course, just a remark about what comes out of KirchhofTs rules — and ultimately 
from the linearity of Maxwell’s equations. 

The idea can be generalized to a circuit that contains generators as well as 
impedances. Suppose we look at such a circuit “from the point of view” of one of 
the impedances, which we will call z n , as in Fig. 22-16(a). If we were to solve the 
equation for the whole circuit, we would find that the voltage V n between the two 
terminals a and b is a linear function of /, which we can write 

V n = A - BI n , (22.22) 

where A and B depend on the generators and impedances in the circuit to the left 
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of the terminals. For instance, for the circuit of Fig. 22-13, we find V 1 = I\z x . 
This can be written (by rearranging Eq. (22.20)] as 


V x 


t( 


z 2 \ 
z 2 + z 3/ 


62 



Z 2 Z 3 j 
z 2 ~*C z 3 1 


(22.23) 


The complete solution is then obtained by combining this equation with the one 
for the impedance z 1? namely, V\ — I\ z lt or in the general case, by combining 
Eq. (22.22) with 

V = I z 

r n g n £t n • 



If now we consider that z n is attached to a simple series circuit of a generatoi 
and a current, as in Fig. 22-1 5(b), the equation corresponding to Eq. (22.22) is 

8cff ^n z eti s 

which is identical to Eq. (22.22) provided we set 8 C . ff = A and 2 eff = B . So if we 
are interested only in what happens to the right of the terminals a and 6, the arbi- 
trary circuit of Fig. 22-16 can always be replaced by an equivalent combination of 
a generator in series with an impedance. 

22-5 Energy 

We have seen that to build up the current / in an inductance, the energy 
U = \Ll 2 must be provided by the external circuit. When the current falls back 
to zero, this energy is delivered back to the external circuit. There is no energy-loss 
mechanism in an ideal inductance. When there is an alternating current through 
an inductance, energy flows back and forth between it and the rest of the circuit, 
but the average rate at which energy is delivered to the circuit is zero. We say that 
an inductance is a nonchssipative element; no electrical energy is dissipated — that is, 
“lost”— in it. 

Similarly, the energy of a condenser, U ~ iCV 2 , is returned to the external 
circuit when a condenser is discharged. When a condenser is in an ac circuit 
energy flows in and out of it, but the net energy flow in each cycle is zero. An ideal 
condenser is also a nondissipative element. 

We know that an emf is a source of energy. When a current / flows in the 
direction of the emf, energy is delivered to the external circuit at the rate dU/dt = 
EL If current is driven against the emf — by other generators in the circuit — the 
emf will absorb energy at the rate £/; since / is negative, dU/dt will also be negative. 

If a generator is connected to a resistor /?, the current through the resistor 
is / = E/R . The energy being supplied by the generator at the rate El is being 
absorbed by the resistor. This energy goes into heat in the resistor and is lost 
from the electrical energy of the circuit. We say that electrical energy is dissipated 
in a resistor. The rate at which energy is dissipated in a resistor is dU/dt = RI 2 . 

In an ac circuit the average rate of energy lost to a resistor is the average of 
RI 2 over one cycle. Since / = Ie lu>t — by which we really mean that I varies as 
cos co/ — the average of 7 2 over one cycle is |/| 2 /2, since the peak current is |/| and 
the average of cos 2 c 0 / is 1/2. 

What about the energy loss when a generator is connected to an arbitrary 
impedance z? (By “loss” we mean, of course, conversion of electrical energy into 
thermal energy.) Any impedance z can be written as the sum of its real and im- 
ginary parts. That is, 

z - R + iX, (22.24) 

where R and X are real numbers. From the point of view of equivalent circuits we 
can say that any impedance is equivalent to a resistance in series with a pure 
imaginary impedance — called a reactance — as shown in Fig. 22-17. 

We have seen earlier that any circuit that contains only U s and C s has an 
impedance that is a pure imaginary number. Since there is no energy loss into any 
of the U s and C’s on the average, a pure reactance containing only Z/s and C’s 
will have no energy loss. We can see that this must be true in general for a reactance. 
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b 


Fig. 22-16. Any two-terminal net- 
work can be replaced by a generator in 
series with an impedance. 



Fig. 22—17. Any impedance is equiv- 
alent to a series combination of a pure 
resistance and a pure reactance. 



If a generator with the emf 8 is connected to the impedance z of Fig. 22-17, 
the emf must be related to the current I from the generator by 

8 - I(R + iX). (22.25) 

To find the average rate at which energy is delivered, we want the average of the 
product 8/. Now we must be careful. When dealing with such products, we must 
deal with the real quantities 8(/) and /(/). (The real parts of the complex functions 
will represent the actual physical quantities only when we have linear equations; 
now we are concerned with products , which are certainly not linear.) 

Suppose we choose our origin of t so that the amplitude I is a real number, 
let’s say / 0 ; then the actual time variation I is given by 


/ = / 0 COS 0)t . 

The emf of Eq. (22.25) is the real part of 

I 0 e lut (R + iX) 
or 

8 = I 0 R cos co/ — I 0 X sin c ot. (22.26) 
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The two terms in Eq. (22.26) represent the voltage drops across R and X 
in Fig. 22-17. We see that the voltage drop across the resistance is in phase with 
the current, while the voltage drop across the purely reactive part is out of phase 
with the current. 

The average rate of energy loss, (P) llv , from the generator is the integral of 
the product 8/ over one cycle divided by the period T; in other words, 


(P)*v = 


8/ A = 


InR cos 2 c oi dt 


J. 

7\ 


IqX cos c or sin cor dt. 


The first integral is \ I$R , and the second integral is zero. So the average 
energy loss in an impedance z = R + iX depends only on the real part of z, 
and is IqR/2 , which is in agreement with our earlier result for the energy loss in a 
resistor. There is no energy loss in the reactive part. 
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Fig. 22—1 8. The effective impedance 
of a ladder. 


22-6 A ladder network 

We would like now to consider an interesting circuit which can be analyzed 
in terms of series and parallel combinations. Suppose we start with the circuit of 
Fig. 22-1 8(a). We can see right away that the impedance from terminal a to ter- 
minal b is simply z x + z 2 . Now let’s take a little harder circuit, the one shown in 
Fig. 22-1 8(b). We could analyze this circuit using KirchhofT’s rules, but it is 
also easy to handle with series and parallel combinations. We can replace the 
two impedances on the right-hand end by a single impedance z 3 = z\ + z 2 , as 
in part (c) of the figure. Then the two impedances z 2 and z 3 can be replaced by 
their equivalent parallel impedance z 4 , as shown in part (d) of the figure. Finally, 
zi and z 4 are equivalent to a single impedance z 5 , as shown in part (e). 

Now we may ask an amusing question: What would happen if in the network 
of Fig. 22-1 8(b) we kept on adding more sections forever — as we indicate by the 
dashed lines m Fig. 22-1 9(a)? Can we solve such an infinite network? Well, that’s 
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Fig. 22-19. The effective impedance of an infinite ladder. 
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not so hard. First, we notice that such an infinite network is unchanged if we add 
one more section at the “front” end. Surely, if we add one more section to an 
infinite network it is still the same infinite network. Suppose we call the impedance 
between the two terminals a and b of the infinite network z 0 ; then the impedance of 
all the stuff to the right of the two terminals c and d is also z 0 . Therefore, so far as 
the front end is concerned, we can represent the network as shown in Fig. 22-1 9(b). 
Combining the parallel combinations z 2 z 0 and adding the result in series with z u 
we can immediately write down the impedance of this combination: 


2 21 + (l/z 2 ) + (l/z 0 ) 


or 


Z = Zi + 


Z 2 Z 0 
z 2 + z 0 


But this impedance is also equal to z 0 , so we have the equation 


We can solve for z 0 to get 


z o 


= Z 1 + 


Z 2 Z 0 * 
z 2 + z 0 


Zq = y + V 7 Oj/4) + ZiZ 2 • 


(22.27) 


So we have found the solution for the impedance of an infinite ladder of repeated 
series and parallel impedances. The impedance z 0 is called the characteristic 
impedance of such an infinite network. 

Let’s now consider a specific example in which the series element is an in- 
ductance L and the shunt element is a capacitance C, as shown in Fig. 22~20(a). 
In this case we find the impedance of the infinite network by setting z x = icoL 
and z 2 = l//toC. Notice that the first term, Zi/2, in Eq. (22.27) is just one-half 
the impedance of the first element. It would therefore seem more natural, or at 
least somewhat simpler, if we were to draw our infinite network as shown in Fig. 
22-20(b). Looking at the infinite network from the terminal a f we would see the 
characteristic impedance 

z 0 = V(L/C) ^(to 2 L 2 /4). (22.28) 

Now there are two interesting cases, depending on the frequency to. If co 2 is less 
than 4 /LC, the second term in the radical will be smaller than the first, and the 
impedance z () will be a real number. On the other hand, if to 2 is greater than 
4 /LC the impedance z 0 will be a pure imaginary number which we can write as 
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Fig. 22-20. An L-C ladder drawn 
in two equivalent ways. 


z 0 = iV(«2L2/4) - (L/C). 

We have said earlier that a circuit which contains only imaginary impedances, 
such as inductances and capacitances, will have an impedance which is purely 
imaginary. How can it be then that for the circuit we are now studying — which has 
only U s and C’s — the impedance is a pure resistance for frequencies below y/4 /LC? 
For higher frequencies the impedance is purely imaginary, in agreement with our 
earlier statement. For lower frequencies the impedance is a pure resistance and 
will therefore absorb energy. But how can the circuit continuously absorb energy, 
as a resistance does, if it is made only of inductances and capacitances? Answer: 
Because there is an infinite number of inductances and capacitances, so that when 
a source is connected to the circuit, it supplies energy to the first inductance and 
capacitance, then to the second, to the third, and so on. In a circuit of this kind, 
energy is continually absorbed from the generator at a constant rate and flows 
constantly out into the network, supplying energy which is stored in the induc- 
tances and capacitances down the line. 

This idea suggests an interesting point about what is happening in the circuit. 
We would expect that if we connect a source to the front end, the effects of this 
source will be propagated through the network toward the infinite end. The 
propagation of the waves down the line is much like the radiation from an antenna 
which absorbs energy from its driving source; that is, we expect such a propagation 
to occur when the impedance is real, which occurs if to is less than \/ 4/LC. But 
when the impedance is purely imaginary, which happens for to greater than \/ 4/ LC t 
we would not expect to see any such propagation. 
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22-7 Filters 


We saw in the last section that the infinite ladder network of Fig. 22-20 absorbs 
energy continuously if it is driven at a frequency below a certain critical frequency 
\/4/LC , which we will call the cutoff frequency co 0 . We suggested that this effect 
could be understood in terms of a continuous transport of energy down the line. 
On the other hand, at high frequencies, for co > w 0 , there is no continuous ab- 
sorption of energy; we should then expect that perhaps the currents don’t “pene- 
trate” very far down the line. Let’s see whether these ideas are right. 

Suppose we have the front end of the ladder connected to some ac generator 
and we ask what the voltage looks like at, say, the 754th section of the ladder. 
Since the network is infinite, whatever happens to the voltage from one section to 
the next is always the same; so let’s just look at what happens when we go from 
some section, say the nth to the next. We will define the currents h and voltages 
V n as shown in Fig. 22-2 1(a). 



Fig. 22-21. Finding the propagation factor of a ladder. 


We can get the voltage K„ +1 from V n by remembering that we can always 
replace the rest of the ladder after the nth section by its characteristic impedance z 0 ; 
then we need only analyze the circuit of Fig. 22-21 (b). First, we notice that any 
V nj since it is across z 0 , must equal I n z 0 . Also, the difference between V„ and V n ± \ 
is just I n Zii 

Vn - V n + l = /, ( Z, = V„ -J 

Zo 

So we get the ratio 

V n +t __ | 

Vn -0 "-o' 

We can call this ratio the propagation factor for one section of the ladder; we’ll 
call it a. It is, of course, the same for all sections: 

a = ■ (22.29) 

z 0 


The voltage after the nth section is then 

V n - a n F. (22.30) 

You can now find the voltage after 754 sections; it is just a to the 754th power 
times 8. 

Suppose we see what a is like for the L-C ladder of Fig. 22-20(a) Using z 0 
from Eq. (22.27), and z x - iooL, we get 

yU 7cPn^I J /4) - /(coL/2) (22 31) 

ViL/Cj-WL^/4) + i(uL/2) 

If the driving frequency is below the cutoff frequency u> ( , = \/4 /LC, the radical 
is a real number, and the magnitudes of the complex numbers in the numerator 
and denominator are equal. Therefore, the magnitude of a is one; we can write 

a — e iS , 

which means that the magnitude of the voltage is the same at every section, only 
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its phase changes. The phase change 5 is, in fact, a negative number and represents 
the “delay” of the voltage as it passes along the network. 

For frequencies above the cutoff frequency co 0 it is better to factor out an i 
from the numerator and denominator of Eq. (22 31) and rewrite it as 


VVLV 4) - (L/C) - (coL/2) 

a = — — • (22 H ) 

4) - (L/C) + (coL/2) 

The propagation factor a is now a real number, and a number less than one. That 
means that the voltage at any section is always less than the voltage at the pre- 
ceding section by the factor a. For any frequency above co () , the voltage dies 
away rapidly as we go along the network. A plot of the absolute value of a as a 
function of frequency looks like the graph in Fig. 22-22. 

We see that the behavior of a, both above and below co 0 , agrees with our 
interpretation that the network propagates energy for co < co () and blocks it for 
co > co n . We say that the network “passes” low frequencies and “rejects” oi 
“filters out” the high frequencies. Any network designed to have its characteristics 
vary in a prescribed way with frequency is called a “filter.” We have been analyzing 
a “low-pass filter.” 

You may be wondering why all this discussion of an infinite network which 
obviously cannot actually occur. The point is that the same characteristics arc 
found in a finite network if we finish it off at the end with an impedence equal to 
the characteristic impedence z 0 . Now in practice it is not possible to exactly 
reproduce the characteristic impedance with a few simple elements — like R' s, 
L’s, and C s. But it is often possible to do so with a fair approximation for a certain 
range of frequencies. In this way one can make a finite filter network whose 
properties are very nearly the same as those for the infinite case For instance, the 
L-C ladder behaves much as we have described it if it is terminated in the pure 
resistance R = \ r U/~C. 

If in our L-C ladder we interchange the positions of the L’s and C’s, to make 
the ladder shown in Fig. 22-23 (a), we can have a filter that propagates high fre- 
quencies and rejects low frequencies. It is easy to see what happens with this net- 
work by using the results we already have. You will notice that whenever we change 
an L to a C and vice versa , we also change every ico to 1 //co. So whatever happened 
at co before will now happen at 1 /co. In particular, we can see how a will vary with 
frequency by using Fig. 22-22 and changing the label on the axis to 1/co, as we 
have done in Fig. 22-23(b). 

The low-pass and high-pass filters we have described have various technical 
applications. An L-C low-pass filter is often used as a “smoothing” filter m a dc 
power supply. If we want to manufacture dc power from an ac source, we begin 
with a rectifier which permits current to flow only in one direction. From the 
rectifier we get a series of pulses that look like the function V(j) shown in 
Fig 22-24, which is lousy dc, because it wobbles up and down. Suppose we would 
like a nice pure dc, such as a battery provides. We can come close to that by 
putting a low-pass filter between the rectifier and the load. 

We know from Chapter 50 of Vol. I that the time function in Fig. 22-24 can be 
represented as a superposition of a constant voltage plus a sine wave, plus a higher- 
frequency sine wave, plus a still higher-frequency sine wave, etc. — by a Fourier 
series. If our filter is linear (if, as we have been assuming, the Z/s and C’s don't 
vary with the currents or voltages) then what comes out of the filter is the super- 
position of the outputs for each component at the input. If we arrange that the 
cutoff frequency co ( , of our filter is well below the lowest frequency in the function 
K(/), the dc (for which co = 0) goes through fine, but the amplitude of the first 
harmonic will be cut down a lot. And amplitudes of the higher harmonics will be 
cut down even more. So we can get the output as smooth as we wish, depending 
only on how many filter sections we are willing to buy. 

A high-pass filter is used if one wants to reject certain low frequencies. For 
instance, in a phonograph amplifier a high-pass filter may be used to let the music 



Fig 22-22. The propagation factor 
of a section of an L-C ladder 




(b) its propagation factor as a function 
of 1 co. 


1 /tn 



Fig. 22-24. The output voltage of a 
full-wave rectifier. 
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Fig. 22-25. (a) A band-pass filter, 

(b) A simple resonant filter. 



(a) 


through, while keeping out the low-pitched rumbling from the motor of the 
turntable. 

It is also possible to make “band-pass” filters that reject frequencies below 
some frequency u?i and above another frequency o> 2 (greater than tu x ), but pass the 
frequencies between co x and w 2 . This can be done simply by putting together a 
high-pass and a low-pass filter, but it is more usually done by making a ladder in 
which the impedances z x and z 2 are more complicated — being each a combination 
of Vs and C’s. Such a band-pass filter might have a propagation constant like 
that shown in Fig. 22-25(a). It might be used, for example, in separating signals 
that occupy only an interval of frequencies, such as each of the many voice channels 
in a high-frequency telephone cable, or the modulated carrier of a radio trans- 
mission. 

We have seen in Chapter 25 of Vol. I that such filtering can also be done using 
the selectivity of an ordinary resonance curve, which we have drawn for comparison 
in Fig. 22-25 (b). But the resonant filter is not as good for some purposes as the 
band-pass filter. You will remember (Chapter 48, Vol. I) that when a carrier of 
frequency co r is modulated with a “signal” frequency co s , the total signal contains 
not only the carrier frequency but also the two side-band frequencies co r + co s 
and co r — co s . With a resonant filter, these side-bands are always attentuated some- 
what, and the attenuation is more, the higher the signal frequency, as you can see 
from the figure. So there is a poor “frequency response.” The higher musical 
tones don’t get through. But if the filtering is done with a band-pass filter designed 
so that the width co 2 — to x is at least twice the highest signal frequency, the fre- 
quency response will be “flat” for the signals wanted. 

We want to make one more point about the ladder filter: the L-C ladder of 
Fig. 22-20 is also an approximate representation of a transmission line. If we 
have a long conductor that runs parallel to another conductor — such as a wire in a 
coaxial cable, or a wire suspended above the earth — there will be some capacitance 
between the two conductors and also some inductance due to the magnetic field 
between them. If we imagine the line as broken up into small lengths At, each 
length will look like one section of the VC ladder with a series inductance A L and 
a shunt capacitance AC. We can then use our results for the ladder filter. If we 
take the limit as At goes to zero, we have a good description of the transmission 
line. Notice that as At is made smaller and smaller, both A L and AC decrease, but 
in the same proportion, so that the ratio AL/AC remains constant. So if we take 
the limit of Eq. (22.28) as A L and AC go to zero, we find that the characteristic 
impedance z 0 is a pure resistance whose magnitude is V AL/AC. We can also 
write the ratio AL/AC as L 0 /C 0 , where L 0 and C 0 are the inductance and capaci- 
tance of a unit length of the line; then we have 



(22.33) 


You will also notice that as A L and AC go to zero, the cutoff frequency 
toy = V 4/LC goes to infinity. There is no cutoff frequency for an ideal 
transmission line. 



mutual inductance. 


22-8 Other circuit elements 

We have so far defined only the ideal circuit impedances — the inductance, 
the capacitance, and the resistance — as well as the ideal voltage generator. We want 
now to show that other elements, such as mutual inductances or transistors or 
vacuum tubes, can be described by using only the same basic elements. Suppose 
that we have two coils and that on purpose, or otherwise, some flux from one of 
the coils links the other, as shown in Fig. 22-26(a). Then the two coils will have a 
mutual inductance M such that when the current varies in one of the coils, there 
will be a voltage generated in the other. Can we take into account such an effect 
in our equivalent circuits? We can in the following way. We have seen that the 
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induced emfs in each of two interacting coils can be written as the sum of two parts: 


Si — — L\ 


dh 

dt 


M 


dl 2 
dt 


S 2 = - 


r dl^> dl \ 

L 2 -r =*= M -ri 

dt dt 


(22.34) 


The first term comes from the self-inductance of the coil, and the second term 
comes from its mutual inductance with the other coil. The sign of the second term 
can be plus or minus, depending on the way the flux from one coil links the other. 
Making the same approximations we used m describing an ideal inductance, we 
would say that the potential difference across the terminals of each coil is equal to 
the electromotive force in the coil. Then the two equations of (22.34) are the same 
as the ones we would get from the circuit of Fig. 22-26(b), provided the electro- 
motive force in each of the two circuits shown depends on the current in the 
opposite circuit according to the relations 

£i = ±icoMl%, (22.35) 

So what we can do is represent the effect of the self-inductance in a normal way but 
replace the effect of the mutual inductance by an auxiliary ideal voltage generator. 
We must in addition, of course, have the equation that relates this emf to the 
current in some other part of the circuit; but so long as this equation is linear, we 
have just added more linear equations to our circuit equations, and all of our 
earlier conclusions about equivalent circuits and so forth are still correct. 

In addition to mutual inductances there may also be mutual capacitances. 
So far, when we have talked about condensers we have always imagined that there 
were only two electrodes, but in many situations, for example in a vacuum tube, 
there may be many electrodes close to each other. If we put an electric charge on 
any one of the electrodes, its electric field will induce charges on each of the other 
electrodes and affect its potential. As an example, consider the arrangement of 
four plates shown in Fig. 22-27(a). Suppose these four plates are connected to 
external circuits by means of the wires A , B , C, and D. So long as we are only 
worried about electrostatic effects, the equivalent circuit of such an arrangement 
of electrodes is as shown in part (b) of the figure. The electrostatic interaction of 
any electrode with each of the others is equivalent to a capacity between the 
two electrodes. 

Finally, let’s consider how we should represent such complicated devices as 
transistors and radio tubes in an ac circuit. We should point out at the start that 
such devices are often operated in such a way that the relationship between the 
currents and voltages is not at all linear. In such cases, those statements we have 
made which depend on the linearity of equations are, of course, no longer correct. 
On the other hand, in many applications the operating characteristics are sufficiently 
linear that we may consider the transistors and tubes to be linear devices. By this 
we mean that the alternating currents in, say, the plate of a vacuum tube are linearly 
proportional to the voltages that appear on the other electrodes, say the grid 
voltage and the plate voltage. When we have such linear relationships, we can 
incorporate the device into our equivalent circuit representation. 

As in the case of the mutual inductance, our representation will have to include 
auxiliary voltage generators which describe the influence of the voltages or currents 
in one part of the device on the currents or voltages in another part. For example, 
the plate circuit of a triode can usually be represented by a resistance in series with 
an ideal voltage generator whose source strength is proportional to the grid voltage. 
We get the equivalent circuit shown m Fig. 22-28.* Similarly, the collector circuit 



(b) 



Fig. 22-27. Equivalent circuit of 
mutual capacitance. 



Fig. 22-28. A low-frequency equiv- 
alent circuit of a vacuum triode. 


* The equivalent circuit shown is correct only for low frequencies. For high frequencies 
the equivalent circuit gets much more complicated and will include various so-called 
“parasitic” capacitances and inductances. 


22-17 




Fig. 22-29. A low-frequency equiv- 
alent circuit of a transistor. 


E C 



of a transistor is conveniently represented as a resistor in series with an ideal 
voltage generator whose source strength is proportional to the current from the 
emitter to the base of the transistor. The equivalent circuit is then like that m Fig. 
22-29. So long as the equations which describe the operation are linear, we can 
use such representations for tubes or transistors. Then, when they are incorporated 
in a complicated network, our general conclusions about the equivalent representa- 
tion of any arbitrary connection of elements is still valid. 

There is one remarkable thing about transistor and radio tube circuits which 
is different from circuits containing only impedances: the real part of the effective 
impedance z off can become negative. We have seen that the real part of z represents 
the loss of energy. But it is the important characteristic of transistors and tubes 
that they supply energy to the circuit. (Of course they don’t just “make” energy; 
they take energy from the DC circuits of the power supplies and convert it into 
ac energy.) So it is possible to have a circuit with a negative resistance. Such a 
circuit has the property that if you connect it to an impedance with a positive real 
part, i.e., a positive resistance, and arrange matters so that the sum of the two 
real parts is exactly zero, then there is no dissipation in the combined circuit. If 
there is no loss of energy, any alternating voltage once started will remain forever. 
This is the basic idea behind the operation of an oscillator or signal generator which 
can be used as a source of alternating voltage at any desired frequency. 
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Cavily Resonators 


23-1 Real circuit elements 


When looked at from any one pair of terminals, any arbitrary circuit made 
up of ideal impedances and generators is, at any given frequency, equivalent to a 
generator K in series with an impedance z. That comes about because if we put a 
voltage V across the terminals and solve all the equations to find the current /, 
we must get a linear relation between the current and the voltage. Since all the 
equations are linear, the result for / must also depend only linearly on V The 
most general linear form can be expressed as 


23-1 Real circuit elements 

23-2 A capacitor at high frequencies 

23-3 A resonant cavity 

23-4 Cavity modes 

23-5 Cavities and resonant circuits 


/ = \ (V - S). (23.1) 

In general, both z and E may depend in some complicated way on the frequency w. 
Equation (23 I), however, is the relation we would get if behind the two terminals 
there was just the generator f>(aj) in series with the impedance z(co). 

There is also the opposite kind of question* If we have any electromagnetic 
device at all with two terminals and we measure the relation between / and V to 
detei mine g and z as functions of frequency, can we find a combination of our ideal 
elements that is equivalent to the internal impedance z? The answer is that for 
any reasonable — that is, physically meaningful — function z(o>), it is possible to 
approximate the situation to as high an accuracy as you wish with a circuit containing 
a finite set of ideal elements. We don't want to consider the general problem now 
but only look at what might be expected from physical arguments for a few cases 

If we think of a real resistor, we know that the current through it will produce 
a magnetic field. So any real resistor should also have some inductance. Also, 
when a resistor has a potential difference across it, there must be charges on the 
ends of the resistoi to produce the necessary electric fields As the voltage changes, 
the charges will change in proportion, so the resistor will also have some capaci- 
tance. We expect that a real resistor might have the equivalent circuit shown in 
Fig 23-1 In a well-designed resistor, the so-called “parasitic” elements L and C 
are small, so that at the frequencies for which it is intended, coL is much less than 
/?, and 1/cuC is much greater than R. It may therefore be possible to neglect them 
As the frequency is raised, however, they will eventually become important, and a 
resistor begins to look like a resonant circuit. 

A real inductance is also not equal to the idealized inductance, whose impe- 
dance is loot. A real coil of wire wifi have some resistance, so at low frequencies the 
coil is really equivalent to an inductance in series with some resistance, as shown m 
Fig. 23-2(a) But, you are thinking, the resistance and inductance are together in a 
real coil — the resistance is spread all along the wire, so it is mixed in with the 
inductance. We should probably use a circuit more like the one m Fig. 23-2(b), 
which has several little /?’ s and & s in series. But the total impedance of such a 
circuit is just £/? + which is equivalent to the simpler diagram of part (a) 

As we go up in frequency with a real coil, the approximation of an inductance 
plus a resistance is no longer very good. The charges that must build up on the 
wires to make the voltages will become important. It is as if there were little con- 
densers across the turns of the coil, as sketched in Fig 23-3(a). We might try to 
approximate the real coil by the circuit in Fig. 23— 3(b). At low frequencies, this 
circuit can be imitated fairly well by the simpler one in part (c) of the figure (which 
is again the same resonant circuit we found for the high-frequency model of a 
resistor) For higher frequencies, however, the more complicated circuit of 
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Fig. 23—1. Equivalent circuit of a 
real resistor. 



(a) (b) 

Fig. 23-2. The equivalent circuit of 
a real inductance at low frequencies. 
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Fig. 23-3. The equivalent circuit of 
a real inductance at higher frequencies. 


Fig. 23— 3(b) is better. In fact, the more accurately you wish to represent the actual 
impedance of a real, physical inductance, the more ideal elements you will have to 
use in the artificial model of it. 

Let's look a little more closely at what goes on in a real coil. The impedance 
of an inductance goes as toL, so it becomes zero at low frequencies — it is a “short 
circuit”: all we see is the resistance of the wire. As we go up in frequency, c oL soon 
becomes much larger than R , and the coil looks pretty much like an ideal induc- 
tance. As we go still higher, however, the capacities become important. Their 
impedance is proportional to 1/toC, which is large for small ox For small enough 
frequencies a condenser is an “open circuit,” and when it is in parallel with some- 
thing else, it draws no current. But at high frequencies, the current prefers to flow 
into the capacitance between the turns, rather than through the inductance. So 
the current in the coil jumps from one turn to the other and doesn't bother to go 
around and around where it has to buck the emf So although we may have 
intended that the current should go around the loop, it will take the easier path — the 
path of least impedance. 

If the subject had been one of popular interest, this effect would have been 
called “the high-frequency barrier,” or some such name. The same kind of thing 
happens in all subjects. In aerodynamics, if you try to make things go faster than 
the speed of sound when they were designed for lower speeds, they don’t work. 
It doesn’t mean that there is a great “barrier” there, it just means that the object 
should be redesigned. So this coil which we designed as an “inductance” is not 
going to work as a good inductance, but as some other kind of thing at very high 
frequencies. For high frequencies, we have to find a new design. 


23-2 A capacitor at high frequencies 

Now we want to discuss in detail the behavior of a capacitor — a geometrically 
ideal capacitor — as the frequency gets larger and larger, so we can see the transition 
of its properties. (We prefer to use a capacitor instead of an inductance, because 
the geometry of a pair of plates is much less complicated than the geometry of a 
coil.) We consider the capacitor shown in Fig. 23-4(a), which consists of two par- 
allel circular plates connected to an external generator by a pair of wires. If we 
charge the capacitor with dc, there will be a positive charge on one plate and a 
negative charge on the other; and there will be a uniform electric field between the 
plates. 

Now suppose that instead of DC, we put an a c of low frequency on the plates. 
(We will find out later what is “low” and what is “high”.) Say we connect the ca- 
pacitor to a lower-frequency generator. As the voltage alternates, the positive 
charge on the top plate is taken off and negative charge is put on. While that is 
happening, the electric field disappears and then builds up in the opposite direction. 




Fig. 23-4. The electric and magnetic fields between the plates of a capacitor. 
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As the charge sloshes back and forth slowly, the electric field follows At each 
instant the electric field is uniform, as shown in Fig 23-4(b), except for some edge 
effects which we are going to disregard. We can write the magnitude of the electric 
field as 

E = E 0 e™\ (23.2) 

where E 0 is a constant. 

Now will that continue to be right as the frequency goes up? No, because as 
the electric field is going up and down, there is a flux of electric field through any 
loop like r t in Fig 23-4(a). And, as you know, a changing electric field acts to 
produce a magnetic field. One of Maxwell’s equations says that when there is a 
varying electric field, as there is here, there has got to be a line integral of the 
magnetic field. The integral of the magnetic field around a closed ring, multiplied 
by c 2 , is equal to the time rate-of-change of the electric flux through the area 
inside the ring (if there are no currents): 

c 2 j> B • ds — ~ j E - n da. (23.3) 

inside T 

So how much magnetic field is there? That’s not very hard. Suppose that we take 
the loop Tj, which is a circle of radius r. We can see from symmetry that the 
magnetic field goes around as shown in the figure. Then the line integral of B is 
lirrB. And, since the electric field is uniform, the flux of the electric field is simply 
E multiplied by irr 2 , the area of the circle: 

c 2 B • 27 rr — ~ E * i rr 2 . (23.4) 

oi 

The derivative of E with respect to time is, for our alternating field, simply /co£ 0 e^. 
So we find that our capacitor has the magnetic field 

B = g E 0 e lwt . (23.5) 

In other words, the magnetic field also oscillates and has a strength proportional 
to r. 

What is the effect of that? When there is a magnetic field that is varying, there 
will be induced electric fields and the capacitor will begin to act a little bit like an 
inductance. As the frequency goes up, the magnetic field gets stronger; it is pro- 
portional to the rate of change of E , and so to co The impedance of the capacitor 
will no longer be simply l//coC. 

Let’s continue to raise the frequency and to analyze what happens more care- 
fully. We have a magnetic field that goes sloshing back and forth. But then the 
electric field cannot be uniform, as we have assumed! When there is a varying 
magnetic field, there must be a line integral of the electric field— because of Faraday’s 
law So if there is an appreciable magnetic field, as begins to happen at high fre- 
quencies, the electric field cannot be the same at all distances from the center. The 
electric field must change with r so that the line integral of the electric field can 
equal the changing flux of the magnetic field. 

Let’s see if we can figure out the correct electric field. We can do that by 
computing a “correction” to the uniform field we originally assumed for low 
frequencies. Let’s call the uniform field E l9 which will still be E () e lut , and write 
the correct field as 

E = Ei T - E 2j 

where E 2 is the correction due to the changing magnetic field. For any co we will 
write the field at the center of the condenser as E 0 e lut (thereby defining E 0 ), so 
that we have no correction at the center; E 2 ~ 0 at r = 0, 

To find E 2 we can use the integral form of Faraday’s law: 


J E * ds = — ~ (flux of B ). 
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The integrals are simple if we take them for the curve F 2 , shown in Fig 23-4(b), 
which goes up along the axis, out radially the distance r along the top plate, down 
vertically to the bottom plate, and back to the axis The line integral of around 
this curve is, of course, zero; so only E 2 contributes, and its integral is just 
~E 2 (r) ■ /?, where h is the spacing between the plates. (We call E positive if it 
points upward.) This is equal to the rate of change of the flux of B , which we have 
to get by an integral over the shaded area S inside F 2 in Fig. 23-4(b). The flux 
through a vertical strip of width dr is B(r)h dr , so the total flux is 


h / B(r) dr. 


Setting — d/dt of the flux equal to the line integral of E 2 , we have 


E 2 (r) = ^ f B(r) dr . 


(23.6) 



Notice that the h cancels out , the fields don't depend on the separation of the plates 
Using Eq (23.5) for B(r ), we have 


E 2 (r) 


d loir 

it ■ 


The time derivative just brings down another factor /to; we get 


^(0 = - ~ E 0 e^. 


(23.7) 


As we expect, the induced field tends to reduce the electric field farther out. The 
corrected field E = E 1 + E > is then 


E — E\ + E 2 




T" 1 l(jl3 t 


(23.8) 


Fig. 23-5. The electric field between 
the capacitor plates at high frequency. 
(Edge effects are neglected.) 


The electric field in the capacitor is no longer uniform; it has the parabolic 
shape shown by the broken line in Fig. 23-5 You see that our simple capacitor is 
getting slightly complicated. 

We could now use our results to calculate the impedance of the capacitor 
at high frequencies. Knowing the electric field, we could compute the charges on 
the plates and find out how the current through the capacitor depends on the 
frequency go, but we are not interested in that problem for the moment. We are 
more interested in seeing what happens as we continue to go up with the frequency 
— to see what happens at even higher frequencies Aren't we already finished? 
No, because we have corrected the electric field, which means that the magnetic 
field we have calculated is no longer right. The magnetic field of Eq. (23.5) is 
approximately right, but it is only a first approximation So let's call it B { We 
should then rewrite Eq. (23.5) as 

Br = ~ E 0 e Ml . (23.9) 


You will remember that this field was produced by the variation of E x . Now the 
correct magnetic field will be that produced by the total electric field E\ + E 2 . 
If we write the magnetic field as B = B v + B 2 , the second term is just the addi- 
tional field produced by E 2 To find B 2 we can go through the same arguments 
we have used to find B u the line integral of B 2 around the curve is equal to 
the rate of change of the flux of E 2 through Fi. We will just have Eq (23 4) again 
with B replaced by B 2 and E replaced by E 2 : 

rj 

c 2 B 2 * 27 rr = — (flux of E 2 through F x ). 


Since E 2 varies with radius, to obtain its flux we must integrate over the circular 
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surface inside r lt Using lirr dr as the element of area, this integral is 


f E 2 (r) ■ 2 irr dr . 

Jo 

So we get for B 2 (r) 

BrAr) = j&Tt f EAr)r dr - (23 - 10) 

Using E 2 (r) from Eq. (23.7), we need the integral of r 3 dr , which is, of course, 
z* 4 / 4. Our correction to the magnetic field becomes 


B 2 (r) = 


7 V,3„3 

^ ^ TP l(a t 

T6c* E ° e ' 


(23.11) 


But we are still not finished! If the magnetic field B is not the same as we first 
thought, then we have incorrectly computed E 2 . We must make a further cor- 
rection to £, which comes from the extra magnetic field B 2 . Let's call this additional 
correction to the electric field E z . It is related to the magnetic field B 2 in the same 
way that E 2 was related to B\, We can use Eq. (23.6) all over again just by chang- 
ing the subscripts: 

Ea(r) = | /fi 2 ( r) dr. (23.12) 


Using our result, Eq. (23.11), for B 2 , the new correction to the electric field is 




(23.13) 


Writing our doubly corrected electric field as E = E\ + E 2 + E A , we get 


E = E 0 e v 




The variation of the electric field with radius is no longer the simple parabola we 
drew in Fig. 23-5, but at large radii lies slightly above the curve ( E x + E 2 ). 

We are not quite through yet. The new electric field produces a new correction 
to the magnetic field, and the newly corrected magnetic field will produce a further 
correction to the electric field, and on and on. However, we already have all the 
formulas that we need For B 3 we can use Eq. (23.10), changing the subscripts of 
B and E from 2 to 3. 

The next correction to the electric field is 


Ea ~ 


1 


2 2 • 4 2 • 6 2 


(t) 


=- E () e“ 


So to this order we have that the complete electric field is given by 



(23.15) 

where we have written the numerical coefficients in such a way that it is obvious 
how the series is to be continued. 

Our final result is that the electric field between the plates of the capacitor, 
for any frequency, is given by E^e™ 1 times the infinite series which contains only 
the variable oor/c. If we wish, we can define a special function, which we will call 
y 0 (-^T as the infinite series that appears in the brackets of Eq. (23.15): 


J 0 (x) = 1 



(31)5 • <23 ' l6) 
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Then we can write our solution as E 0 e l(Jlt times this function, with x = oor/c: 

E = E 0 e lat J 0 ' (23.17) 

The reason we have called our special function J 0 is that, naturally, this is not 
the first time anyone has ever worked out a problem with oscillations in a cylinder. 
The function has come up before and is usually called J {) . It always comes up 
whenever you solve a problem about waves with cylindrical symmetry. The func- 
tion J {) is to cylindrical waves what the cosine function is to waves on a straight 
line. So it is an important function, invented a long time ago. Then a man named 
Bessel got his name attached to it. The subscript zero means that Bessel invented 
a whole lot of different functions and this is just the first of them. 

The other functions of Bessel — J u J 2 , and so on — have to do with cylindrical 
waves which have a variation of their strength with the angle around the axis of 
the cylinder. 

The completely corrected electric field between the plates of our circular 
capacitor, given by Eq. (23.17), is plotted as the solid line in Fig. 23-5. For 
frequencies that are not too high, our second approximation was already quite 
good. The third approximation was even better — so good, in fact, that if we had 
plotted it, you would not have been able to see the difference between it and the 
solid curve. You will see in the next section, however, that the complete series is 
needed to get an accurate description for large radii, or for high frequencies. 


23-3 A resonant cavity 

We want to look now at what our solution gives for the electric field between 
the plates of the capacitor as we continue to go to higher and higher frequencies. 
For large w, the parameter x = cor/c also gets large, and the first few terms in the 
series for J () of x will increase rapidly. That means that the parabola we have 
drawn in Fig. 23-5 curves downward more steeply at higher frequencies In fact, 
it looks as though the field would fall all the way to zero at some high frequency, 
perhaps when c / to is approximately one-half of a. Let’s see whether J {) does indeed 
go through zero and become negative. We begin by trying x - 2* 


Jq( x ) 1 


1 0- 



Fig. 23-6. The Bessel function J 0 (x). 


M2) =1 — 1+i — bV = 0 22 

The function is still not zero, so let’s try a higher value of jc, say, x = 2 5 Putting 
in numbers, we write 

7n(2.5) = I — 1.56 + 0.61 - 0 09 = -0.04. 

The function J 0 has already gone through zero by the time we get to x — 2.5. 
Comparing the results for x — 2 and x = 2 5, it looks as though J lt goes through 
zero at one-fifth of the way from 2.5 to 2. We would guess that the zero occurs for 
-v approximately equal to 2.4. Let’s see what that value of x gives: 

y 0 (2 4) = I — 1.44 + 0.52 - 0 08 - 0 00 

We get zero to the accuracy of our two decimal places. If we make the calculation 
more accurate (or since J {) is a well-known function, if we look it up in a book), we 
find that it goes through zero at x = 2 405 We have worked it out by hand to 
show you that you too could have discovered these things lather than having to 
borrow them from a book 

As long as we are looking up J {) in a book, it is interesting to notice how it 
goes for larger values of x, it looks like the graph in Fig 23-6. As x increases, 
y„(x) oscillates between positive and negative values with a decreasing amplitude 
of oscillation 

We have gotten the following interesting result: If we go high enough in fre- 
quency, the electric field at the center of our condenser will be one way and the 
electric field near the edge will point in the opposite direction. Foi example. 
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suppose that we take an co high enough so that = cor/c at the outer edge of the 
capacitor is equal to 4; then the edge of the capacitor corresponds to the abscissa 
x = 4 in Fig. 23-6. This means that our capacitor is being operated at the fre- 
quency co = 4 c/a At the edge of the plates, the electric field will have a rather 
high magnitude opposite the direction we would expect. That is the terrible thing 
that can happen to a capacitor at high frequencies. If we go to very high frequencies, 
the direction of the electric field oscillates back and forth many times as we go 
out from the center of the capacitor. Also there are the magnetic fields associated 
with these electric fields. It is not surprising that our capacitor doesn’t look like 
the ideal capacitance for high frequencies. We may even start to wonder whether 
it looks more like a capacitor or an inductance We should emphasize that there 
are even more complicated effects that we have neglected which happen at the edges 
of the capacitor. For instance, there will be a radiation of waves out past the edges, 
so the fields are even more complicated than the ones we have computed, but we 
will not worry about those effects now. 

We could try to figure out an equivalent circuit for the capacitor, but perhaps 
it is better if we just admit that the capacitor we have designed for low-frequency 
fields is just no longer satisfactory when the frequency is too high. If we want to 
treat the operation of such an object at high frequencies, we should abandon the 
approximations to Maxwell’s equations that we have made for treating circuits 
and return to the complete set of equations which describe completely the fields 
in space Instead of dealing with idealized circuit elements, we have to deal with 
the real conductors as they are, taking into account all the fields in the spaces in 
between. For instance, if we want a resonant circuit at high frequencies we will 
not try to design one using a coil and a parallel-plate capacitor. 

We have already mentioned that the parallel-plate capacitor we have been 
analyzing has some of the aspects of both a capacitor and an inductance. With the 
electric field there are charges on the surfaces of the plates, and with the magnetic 
fields there are back emf’s. Is it possible that we already have a resonant circuit? 
We do indeed. Suppose we pick a frequency for which the electric field pattern 
falls to zero at some radius inside the edge of the disc; that is, we choose c oa/c 
greater than 2.405 Everywhere on a circle coaxial with the plates the electric field 
will be zero. Now suppose we take a thin metal sheet and cut a strip just wide 
enough to fit between the plates of the capacitor. Then we bend it into a cylinder 
that will go around at the radius where the electric field is zero Since there are 
no electric fields there, when we put this conducting cylinder in place, no currents 
will flow in it; and there will be no changes in the electric and magnetic fields. We 
have been able to put a direct short circuit across the capacitor without changing 
anything. And look what we have; we have a complete cylindrical can with elec- 
trical and magnetic fields inside and no connection at all to the outside world 
The fields inside won’t change even if we throw away the edges of the plates outside 
our can, and also the capacitor leads. All we have left is a closed can with electric 
and magnetic fields inside, as shown in Fig. 23-7(a). The electric fields are os- 
cillating back and forth at the frequency co — which, don’t forget, determined the 
diameter of the can The amplitude of the oscillating E field varies with the distance 
from the axis of the can, as shown in the graph of Fig. 23— 7(b). This curve is just 
the first arch of the Bessel function of zero order. There is also a magnetic field 
which goes in circles around the axis and oscillates in time 90° out of phase with 
the electric field 

We can also write out a series for the magnetic field and plot it. as shown in 
the graph of Fig. 23-7 (c). 

How is it that we can have an electric and magnetic field inside a can with no 
external connections? It is because the electric and magnetic fields maintain them- 
selves: the changing E makes a B and the changing B makes an E — all according 
to the equations of Maxwell. The magnetic field has an inductive aspect, and the 
electric field a capacitive aspect; together they make something like a resonant 
circuit. Notice that the conditions we have described would only happen if the 
radius of the can is exactly 2.405 c/co. For a can of a given radius, the oscillating 
electric and magnetic fields will maintain themselves — in the way we have described 
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Fig. 23—7. The electric and magnetic 
fields in an enclosed cylindrical can. 
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— only at that particular frequency. So a cylindrical can of radius r is resonant at 
the frequency 

w 0 = 2.405 - • (23.18) 
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Fig. 23-8. Coupling into and out of 
a resonant cavity. 



Fig. 23-9. A setup for observing the 
caviry resonance. 



Fig. 23-10. The frequency response 
curve of a resonant cavity. 


We have said that the fields continue to oscillate in the same way after the can 
is completely closed. That is not exactly right. It would be possible if the walls 
of the can were perfect conductors. For a real can, however, the oscillating cur- 
rents which exist on the inside walls of the can lose energy because of the resistance 
of the material. The oscillations of the fields will gradually die away. We can see 
from Fig. 23-7 that there must be strong currents associated with electric and 
magnetic fields inside the cavity. Because the vertical electrical field stops suddenly 
at the top and bottom plates of the can, it has a large divergence there; so there 
must be positive and negative electric charges on the inner surfaces of the can, as 
shown in Fig. 23-7(a) When the electric field reverses, the charges must reverse 
also, so there must be an alternating current between the top and bottom plates 
of the can These charges will flow in the sides of the can, as shown in the figure 
We can also see that there must be currents in the sides of the can by considering 
what happens to the magnetic field The graph of Fig. 23— 7(c) tells us that the 
magnetic field suddenly drops to zero at the edge of the can Such a sudden change 
in the magnetic field can happen only if there is a current in the wall This current 
is what gives the alternating electric charges on the top and bottom plates of the 
can. 

You may be wondering about our discovery of currents in the vertical sides of 
the can What about our earlier statement that nothing would be changed when we 
introduced these vertical sides in a region where the electric field was zero 9 Re- 
member, however, that when we first put in the sides of the can, the top and 
bottom plates extended out beyond them, so that there were also magnetic fields 
on the outside of our can It was only when we threw away the parts of the 
capacitor plates beyond the edges of the can that net currents had to appear on the 
insides of the vertical walls 

Although the electric and magnetic fields in the completely enclosed can will 
gradually die away because of the energy losses, we can stop this from happening 
if we make a little hole in the can and put in a little bit of electrical energy to make 
up the losses We take a small wire, poke it through the hole in the side of the can, 
and fasten it to the inside wall so that it makes a small loop, as shown in Fig 23-8. 
If we now connect this wire to a source of high-frequency alternating current, this 
current will couple energy into the electric and magnetic fields of the cavity and 
keep the oscillations going. This will happen, of course, only if the frequency of the 
driving source is at the resonant frequency of the can If the source is at the wrong 
frequency, the electric and magnetic fields will not resonate, and the fields m the 
can will be very weak. 

The resonant behavior can easily be seen by making another small hole in 
the can and hooking in another coupling loop, as we have also drawn in Fig. 23-8. 
The changing magnetic field through this loop will generate an induced electro- 
motive force in the loop. If this loop is now connected to some external measuring 
circuit, the currents will be proportional to the strength of the fields in the cavity 
Suppose we now connect the input loop of our cavity to an RF signal generator, 
as shown m Fig. 23-9. The signal generator contains a source of alternating current 
whose frequency can be varied by varying the knob on the front of the generator. 
Then we connect the output loop of the cavity to a “detector," which is an instru- 
ment that measures the current from the output loop. It gives a meter reading pro- 
portional to this current. If we now measure the output current as a function of 
the frequency of the signal generator, we find a curve like that shown in Fig 23-10. 
The output current is small for all frequencies except those very near the frequency 
co {) , which is the resonant frequency of the cavity. The resonance curve is very much 
like those we described in Chapter 23 of Vol. I. The width of the resonance is, 
however, much narrower than we usually find for resonant circuits made of induc- 
tances and capacitors; that is, the Q of the cavity is very high. It is not unusual 
to find Q ' s as high as 100,000 or more if the inside walls of the cavity are made of 
some material with a very good conductivity, such as silver 
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23-4 Cavity modes 


Suppose we now try to check our theory by making measurements with an 
actual can. We take a can which is a cylinder with a diameter of 3.0 inches and a 
height of about 2.5 inches. The can is fitted with an input and output loop, as 
shown in Fig. 23-8. If we calculate the resonant frequency expected for this can 
according to Eq (23.18), we get that / 0 = w 0 /2ir = 3010 megacycles When 
we set the frequency of our signal generator near 3000 megacycles and vary it 
slightly until we find the resonance, we observe that the maximum output current 
occurs for a frequency of 3050 megacycles, which is quite close to the predicted 
resonant frequency, but not exactly the same. There are several possible reasons 
for the discrepancy. Perhaps the resonant frequency is changed a little bit because 
of the holes we have cut to put in the coupling loops. A little thought, however, 
shows that the holes should lower the resonant frequency a little bit, so that cannot 
be the reason. Perhaps there is some slight error in the frequency calibration of the 
signal generator, or perhaps our measurement of the diameter of the cavity is not 
accurate enough. Anyway, the agreement is fairly close. 

Much more important is something that happens if we vary the frequency of 
our signal generator somewhat further from 3000 megacycles. When we do that 
we get the results shown in Fig. 23-11. We find that, in addition to the resonance 
we expected near 3000 megacycles, there is also a resonance near 3300 megacycles 
and one near 3820 megacycles. What do these extra resonances mean? We might 
get a clue from Fig. 23-6. Although we have been assuming that the first zero of 
the Bessel function occurs at the edge of the can, it could also be that the second 
zero of the Bessel function corresponds to the edge of the can, so that there is one 
complete oscillation of the electric field as we move from the center of the can out 
to the edge, as shown in Fig. 23-12. This is another possible mode for the oscillating 
fields. We should certainly expect the can to resonate in such a mode. But 
notice, the second zero of the Bessel function occurs at x: = 5.52, which is over 
twice as large as the value at the first zero. The resonant frequency of this mode 
should therefore be higher than 6000 megacycles. We would, no doubt, find it 
there, but it doesn’t explain the resonance we observe at 3300. 

The trouble is that in our analysis of the behavior of a resonant cavity we have 
considered only one possible geometric arrangement of the electric and magnetic 
fields. We have assumed that the electric fields are vertical and that the magnetic 
fields lie in horizontal circles. But other fields are possible. The only requirements 
are that the fields should satisfy Maxwell’s equations inside the can and that the 
electric field should meet the wall at right angles. We have considered the case in 
which the top and the bottom of the can are flat, but things would not be completely 
different if the top and bottom were curved. In fact, how is the can supposed to 
know which is its top and bottom, and which are its sides? It is, in fact, possible 
to show that there is a mode of oscillation of the fields inside the can in which the 
electric fields go more or less across the diameter of the can, as shown in Fig. 23-13. 

It is not too hard to understand why the natural frequency of this mode 
should be not very different from the natural frequency of the first mode we have 
considered. Suppose that instead of our cylindrical cavity we had taken a cavity 
which was a cube 3 inches on a side. It is clear that this cavity would have three 
different modes, but all with the same frequency. A mode with the electric field 
going more or less up and down would certainly have the same frequency as the 
mode in which the electric field was directed right and left If we now distort the 
cube into a cylinder, we will change these frequencies somewhat. We would still 
expect them not to be changed too much, provided we keep the dimensions of the 
cavity more or less the same. So the frequency of the mode of Fig. 23-13 should 
not be too different from the mode of Fig. 23-8. We could make a detailed cal- 
culation of the natural frequency of the mode shown in Fig. 23-13, but we will not 
do that now. When the calculations are carried through, it is found that, for the 
dimensions we have assumed, the resonant frequency comes out very close to the 
observed resonance at 3300 megacycles. 

By similar calculations it is possible to show that there should be still another 
mode at the other resonant frequency we found near 3800 megacycles For this 
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Fig. 23-11. Observed resonant fre- 
quencies of a cylindrical cavity. 



Fig. 23-13. A transverse mode of 
the cylindrical cavity. 
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Fig. 23-14. Another mode of a cy- 
lindrical cavity. 


mode, the electric and magnetic fields are as shown in Fig. 23-14. The electric 
field does not bother to go all the way across the cavity It goes from the sides to 
the ends, as shown. 

As you will probably now believe, if we go higher and higher in frequency we 
should expect to find more and more resonances. There are many different modes.' 
each of which will have a different resonant frequency corresponding to some par- 
ticular complicated arrangement of the electric and magnetic fields. Each of these 
field arrangements is called a resonant mode. The resonance frequency of each mode 
can be calculated by solving Maxwell’s equations for the electric and magnetic 
fields in the cavity. 

When we have a resonance at some particular frequency, how can we know 
w'hich mode is being excited 9 One way is to poke a little wire into the cavity 
through a small hole If the electric field is along the wire, dS in Fig 23-15(a), 
there will be relatively large currents in the wire, sapping energy from the fields, 
and the resonance will be suppressed. If the electric field is as shown in Fig. 
23-1 5(b), the wire will have a much smaller effect. We could find which way the 
field points in this mode by bending the end of the wire, as shown in Fig 23-1 5(c) 
Then, as we rotate the wire, there will be a big effect when the end of the wire is 
parallel to E and a small effect when it is lotated so as to be at 90° to E. 





Fig. 23-15. A short metal wire inserted into a cavity will disturb the 
resonance much more when it is parallel to E than when it is at right angles. 


23-5 Cavities and resonant circuits 

Although the resonant cavity we have been desciibing seems to be quite 
different from the ordinary resonant circuit consisting of an inductance and a 
capacitor, the two resonant systems are, of course, closely related They are both 
members of the same family; they are just two extreme cases of electromagnetic 
resonators — and there are many intermediate cases between these two extremes. 
Suppose we start by considering the resonant circuit of a capacitor in parallel with 
an inductance, as shown in Fig. 23-16(a). This circuit will resonate at the frequency 
co (J = 1 /VLC. If we want to raise the resonant frequency of this circuit, we can 
do so by lowering the inductance L . One way is to decrease the number of turns in 
the coil. We can, however, go only so far in this direction. Eventually we will get 
down to the last turn, and we will have just a piece of wire joining the top and 
bottom plates of the condenser. We could raise the resonant frequency still further 
by making the capacitance smaller; however, we can also continue to decrease the 
inductance by putting several inductances in parallel Two one-turn inductances in 
parallel will have only half the inductance of each turn. So when our inductance 
has been reduced to a single turn, we can continue to raise the resonant frequency 
by adding other single loops from the top plate to the bottom plate of the condenser. 
For instance. Fig 23— 1 6(b) shows the condenser plates connected by six such 
“single-turn inductances.” If we continue to add many sucli pieces of wire, we can 
make the transition to the completely enclosed resonant system shown in part (c) 
of the figure, which is a drawing of the cross section of a cylindncally symmetrical 
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Fig. 23-16. Resonators of progressively higher resonant frequencies. 


object. Our inductance is now a cylindrical hollow can attached to the edges of 
the condenser plates. The electric and magnetic fields will be as shown in the 
figure. Such an object is, of course, a resonant cavity. Jt is called a “loaded” cavity. 
But we can still think of it as an L-C circuit in which the capacity section is the 
region where we find most of the electric field and the inductance section is 
that region where we find most of the magnetic field. 

If we want to make the frequency of the resonator in Fig. 23— 1 6(c) still higher, 
we can do so by continuing to decrease the inductance L. To do that, we must 
decrease the geometric dimensions of the inductance section, for example by 
decreasing the dimension h in the drawing. As h is decreased, the resonant fre- 
quency will be increased Eventually, of course, we will get to the situation in 
which the height h is just equal to the separation between the condenser plates 
We then have just a cylindrical can, our resonant circuit has become the cavity 
resonator of Fig. 23-7. 

You will notice that in the original L-C resonant circuit of Fig. 23-16 the 
electric and magnetic fields are quite separate. As we have gradually modified the 
resonant system to make higher and higher frequencies, the magnetic field has been 
brought closer and closer to the electric field until in the cavity resonator the two 
are quite intermixed 

Although the cavity resonators we have talked about in this chapter have been 
cylindrical cans, there is nothing magic about the cylindrical shape A can of any 
shape will have resonant frequencies corresponding to various possible modes of 
oscillations of the electric and magnetic fields For example, the “cavity” shown 
in Fig 23-17 will have its own particular set of resonant frequencies — although 
they would be rather difficult to calculate. 



Fig. 23-17. Another resonant cavity. 
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24 


Waveguides 


24-1 The transmission line 

In the last chapter we studied what happened to the lumped elements of circuits 
when they were operated at very high frequencies, and we were led to see that a 
resonant circuit could be replaced by a cavity with the fields resonating inside. 
Another interesting technical problem is the connection of one object to another, 
so that electromagnetic energy can be transmitted between them. In low-frequency 
circuits the connection is made with wires, but this method doesn't work very well 
at high frequencies because the circuits would radiate energy into all the space 
around them, and it is hard to control where the energy will go. The fields spread 
out around the wires; the currents and voltages are not “guided'’ very well by 
the wires. In this chapter we want to look into the ways that objects can be 
interconnected at high frequencies At least, that’s one way of presenting our 
subject. 

Another way is to say that we have been discussing the behavior of waves in 
free space. Now it is time to see what happens when oscillating fields are confined 
in one or more dimensions. We will discover the interesting new phenomenon 
when the fields are confined in only two dimensions and allowed to go free in the 
third dimension, they propagate in waves. These are “guided waves” — the subject 
of this chapter. 

We begin by working out the general theory of the transmission line . The 
ordinary power transmission line that runs from tower to tower over the country- 
side radiates away some of its power, but the power frequencies (50-60 cycles/sec) 
are so low that this loss is not serious. The radiation could be stopped by surround- 
ing the line with a metal pipe, but this method would not be practical for power 
lines because the voltages and currents used would require a very large, expensive, 
and heavy pipe. So simple “open lines” are used. 

For somewhat higher frequencies — say a few kilocycles — radiation can al- 
ready be serious However, it can be reduced by using “twisted-pair” transmission 
lines, as is done for short-run telephone connections. At higher frequencies, how- 
ever, the radiation soon becomes intolerable, either because of power losses or 
because the energy appears in other circuits where it isn’t wanted For frequencies 
from a few kilocycles to some hundreds of megacycles, electromagnetic signals 
and power are usually transmitted via coaxial lines consisting of a wire inside a 
cylindrical “outer conductor” or “shield ” Although the following treatment will 
apply to a transmission line of two parallel conductors of any shape, we will carry 
it out referring to a coaxial line. 

We take the simplest coaxial line that has a central conductor, which we sup- 
pose is a thin hollow cylinder, and an outer conductor which is another thin 
cylinder on the same axis as the inner conductor, as in Fig. 24-1 We begin by 
figuring out approximately how the line behaves at relatively low frequencies 
We have already described some of the low-frequency behavior when we said 
earlier that two such conductors had a certain amount of inductance per unit 
length or a certain capacity per unit length. We can, in fact, describe the low- 
frequency behavior of any transmission line by giving its inductance per unit 
length, L {) and its capacity per unit length, C 0 . Then we can analyze the line as 
the limiting case of the L-C filter as discussed in Section 22-6. We can make a 
filter which imitates the line by taking small series elements L 0 A* and small 
shunt capacities C 0 Ax, where Ax is an element of length of the line. Using our 
results for the infinite filter, we see that there would be a propagation of electric 
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Fig. 24-1 . A coaxial transmission line. 
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signals along the line. Rather than following that approach, however, we would 
now rather look at the line from the point of view of a differential equation. 

Suppose that we see what happens at two neighboring points along the 
transmission line, say at the distances x and x + Ax from the beginning of the 
line. Let’s call the voltage difference between the two conductors F(x), and the 
current along the “hot” conductor 7(x) (see Fig. 24-2). If the current in the line 
is varying, the inductance will give us a voltage drop across the small section of 
line from x to x + Ax in the amount 

AV = V(x + Ax) - V(x) = -L 0 Axj { - 
Or, taking the limit as Ax — * 0, we get 


I(x) I(x + Ax) 


/; 

VI*), 
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Fig. 24-2. The currents and voltages 

of a transmission line. 
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The changing current gives a gradient of the voltage. 

Referring again to the figure, if the voltage at x is changing, there must be 
some charge supplied to the capacity in that region. If we take the small piece of 
line between x and x + Ax, the charge on it is q = C 0 AxK The time rate-of- 
change of this charge is C 0 Ax dV /dt, but the charge changes only if the current 
/(x) into the element is different from the current /(x + Ax) out. Calling the differ- 
ence A/, we have 

at n a dV 
A/ — — C 0 Ax — * 

dt 


Taking the limit as Ax — » 0, we get 


dl _ dV 
dx " C ° dt * 


(24.2) 


So the conservation of charge implies that the gradient of the current is propor- 
tional to the time rate-of-change of the voltage. 

Equations (24.1) and (24.2) are then the basic equations of a transmission 
line. If we wish, we could modify them to include the effects of resistance in the 
conductors or of leakage of charge through the insulation between the conductors, 
but for our present discussion we will just stay with the simple example. 

The two transmission line equations can be combined by differentiating one 
with respect to t and the other with respect to x and eliminating either V or /. 
Then we have either 


d 2 v_ d 2 V 

dx 2 C ° L ° dt 2 
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or 


d 2 I 

dx 2 
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Once more we recognize the wave equation in x. For a uniform transmission 
line, the voltage (and current) propagates along the line as a wave. The voltage 
along the line must be of the form V(x, t) = / (x — vt) or K(x, t) = g(x + 
or a sum of both. Now what is the velocity vl We know that the coefficient of 
the d 2 /dt 2 term is just l/v 2 , so 


= 1 

Vl 0 Q) 


(24.5) 


We will leave it for you to show that the voltage for each wave in a line is 
proportional to the current of that wave and that the constant of proportionality 
is just the characteristic impedance z 0 . Calling V + and /+ the voltage and current 
for a wave going in the plus x-direction, you should get 
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V+ = ZoI+. 


(24.6) 



Similary, for the wave going toward minus x the relation is 


- -z 0 /_. 


The characteristic impedance — as we found out from our filter equations — is 
given by 





(24.7) 


and is, therefore, a pure resistance. 

To find the propagation speed v and the characteristic impedance z 0 of a 
transmission line, we have to know the inductance and capacity per unit length. 
We can calculate them easily for a coaxial cable, so we will see how that goes. For 
the inductance we follow the ideas of Section 17-8, and set \Ll 2 equal to the mag- 
netic energy which we get by integrating e 0 c 2 B 2 /2 over the volume. Suppose 
that the central conductor carries the current /; then we know that B — I/2irt {) c 2 r , 
where r is the distance from the axis. Taking as a volume element a cylindrical 
shell of thickness dr and of length /, we have for the magnetic energy 



/ lirr dr , 


where a and b are the radii of the inner and outer conductors, respectively, 
ing out the integral, we get 

j, /2/ . b 

U = 5- In - • 

47 re 0 c 2 a 

Setting the energy equal to \LI 2 , we find 


Carry- 
(24 8) 


L = 


l 
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It is, as it should be, proportional to the length / of the line, so the inductance per 
unit length L 0 is 


= In (b/a) 
h re 0 c 2 


(24.10) 


We have worked out the charge on a cylindrical condenser (see Section 12-2) 
Now, dividing the charge by the potential difference, we get 


2re 0 I 
In (b/a) * 


The capacity per unit length C 0 is C/L Combining this result with Eq. (24.10), 
we see that the product L 0 C 0 is just equal to 1/c 2 . so v = l/V^o^o is equal 
to c. The wave travels down the line with the speed of light. We point out that this 
result depends on our assumptions: (a) that there are no dielectrics or magnetic 
materials in the space between the conductors, and (b) that the currents are all on 
the surfaces of the conductors (as they would be for perfect conductors). We will 
see later that for good conductors at high frequencies, all currents distribute 
themselves on the surfaces as they would for a perfect conductor, so this assump- 
tion is then valid. 

Now it is interesting that so long as assumptions (a) and (b) are correct, the 
product L 0 C 0 is equal to 1/c 2 for any parallel pair of conductors — even, say, for a 
hexagonal inner conductor anywhere inside an elliptical outer conductor. So long 
as the cross section is constant and the space between has no material, waves are 
propagated at the velocity of light. 

No such general statement can be made about the characteristic impedance. 
For the coaxial line, it is 


= In (b/a) 
lireoc 


(24.11) 
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Fig. 24-3. Coordinates chosen for 
the rectangular waveguide. 
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Fig. 24-4. The electric field in the 
waveguide at some value of z. 
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Fig. 24—5. The z-dependence of the 
field in the waveguide. 


The factor l/e 0 c has the dimensions of a resistance and is equal to 1207T ohms. 
The geometric factor In ( b/a ) depends only logarithmically on the dimensions, so 
for the coaxial line — and most lines — the characteristic impedance has typical 
values of from 50 ohms or so to a few hundred ohms. 


24-2 The rectangular waveguide 

The next thing we want to talk about seems, at first sight, to be a striking 
phenomenon: if the central conductor is removed from the coaxial line, it can still 
carry electromagnetic power. In other words, at high enough frequencies a hollow 
tube will work just as well as one with wires. It is related to the mysterious way in 
which a resonant circuit of a condenser and inductance gets replaced by nothing 
but a can at high frequencies. 

Although it may seem to be a remarkable thing when one has been thinking 
in terms of a transmission line as a distributed inductance and capacity, we all 
know that electromagnetic waves can travel along inside a hollow metal pipe. 
If the pipe is straight, we can see through it! So certainly electromagnetic waves 
go through a pipe. But we also know that it is not possible to transmit low-fre- 
quency waves (power or telephone) through the inside of a single metal pipe. So 
it must be that electromagnetic waves will go through if their wavelength is short 
enough. Therefore we want to discuss the limiting case of the longest wavelength 
(or the lowest frequency) that can get through a pipe of a given size. Since the 
pipe is then being used to carry waves, it is called a waveguide. 

We will begin with a rectangular pipe, because it is the simplest case to 
analyze. We will first give a mathematical treatment and come back later to look 
at the problem in a much more elementary way. The more elementary approach, 
however, can be applied easily only to a rectangular guide. The basic phenomena 
are the same for a general guide of arbitrary shape, so the mathematical argument 
is fundamentally more sound. 

Our problem, then, is to find what kind of waves can exist inside a rectangular 
pipe. Let’s first choose some convenient coordinates; we take the z-axis along the 
length of the pipe, and the x- and j^-axes parallel to the two sides, as shown in 
Fig. 24-3. 

We know that when light waves go down the pipe, they have a transverse 
electric field; so suppose we look first for solutions in which E is perpendicular to 
z, say with only a jr-component, E u . This electric field will have some variation 
across the guide; in fact, it must go to zero at the sides parallel to the j-axis, because 
the currents and charges in a conductor always adjust themselves so that there is 
no tangential component of the electric field at the surface of a conductor. So 
E y will vary with x in some arch, as shown in Fig. 24-4. Perhaps it is the Bessel 
function we found for a cavity? No, because the Bessel function has to do with 
cylindrical geometries. For a rectangular geometry, waves are usually simple 
harmonic functions, so we should try something like sin k x x. 

Since we want waves that propagate down the guide, we expect the field to 
alternate between positive and negative values as we go along in z, as in Fig. 24-5, 
and these oscillations will travel along the guide with some velocity />. If we have 
oscillations at some definite frequency to, we would guess that the wave might vary 
with z like cos(tu/ — k z z ), or to use the more convenient mathematical form, 
like e ' l{<jit ~ k z z \ This z-dependence represents a wave travelling with the speed 
v = o)/k z (see Chapter 29, Vol. I). 

So we might guess that the wave in the guide would have the following 
mathematical form: 

E y = E 0 sin (24.12) 

Let’s see whether this guess satisfies the correct field equations. First, the 
electric field should have no tangential components at the conductors. Our field 
satisfies this requirement; it is perpendicular to the top and bottom faces and is 
zero at the two side faces. Well, it is if we choose k x so that one-half a cycle of 
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sin /: r jv just fits in the width of the guide — that is, if 


k x a = 7 r. (24.13) 

There are other possibilities, like k x a = 27t, 3t, . . . , or, in general, 


k x a = «7r, 


(24 14) 


where n is any integer. These represent various complicated arrangements of the 
field, but for now let’s take only the simplest one, where k x = ir/a, where a is 
the width of the inside of the guide. 

Next, the divergence of E must be zero in the free space inside the guide, 
since there are no charges there. Our E has only a ^-component, and it doesn’t 
change with y, so we do have that V • E = 0. 

Finally, our electric field must agree with the rest of Maxwell’s equations in 
the free space inside the guide. That is the same thing as saying that it must 
satisfy the wave equation 


d ’- E » + tJk + tlk _ _L d lly = o 

dx 2 dy 2 dz 2 c 2 dt 2 


(24.15) 


We have to see whether our guess, Eq. (24.12), will work. The second derivative of 
E t/ with respect to ^ is just — k x E y The second derivative with respect to y is 
zero, since nothing depends on v. The second derivative with respect to z is — k 2 E 1f , 
and the second derivative with respect to i is —a) 2 E y . Equation (24.15) then says 
that 

2 

10 f — 0 
K'xE'y I 2 — O. 


Unless E y is zero everywhere (which is not very interesting), this equation is correct 
if 


k x + k 


2 

z 



(24.16) 


We have already fixed k x , so this equation tells us that there can be waves of the 
type we have assumed if k z is related to the frequency a> so that Eq. (24 16) is 
satisfied — in other words, if 

k z = vV/c 2 ) ~ (ir 2 /a 2 ). (24.17) 

The waves we have described are propagated in the z-direction with this value of k z 
The wave number k z we get from Eq. (24.17) tells us, for a given frequency w, 
the speed with which the nodes of the wave propagate down the guide. The 
phase velocity is 

V = (24.18) 

K z 

You will remember that the wavelength X of a travelling wave is given by 
X = 2 7n>/a>, so k z is also equal to 2ir/\ g , where \ g is the wavelength of the oscilla- 
tions along the z-direction — the “guide wavelength.” The wavelength in the guide 
is different ,of course, from the free-space wavelength of electromagnetic waves 
of the same frequency. If we call the free-space wavelength X 0 , which is equal to 
27 tc/co, we can write Eq. (24.17) as 


= 


Xo 

Vl - (Xo/2 a) 2 


(24.19) 



Fig. 24-6. The magnetic field in the 
waveguide. 


Besides the electric fields there are magnetic fields that will travel with the 
wave, but we will not bother to work out an expression for them right now. Since 
c 2 V X B = dE/dt , the lines of B will circulate around the regions in which 
dE/dt is largest, that is, halfway between the maximum and minimum of E. The 
loops of B will lie parallel to the xz-plane and between the crests and troughs of 
£, as shown in Fig. 24-6. 
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24-3 The cutoff frequency 

In solving Eq. (24.16) for k z , there should really be two roots — one plus and 
one minus. We should write 

k z — y/ (c o 2 /c 2 )" — " (ir 2 /a 2 ). (24.20) 

The two signs simply mean that there can be waves which propagate with a nega- 
tive phase velocity (toward — z), as well as waves which propagate in the positive 
direction in the guide. Naturally, it should be possible for waves to go in either 
direction. Since both types of waves can be present at the same time, there will be 
the possibility of standing-wave solutions. 

Our equation for k z also tells us that higher frequencies give larger values of 
k z , and therefore smaller wavelengths, until in the limit of large co, k becomes 
equal to co/c, which is the value we would expect for waves in free space. The 
light we “see” through a pipe still travels at the speed c. But now notice that if we 
go toward low frequencies, something strange happens. At first the wavelength 
gets longer and longer, but if co gets too small the quantity inside the square root 
of Eq. (24.20) suddenly becomes negative. This will happen as soon as co gets to 
be less than ire /a — or when X 0 becomes greater than 2a. In other words, when 
the frequency gets smaller than a certain critical frequency co r = ire /a, the wave 
number k z (and also \ ff ) becomes imaginary and we haven’t got a solution any 
more. Or do we? Who said that k 2 has to be real? What if it does come out 
imaginary? Our field equations are still satisfied. Perhaps an imaginary k z also 
represents a wave. 

Suppose co is less than co c ; then we can write 

k z = =*=/*', (24.21) 


where k f is a positive real number: 

k f = V(t r 2 /a*) - (a> 2 /c 2 ). (24.22) 

If we now go back to our expression, Eq. (24.12), for E y , we have 



7T 7 r 

Fig. 24-7. The variation of E y with 

z for CO « COc. 


E v = E 0 smk x xe l(wtrik ' z) , (24.23) 

which we can write as 

Ey = Eq sin k x xe ±h ' * e 1 * 1 . (24.24) 

This expression gives an Zs-field that oscillates with time as e lwt but which 
varies with z as e ±k>z . It decreases or increases with z smoothly as a real exponent- 
ial In our derivation we didn’t worry about the sources that started the waves, 
but there must, of course, be a source someplace in the guide. The sign that goes 
with k r must be the one that makes the field decrease with increasing distance 
from the source of the waves. 

So for frequencies below co c = 7rc/a, waves do not propagate down the guide; 
the oscillating fields penetrate into the guide only a distance of the order of l/k f . 
For this reason, the frequency co c is called the “cutoff frequency” of the guide. 
Looking at Eq. (24 22), we see that for frequencies just a little below co f , the num- 
ber k f is small and the fields can penetrate a long distance into the guide. But if 
co is much less than co c , the exponential coefficient k f is equal to 7 r/a and the field 
dies off extremely rapidly, as shown in Fig. 24-7. The field decreases by l/e in the 
distance a/ it, or in only about one-third of the guide width. The fields penetrate 
very little distance from the source. 

We want to emphasize an interesting feature of our analysis of the guided 
waves — the appearance of the imaginary wave number k z . Normally, if we solve 
an equation in physics and get an imaginary number, it doesn’t mean anything 
physical. For waves, however, an imaginary wave number does mean something. 
The wave equation is still satisfied; it only means that the solution gives expo- 
nentially decreasing fields instead of propagating waves. So in any wave problem 
where k becomes imaginary for some frequency, it means that the form of the wave 
changes — the sine wave changes into an exponential. 
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24-4 The speed of the guided waves 


The wave velocity we have used above is the phase velocity, which is the speed 
of a node of the wave; it is a function of frequency. If we combine Eqs. (24.17) 
and (24.18), we can write 


^phase 


C 

Vl — (to c /w ) 2 


(24.25) 


For frequencies above cutoff — where travelling waves exist — co c /co is less than one, 
and / phase is real and greater than the speed of light. We have already seen in 
Chapter 48 of Vol. I that phase velocities greater than light are possible, because it 
is just the nodes of the wave which are moving and not energy or information. In 
order to know how fast signals will travel, we have to calculate the speed of pulses 
or modulations made by the interference of a wave of one frequency with one or 
more waves of slightly different frequencies (see Chapter 48. Vol. I). We have 
called the speed of the envelope of such a group of waves the group velocity; it is 
not co/k but dio/dk : 


_ do) 
VgIour> ~ dk 


(24.26) 


Taking the derivative of Eq. (24.17) with respect to to and inverting to get dej/dk, 
we find that 

Voup = cVl — (co c /co) 2 , (24.27) 

which is less than the speed of light. 

The geometric mean of v p h aBC and r group is just c, the speed of light: 

^phase ^group = £ • (24.28) 

This is curious, because we have seen a similar relation in quantum mechanics. 
For a particle with any velocity — even relativistic — the momentum p and energy 
U are related by 

U 2 = p 2 c 2 + m 2 c A . (24.29) 

But in quantum mechanics the energy is ftco, and the momentum is h/K which is 
equal to hk\ so Eq. (24.29) can be written 

co 2 _ j 2 , m 2 c 2 
~ k + 

or 

k = \/(c o 2 /c 2 ) - (m 2 c 2 /h 2 ), 

which looks very much like Eq. (24.17) . . . Interesting! 

The group velocity of the waves is also the speed at which energy is transported 
along the guide. If we want to find the energy flow down the guide, we can get it 
from the energy density times the group velocity. If the root mean square electric 
field is E 0 , then the average density of electric energy is e 0 £o/2. There is also 
some energy associated with the magnetic field. We will not prove it here, but in 
any cavity or guide the magnetic and electric energies are equal, so the total 
electromagnetic energy density is e 0 £o* The power dU/dt transmitted by the guide 
is then 

^ = e 0 E 2 0 abv groa p . (24.32) 

(We will see later another, more general way of getting the energy flow.) 


(24.30) 

(24.31) 


24-5 Observing guided waves 

Energy can be coupled into a waveguide by some kind of an “antenna.” For 
example, a little vertical wire or “stub” will do. The presence of the guided waves 
can be observed by picking up some of the electromagnetic energy with a little 
receiving “antenna,” which again can be a little stub of wire or a small loop. 
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Fig. 24-8. A waveguide with a driv- 
ng stub and a pickup probe. 



In Fig. 24-8, we show a guide with some cutaways to show a driving stub and a 
pickup “probe 11 . The driving stub can be connected to a signal generator via a 
coaxial cable, and the pickup probe can be connected by a similar cable to a 
detector. It is usually convenient to insert the pickup probe via a long thin slot 
in the guide, as shown in Fig. 24-8. Then the probe can be moved back and forth 
along the guide to sample the fields at various positions. 

If the signal generator is set at some frequency w greater than the cutoff 
frequency <o r , there will be waves propagated down the guide from the driving 
stub. These will be the only waves present if the guide is infinitely long, which 
can effectively be arranged by terminating the guide with a carefully designed 
absorber in such a way that there are no reflections from the far end. Then, since 
the detector measures the time average of the fields near the probe, it will pick 
up a signal which is independent of the position along the guide; its output will 
be proportional to the power being transmitted. 

If now the far end of the guide is finished off in some way that produces a 
reflected wave — as an extreme example, if we closed it off with a metal plate — there 
will be a reflected wave in addition to the original forward wave. These two waves 
will interfere and produce a standing wave in the guide similar to the standing 
waves on a string which we discussed in Chapter 49 of Vol. I. Then, as the pickup 
probe is moved along the line, the detector reading will rise and fall periodically, 
showing a maximum in the fields at each loop of the standing wave and a minimum 
at each node The distance between two successive nodes (or loops) is just X„/2. 
This gives a convenient way of measuring the guide wavelength. If the frequency 
is now moved closer to co 6 , the distances between nodes increase, showing that the 
guide wavelength increases as predicted by Eq. (24.19). 

Suppose now the signal generator is set at a frequency just a little below 
Then the detector output will decrease gradually as the pickup probe is moved 
down the guide If the frequency is set somewhat lower, the field strength will 
fall rapidly, following the curve of Fig. 24-7. and showing that waves are not 
propagated. 


24-6 Waveguide plumbing 

An important practical use of waveguides is for the transmission of high- 
frequency power, as, for example, in coupling the high-frequency oscillator or 
output amplifier of a radar set to an antenna. In fact, the antenna itself usually 
consists of a parabolic reflector fed at its focus by a waveguide flared out at the 
end to make a “horn” that radiates the waves coming along the guide. Although 
high frequencies can be transmitted along a coaxial cable, a waveguide is better 
for transmitting large amounts of power. First, the maximum power that can be 
transmitted along a line is limited by the breakdown of the insulation (solid or gas) 
between the conductors. For a given amount of power, the field strengths in a 
guide are usually less than they are in a coaxial cable, so higher powers can be 
transmitted before breakdown occurs. Second, the power losses in the coaxial cable 
are usually greater than in a waveguide. In a coaxial cable there must be insulating 
material to support the central conductor, and there is an energy loss in this 
material — particularly at high frequencies. Also, the current densities on the 
central conductor are quite high, and since the losses go as the square of the current 
density, the lower currents that appear on the walls of the guide result in lower 
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Fig. 24-9. Sections of waveguide connected 
with flanges. 
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Fig. 24—10. A low-loss connection between 
two sections of waveguide. 


energy losses. To keep these losses to a minimum, the inner surfaces of the guide 
are often plated with a material of high conductivity, such as silver. 

The problem of connecting a “circuit” with waveguides is quite different 
from the corresponding circuit problem at low frequencies, and is usually called 
microwave “plumbing.” Many special devices have been developed for the pur- 
pose. For instance, two sections of waveguide are usually connected together by 
means of flanges, as can be seen in Fig. 24-9. Such connections can, however, 
cause serious energy losses, because the surface currents must flow across the joint 
which may have a relatively high resistance. One way to avoid such losses is to 
make the flanges as shown in the cross section drawn in Fig. 24-10. A small space 
is left between the adjacent sections of the guide, and a groove is cut in the face of 
one of the flanges to make a small cavity of the type shown in Fig. 23-1 6(c) The 
dimensions are chosen so that this cavity is resonant at the frequency being used 
This resonant cavity presents a high “impedance” to the currents, so relatively 
little current flows across the metallic joints (at a in Fig. 24-10). The high guide 
currents simply charge and discharge the “capacity” of the gap (at b in the figure), 
where there is little dissipation of energy. 

Suppose you want to stop a waveguide in a way that won’t result in reflected 
waves. Then you must put something at the end that imitates an infinite length of 
guide. You need a “termination” which acts for the guide like the characteristic 
impedance does for a transmission line — something that absorbs the arriving waves 
without making reflections. Then the guide will act as though it went on forever 
Such terminations are made by putting inside the guide some wedges of resistance 
material carefully designed to absorb the wave energy while generating almost 
no reflected waves. 

If you want to connect three things together — for instance, one source to 
two different antennas — then you can use a “T” like the one shown in Fig 24-1 1. 
Power fed in at the center section of the “T" will be split and go out the two side 
arms (and there may also be some reflected waves). You can see qualitatively from 
the sketches in Fig. 24-12 that the fields would spread out when they get to the 
end of the input section and make electric fields that will start waves going out the 
two arms. Depending on whether electric fields in the guide are parallel or per- 
pendicular to the “top” of the “T,” the fields at the junction would be roughly 
as shown in (a) or (b) of Fig. 24-12. 

Finally, we would like to describe a device called an “unidirectional coupler," 
which is very useful for telling what is going on after you have connected a compli- 
cated arrangement of waveguides. Suppose you want to know which way the 
waves are going in a particular section of guide — you might be wondering, for 
instance, whether or not there is a strong reflected wave. The unidirectional 
coupler takes out a small fraction of the power of a guide if there is a wave going 
one way, but none if the wave is going the other way. By connecting the output 
of the coupler to a detector, you can measure the “one-way” power in the guide 
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Fig. 24-1 1. A waveguide “T.” (The 
flanges have plastic end caps to keep the 
inside clean while the "T” is not being 
used.) 



O 0 


E 

O 

0 


0 

O 


0 0 


lb) 


o o 

O 0 

o o 


Fig. 24-12. The electric fields in a 
waveguide "T" for two possible field 
orientations. 



Figure 24-13 is a drawing of a unidirectional coupler; a piece of waveguide 
AB has another piece of waveguide CD soldered to it along one face. The guide 
CD is curved away so that there is room for the connecting flanges. Before the 
guides are soldered together, two (or more) holes have been drilled in each guide 
(matching each other) so that some of the fields in the main guide AB can be 
coupled into the secondary guide CD. Each of the holes acts like a little antenna 
that produces a wave in the secondary guide. If there were only one hole, waves 
would be sent in both directions and would be the same no matter which way the 
wave was going in the primary guide. But when there are two holes with a separa- 
tion space equal to one-quarter of the guide wavelength, they will make two sources 
90° out of phase Do you remember that we considered in Chapter 29 of Vol I the 
interference of the waves from two antennas spaced X/4 apart and excited 90° 
out of phase in time? We found that the waves subtract in one direction and add 
in the opposite direction The same thing will happen here. The wave produced 
in the guide CD will be going in the same direction as the wave in AB. 

If the wave in the primary guide is travelling from A toward B, there will be 
a wave at the output D of the secondary guide. If the wave m the primary guide 
goes from B toward A , there will be a wave going toward the end C of the secondary 
guide. This end is equipped with a termination, so that this wave is absorbed and 
there is no wave at the output of the coupler 




24-7 Waveguide modes 

The wave we have chosen to analyze is a special solution of the field equations. 
There are many more. Each solution is called a waveguide “mode ” For example, 
our ^-dependence of the field was just one-half a cycle of a sine wave. There is an 
equally good solution with a full cycle, then the variation of E tJ with a; is as shown 
m Fig 24-14 The k r for such a mode is twice as large, so the cutoff frequency is 
much higher. Also, in the wave we studied E has only a y-component, but there 
are other modes with more complicated electric fields. If the electric field has 
components only in a; and y — so that the total electric field is always at i ight 
angles to the z-direction — the mode is called a “transverse electric” (or TE) mode. 
The magnetic field of such modes will always have a z-component. It turns out 
that if E has a component in the z-direction (along the direction of propagation), 
then the magnetic field will always have only transverse components. So such 
fields are called transverse magnetic (TM) modes. For a rectangular guide, all 
the other modes have a higher cutoff frequency than the simple TE mode we have 
described. It is, therefore, possible — and usual — to use a guide with a frequency 
just above the cutoff for this lowest mode but below the cutoff frequency for all 
the others, so that just the one mode is propagated. Otherwise, the behavior gets 
complicated and difficult to control. 


Fig. 24-14. Another possible varia- 24-8 Another way of looking at the guided waves 

tion of E y with x. 

We want now to show you another way of understanding why a waveguide 
attenuates the fields rapidly for frequencies below the cutoff frequency co ( . Then 
you will have a more “physical” idea of why the behavior changes so drastically 
between low and high frequencies We can do this for the rectangular guide by 
analyzing the fields in terms of reflections — or images — in the walls of the guide. 
The approach only works for rectangular guides, however, that's why we started 
with the more mathematical analysis which works, in principle, for guides of any 
shape. 

For the mode we have described, the vertical dimension (in y) had no effect, 
so we can ignore the top and bottom of the guide and imagine that the guide is 
extended indefinitely in the vertical direction. We imagine then that the guide 
just consists of two vertical plates with the separation a. 

Let's say that the source of the fields is a vertical wire placed in the middle of 
the guide, with the wire carrying a current that oscillates at the frequency co. 
In the absence of the guide walls such a wire would radiate cylindrical waves 
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Now we consider that the guide walls are perfect conductors. Then, just as in 
electrostatics, the conditions at the surface will be correct if we add to the field of 
the wire the field of one or more suitable image wires. The image idea works just 
as well for electrodynamics as it does for electrostatics, provided, of course, that 
we also include the retardations. We know that is true because we have often 
seen a mirror producing an image of a light source. And a mirror is just a “perfect” 
conductor for electromagnetic waves with optical frequencies. 

Now let's take a horizontal cross section, as shown in Fig. 24-15, where W i 
and W 2 are the two guide walls and S 0 is the source wire. We call the direction of 
the current in the wire positive. Now if there were only one wall, say W u we could 
remove it if we placed an image source (with opposite polarity) at the position 
marked S x . But with both walls in place there will also be an image of S 0 in the 
wall W 2 , which we show as the image 52- This source, too, will have an image in 
W u which we call S3. Now both S 1 and S3 will have images in W 2 at the positions 
marked *S 4 and S 6 , and so on. For our two plane conductors with the source 
halfway between, the fields are the same as those produced by an infinite line of 
sources, all separated by the distance a. (It is, in fact just what you would see if 
you looked at a wire placed halfway between two parallel mirrors.) For the fields 
to be zero at the walls, the polarity of the currents in the images must alternate 
from one image to the next. In other words, they oscillate 180° out of phase. 
The waveguide field is, then, just the superposition of the fields of such an infinite 
set of line sources. 

We know that if we are close to the sources, the field is very much like the 
static fields. We considered in Section 7-5 the static field ot a grid of line sources 
and found that it is like the field of a charged plate except for terms that decrease 
exponentially with the distance from the grid. Here the average source strength 
is zero, because the sign alternates from one source to the next. Any fields which 
exist should fall off exponentially with distance. Close to the source, we see the 
field mainly of the nearest source; at large distances, many sources contribute and 
their average effect is zero. So now we see why the waveguide below cutoff fre- 
quency gives an exponentially decreasing field. At low frequencies, in particular, 
the static approximation is good, and it predicts a rapid attenuation of the fields 
with distance. 

Now we are faced with the opposite question: Why are waves propagated 
at all? That is the mysterious part! The reason is that at high frequencies the 
retardation of the fields can introduce additional changes in phase which can cause 
the fields of the out-of-phase sources to add instead of cancelling. In fact, in 
Chapter 29 of Vol. I we have already studied, just for this problem, the fields 
generated by an array of antennas or by an optical grating. There we found that 
when several radio antennas are suitably arranged, they can give an interference 
pattern that has a strong signal in some direction but no signal in another. 

Suppose we go back to Fig. 24-15 and look at the fields which arrive at a 
large distance from the array of image sources. The fields will be strong only in 
certain directions which depend on the frequency — only in those directions for 
which the fields from all the sources add in phase. At a reasonable distance from 
the sources the field propagates in these special directions as plane waves. We have 
sketched such a wave in Fig. 24-16, where the solid lines represent the wave crests 
and the dashed lines represent the troughs. The wave direction will be the one 
for which the difference in the retardation for two neighboring sources to the crest 
of a wave corresponds to one-half a period of oscillation. In other words, the 
difference between r 2 and r 0 in the figure is one-half of the free-space wavelength: 


The angle 8 is then given by 

sin 0 = £2 . (24.33) 

2 a 

There is, of course, another set of waves travelling downward at the symmetric 
angle with respect to the array of sources. The complete waveguide field (not too 
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S 6 - 

Fig. 24—1 5. The line source So be- 
tween the conducting plane walls W] 
and W 2 . The walls can be replaced by 
the infinite sequence of image sources. 



Fig. 24-16. One set of coherent 
waves from an array of line sources. 




Fig. 24—17. The waveguide field can 
be viewed as the superposition of two 
trains of plane waves. 


close to the source) is the superposition of these two sets of waves, as shown in 
Fig, 24-17, The actual fields are really like this, of course, only between the two 
walls of the waveguide. 

At points like A and C, the crests of the two wave patterns coincide, and the 
field will have a maximum, at points like B , both waves have their peak negative 
value, and the field has its minimum (largest negative) value. As time goes on 
the field in the guide appears to be travelling along the guide with a wavelength 
which is the distance from A to C. That distance is related to 6 by 


cos 6 = 


Using Eq. (24.33) for 0 , we get that 


Xp _ Xo 

cos B Vl - (X 0 /2a) 2 5 


(24 34) 


(24.35) 


which is just what we found in Eq (24 19) 

Now we see why there is only wave propagation above the cutoff frequency 
u>n If the free-space wavelength is longer than 2a , there is no angle where the waves 
shown in Fig 24-16 can appear. The necessary constructive interference appears 
suddenly when \ () drops below 2a, or when cu goes above co () ^ ttc/u. 

If the frequency is high enough, there can be two or more possible directions 
in which the waves will appear. For our case, this will happen if X (} < In 
general, however, it could also happen when X 0 < a. These additional waves 
correspond to the higher guide modes we have mentioned. 

It has also been made evident by our analysis why the phase velocity of the 
guided waves is greater than c and why this velocity depends on co As co is changed, 
the angle of the free waves of Fig. 24-16 changes, and therefore so does the velocity 
along the guide. 

Although we have described the guided wave as the superposition of the fields 
of an infinite array of line sources, you can see that we would arrive at the same 
result if we imagined two sets of free-space waves being continually reflected back 
and forth between two perfect mirrors — remembering that a reflection means a 
reversal of phase. These sets of reflecting waves would all cancel each other unless 
they were going at just the angle 6 given m Eq. (24 33) There are many ways of 
looking at the same thing. 
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25 


Electrodynamics in Itelatieistic Notation 


25-1 Four- vectors 

We now discuss the application of the special theory of relativity to electro- 
dynamics. Since we have already studied the special theory of relativity in Chapters 
15 through 17 of Vol. I, we will just review quickly the basic ideas. 

It is found experimentally that the laws of physics are unchanged if we move 
with uniform velocity. You can’t tell if you are inside a spaceship moving with 
uniform velocity in a straight line, unless you look outside the spaceship, or at 
least make an observation having to do with the world outside. Any true law of 
physics we write down must be arranged so that this fact of nature is built in. 

The relationship between the space and time of two systems of coordinates, 
one. S', in uniform motion in the jc-direction with speed r relative to the other, S, 
is given by the Loren tz transformation: 

,/ _ * - vx 

, x — rt 
x = — - •> 

\/~l — v 2 

The laws of physics must be such that after a Lorentz transformation, the new 
form of the laws looks just like the old form. This is just like the principle that 
the laws of physics don’t depend on the orientation of our coordinate system. In 
Chapter 1 1 of Vol I, we saw that the way to describe mathematically the invariance 
of physics with respect to rotations was to write our equations in terms of vectors . 

For example, if we have two vectors 

A = (4 n A, h A z ) and B = ( B r , B z ), 

we found that the combination 

A B - A t B x + A V B„ + A Z B Z 

was not changed if we transformed to a rotated coordinate system. So we know 
that if we have a scalar product like A ■ B on both sides of an equation, the equation 
will have exactly the same form in all rotated coordinate systems. We also dis- 
covered an operator (see Chapter 2), 


y = y. 


z' = z. 


(25.1) 



which, when applied to a scalar function, gave three quantities which transform 
just like a vector With this operator we defined the gradient, and in combination 
with other veclois, the divergence and the Laplacian. Finally we discovered that 
by taking sums of certain products of pairs of the components of two vectors we 
could get three new quantities which behaved like a new vector. We called it the 
cross product of two vectors Using the cross product with our operator V we then 
defined the curl of a vector 

Since we will be referring back to what we have done in vector analysis, we 
have put in Table 25-1 a summary of all the important vector operations in 
three dimensions that we have used in the past. The point is that it must be possible 
to write the equations of physics so that both sides transform the same way under 


25-1 Four- vectors 

25-2 The scalar product 

25-3 The four-dimensional gradient 

25-4 Electrodynamics in 

four-dimensional notation 

25-5 The four-potential of a 
moving charge 
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Table 25-1 

The important quantities and operations 
of vector analysis in three dimensions 


Definition of a 
vector 

Scalar product 

Differential vector 
; operator 

Gradient 
Divergence 
Laplacian 
Cross product 
| Curl 


A. — ( A x , Ay, A z) 

A B 

V 

Vcp 

V A 

V • V = V 2 
A X B 

V x A 


rotations. If one side is a vector, the other side must also be a vector, and both 
sides will change together m exactly the same way if we rotate our coordinate sys- 
tem Similarly, if one side is a scalar, the other side must also be a scalar, so that 
neither side changes when we rotate coordinates, and so on. 

Now in the case of special relativity, time and space are inextricably mixed, 
and we must do the analogous things for four dimensions We want our equations 
to remain the same not only for rotations, but also for any inertial frame. That 
means that our equations should be invariant under the Lorentz transformation 
of equations (25.1). The purpose of this chapter is to show you how that can be 
done. Before we get started, however, we want to do something that makes our 
work a lot easier (and saves some confusion) And that is to choose our units of 
length and time so that the speed of light c is equal to 1 You can think of it as 
taking our unit of time to be the tune that it takes light to go one meter (which is 
about 3 X 10” ,J sec) We can even call this time unit “one meter/’ Using this 
unit, all of our equations will show more clearly the space-time symmetry Also, 
all the c’s will disappear from our relativistic equations. (If this bothers you, 
you can always put the c’s back into any equation by replacing every / by ct, or, in 
general, by sticking in a c wherever it is needed to make the dimensions of the 
equations come out right.) With this groundwork we are ready to begin Our 
program is to do in the four dimensions of space-time all of the things we did with 
vectors for three dimensions. It is really quite a simple game, we just work by 
analogy The only real complications is the notation (we’ve already used up the 
vector symbol for three dimensions) and one slight twist of signs 

First, by analogy with vectors in three dimensions, we define a four-vector as 
a set of the four quantities a t , ci x , cr v , and a Zs which transform like /, x, v, and z when 
we change to a moving coordinate system. There are several different notations 
people use for a four-vector; we will write by which we mean the group of four 
numbers ( a u a x , a lh a z ) — in other words, the subscript //can take on the four 
“values” /, x, y, z It will also be convenient, at times to indicate the three space 
components by a three-vector, like this: = (a f , a) 

We have already encountered one four-vector, which consists of the energy 
and momentum of a particle (Chapter 17, Vol. I). In our new notation we write 

Pn = (E,p\ (25.2) 

which means that the four-vector is made up of the energy E and the three 
components of the three-vector p of a particle. 

It looks as though the game is really very simple — for each three-vector in 
physics all we have to do is find what the remaining component should be, and we 
have a four-vector To see that this is not the case, consider the velocity vector 
with components 

_ dx __ dy _ dz 

V * ~ dt ’ ~ It' Vz= dt ‘ 


The question is: What is the time component? Instinct should give the right 
answer. Since four-vectors are like /, x, y, z, we would guess that the time com- 
ponent is 


This is wrong The reason is that the / in each denominator is not an invariant 
when we make a Lorentz transformation The numerators have the right behavior 
to make a foui -vector, but the dt in the denominator spoils things ; it is unsymmetnc 
and is not the same in two different systems. 

It turns out that the four “velocity” components which we have written down 
will become the components of a four-vector if we just divide by \/T~^~ r 2 , We 
can see that that is true because if we start with the momentum four-vector 


P* = (E>P) = 


"hi # 

\/l — \/l — 


(25 3) 


25-2 



and divide it by the rest mass /7? n , which is an invariant scalar in four dimensions , 
we have 


Pjl = ( L , v 

m i) Vn/1 — v l \fl — v 2 


(24.4) 


which must still be a four- vector. (Dividing by an invariant scalar doesn’t change 
the transformation properties ) So we can define the “ velocity fow -vector” u u by 


a i 


u x 


1 

sf\ — v 2 

*'z s 

\/l — V 2 


Vi - 5 

Vz 

vr^v’ 


(25.5) 


The four-velocity is a useful quantity; we can, foi instance, write 


Pn = jM*. 


(25.6) 


This is the typical sort of form an equation which is relativistically correct must 
have; each side is a four-vector. (The right-hand side is an invariant times a 
four-vector, which is still a four-vector.) 


25-2 The scalar product 

h is an accident of life, if you wish, that under coordinate rotations the 
distance of a point from the origin does not change. This means mathematically 
that r 2 = x 2 + y 2 + z 2 is an invariant In other words, after a rotation 
r f 2 = V, or 

x'- + v' 2 + z' 2 = * 2 + v 2 + Z 2 . 

Now the question is* Is there a similar quantity which is invariant under the 
Lorentz transformation 7 There is. From Eq. (25.1) you can see that 

t' 2 ~ x ,2 = t 2 — x 2 

That is pi city nice, except that it depends on a particular choice of the x-direction 
We can fix that up by subtracting y 2 and z 2 . Then any Lorentz transformation 
plus a rotation will leave the quantity unchanged. So the quantity which is anal- 
agous to r 2 for four dimensions, in three dimensions is 


It is an invariant under what is called the ‘‘complete Lorentz group” — which 
means for transformation of both translations at constant velocity and rotations. 

Now since this invariance is an algebraic matter depending only on the 
transformation rules of Eq (25.1) — plus rotations — it is true for any four-vector 
(by definition they all transform the same). So for a four-vector a M we have that 

,2 /2 /2 /2 2 2 2 2 
a i a z Qy a% a i a x Gy Gz. 

We will call this quantity the square of “the length” of the four-vector (Some- 
times people change the sign of all the terms and call the length a 2 x + a 2 + 
ui — uf, so you’ll have to watch out ) 

Now if we have two vectors a^ and their corresponding components 
transform in the same way, so the combination 

aft ~ a x b r ~ a }i by — a z b z 

is also an invariant (scalar) quantity. (We have in fact already proved this in 
Chapter 17 of Vol. I.) Clearly this expression is quite analogous to the dot product 
for vectors. We will, in fact, call it the dot product or scalar product of two four- 
vectors It would seem logical to write it as so it would look like a dot prod- 

uct But, unhappily, it’s not done that way; it is usually written without the dot. 
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So we will follow the convention and write the dot product simply as So, 

by definition , 

b^i ci ib i ctyby a vb ^ (25.7) 

Whenever you see two identical subscripts together (we will occasionally have 
to use v or some other letter instead of fi) it means that you arc ro take the four 
products and sum, remembering the minus sign for the products of the space 
components. With this convention the invariance of the scalar product under a 
Lorentz transformation can be written as 

a’Jo £ = 

Since the last three terms in (25.7) are just the scalar dot product in three 
dimensions, it is often more convenient to write 

afihp = afit — a b. 

It is also obvious that the four-dimensional length we described above can be 
written as a^: 

u^ciy. = ui — al — a 2 u — a\ = a 2 t — a * a. (25.8) 

It will also be convenient to sometimes write this quantity as 

^ == dj^Cl 

We will now give you an illustration of the usefulness of four-vector dot 
products. Antiprotons (P) are produced in large accelerators by the reaction 

P+P-^P+P+P+P. 

That is, an energetic proton collides with a proton at rest (for example, m a hy- 
drogen target placed in the beam), and if the incident proton has enough energy, 
a proton-antiproton pair may be produced, in addition to the two original protons.* 
The question is: How much energy must be given to the incident proton to make 
this reaction energetically possible 9 

The easiest way to get the answer is to consider what the reaction looks like 
in the center-of-mass (CM) system (see Fig. 25-1). Well call the incident proton 
a and its four-momentum pi Similarly, well call the target proton b and its four- 


Fig. 25-1. The reaction P + P — > 
3P + P viewed in the laboratory and 
CM systems. The incident proton is sup- 
posed to have just barely enough energy 
to make the reaction go. Protons are 
denoted by solid circles; antiprotons, by 
open circles. 



* You may well ask: Why not consider the reactions 

P + P-^P+P + P, 

or even 

P + P -> P + P 

which clearly require less energy? The answer is that a principle called conservation of 
baryons tells us the quantity “number of protons minus number of antiprotons” cannot 
change. This quantity is 2 on the left side of our reaction. Therefore, if we want an 
antiproton on the right side, we must have also three protons (or other baryons). 
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momentum p\. If the incident proton has just barely enough energy to make the 
reaction go. the final state — the situation after the collision — will consist of a 
glob containing three protons and an antiproton at rest in the CM system If 
the incident energy were slightly higher, the final state particles would have some 
kinetic energy and be moving apart; if the incident energy were slightly lower, 
there would not be enough energy to make the four particles 

If we call pi the total four-momentum of the whole glob in the final state, 
conservation of energy and momentum tells us that 

p a + p b = p c , 

and 

E a + E h = E°. 

Combining these two equations, we can write that 

Pi + p\ = Pi- (25.9) 

Now the important thing is that this is an equation among four-vectors, and 
is, therefore, true in any inertial frame. We can use this fact to simplify our 
calculations. We start by taking the “length” of each side of Eq. (25 9); they are, 
of course, also equal. We get 

(. pi + pI)(pI + pi) = pIpI- (25.10) 

Since p%pl is invariant, we can evaluate it in any coordinate system. In the CM 
system, the time component of pi is the rest energy of four protons, namely 4 A/, 
and the space part p is zero; so pi = (4 M, 0). We have used the fact that the 
rest mass of an antiproton equals the rest mass of a proton, and we have called 
this common mass M. 

Thus, Eq. (25.10) becomes 

pIpI + 2 pIpI + pIpI = 16 M 2 . (25.11) 

Now plpl and p\p\ are very easy, since the “length” of the momentum four-vector 
of any particle is just the mass of the particle squared: 

PuP* = E 2 - p 2 = A/ 2 . 

This can be shown by direct calculation or, more cleverly, by noting that for a 
particle at rest p^ = (Af, 0), so p^p^ = M 1 But since it is an invariant, it is equal 
to M 2 in any frame. Using these results in Eq. (25.11), we have 

2plp\ = 14 M 2 
or 

pip': = 1M 2 . (25.12) 

Now we can also evaluate plpl in the laboratory system. The four-vector 
pi can be written (E a .p a \ while p\ = (A/, 0), since it describes a proton at rest. 
Thus, plpl must also be equal to ME a , and since we know the scalar product is 
an invariant this must be numerically the same as what we found in (25.12). So 
we have that 

E a - 7Af, 


which is the result we were after The total energy of the initial proton must be 
at least 1M (about 6.6 Gev since M = 938 Mev) or, subtracting the rest mass A/, 
the kinetic energy must be at least 6M (about 5.6 Gev). The Bevatron accelerator 
at Berkeley was designed to give about 6 2 Gev of kinetic energy to the protons it 
accelerates, in order to be able to make antiprotons 

Since scalar products are invariant, they are always interesting to evaluate. 
What about the “length” of the four-velocity 

2 2 1 V i 

VV — u t — U = J __ v 2 ~ 1 v 2 = ‘ 

Thus, is the unit four-vector 
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25-3 The four-dimensional gradient 

The next thing that we have to discuss is the four-dimensional analog of the 
gradient. We recall (Chapter 14, Vol. I) that the three differential operators 
d/dx, d/dy, d/dz transform like a three-vector and are called the gradient. The 
same scheme ought to work m four dimensions; that is, we might guess that the 
four-dimensional gradient should be ( d/dt , d/dx, d/dy , d/dz). This is wrong. 

To see the error, consider a scalar function </> which depends only on x and t. 
The change in <j > , if we make a small change At in / while holding x constant, is 

H = ~ A/. (25.13) 

at 


On the other hand, according to a moving observer. 


A 4> 


d(f> . d(j> . 
dx' dt f 


We can express Ax' and At ' in terms of At by using Eq (25.1) Remembering 
that we are holding x constant, so that Ax = 0, we write 


Thus, 


Ax' = 


vT^ 


A t\ At ' 


At 


\/ 1 — V 2 


V T-,, 


*I + S' " 


\ \/ 1 — V 2 



d(j) \ At 
dx'/ -^/l" — 


Comparing this result with Eq. (25.13), we learn that 

d<t> _ 1 /d<j> 0<t>\ 

dt \f\ ~~v 2 dx f ) 

A similar calculation gives 

d(}> _ 1 / d<f> d<j>\ 

dx \/ r \~— v- \dx f V dt f ) 


(25.14) 


(25.15) 


Now we can see that the gradient is rather strange. The formulas for x and 1 
in terms of x' and f [obtained by solving Eq. (25 1)] are: 


t f + vx f _ x ' + vf 

vr- v 2 9 


This is the way a four-vector must transform. But Eqs. (25.14) and (25 15) have 
a couple of signs wrong 1 

The answer is that instead of the incorrect (d/dt, V), we must define the four- 
dimensional gradient operator, winch we will call by 



(25.16) 


With this definition, the sign difficulties encountered above go away, and 
behaves as a four-vector should. (It's rather awkward to have those minus signs, 
but that's the way the world is.) Of course, what it means to say that V M “behaves 
like a four-vector” is simply that the four-gradient of a scalar is a four-vector. If 
<t> is a true scalar invariant field (Lorentz invariant) then C M 0 is a four-vector field 
All right, now that we have vectors, gradients, and dot products, the next 
thing is to look for an invariant which is analogous to the divergence of three- 
dimensional vector analysis. Clearly, the analog is to form the expression V^, 
where b w is a four-vector field whose components are functions of space and time. 
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We define the diligence of the four-vector b ^ = (b f ,b) as the dot product of 


and hp : 


V aba 


3t 




(25.17) 


¥t b,+ Vb, 


where V * b is the ordinary three-divergence of the three-vector b. Note that one 
has to be careful with the signs. Some of the minus signs come from the definition 
of the scalar product, Eq. (25.7); the others are required because the space com- 
ponents of are — d/dx, etc , as in Eq. (25.16) The divergence as defined by 
(25 17) is an invariant and gives the same answer in all coordinate systems which 
differ by a Lorentz transformation. 

Let's look at a physical example in which the four-divergence shows up 
We can use it to solve the problem of the fields around a moving wire We have 
already seen (Section 13-7) that the electric charge density p and the current 
density j form a four-vector y M = (p,y). If an uncharged wire carries the current 
y r , in a frame moving past it with velocity v (along a), the wire will have the 
charge and current density [obtained from the Lorentz transformation Eqs. 
(25.1)] as follows; 


p - r - ’ fx . 

\/T- \/i - v 1 

These are just what we found in Chapter 13 We can then use these sources 
in Max well’s equation in the moving system to find the fields. 

The charge conservation law, Section 13-2, also takes on a simple form in 
the four-vector notation. Consider ihe four divergence of / M : 

= f t + V • j . (25.18) 


The law of the conservation of charge says that the outflow of current per unit 
volume must equal the negative rate of increase of charge density. In other words, 
that 


Putting this into Eq. (25.18), the law of conservation of charge takes on the simple 
form 


V,/ M - 0. 


(25.19) 


Since is an invariant scalar, if it is zero in one frame it is zero in all frames. 
We have the result that if charge is conserved in one coordinate system, it is con- 
served in all coordinate systems moving with uniform velocity. 

As our last example we want to consider the scalar product of the gradient 
operator with itself. In three dimensions, such a product gives the Laplacian 


V 2 


V • V 


d 2 
dx 2 


d 2 


+ 


dz 2 ’ 


What do we get in four dimensions 9 That’s easy Following our rules for dot 
products and gradients, we get 


= 


<9 

dt dt 


d 2 
dt* 


(- £)(- I) - (- £)(- I) - (- 1)(- 1) 


This operator, which is the analog of the three-dimensional Laplacian, is called 
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the D' Alembert wn and has a special notation: 

□ 2 = V.v, = ^ - V 2 . (25.20) 

From its definition it is an invariant scalar operator; if it operates on a four-vector 
field, it produces a new four-vector field. (Some people define the D’Alembertian 
with the opposite sign to Eq. (25.20), so you will have to be careful when reading 
the literature.) 

We have now found four-dimensional equivalents of most of the three- 
dimensional quantities we had listed in Table 25-1. (We do not yet have the 
equivalents of the cross product and the curl operation; we won't get to them until 
the next chapter ) It may help you remember how they go if we put all the impor- 
tant definitions and results together in one place, so we have made such a summary 
in Table 25-2. 

Table 25-2 

The important quantities of vector analysis in three and four dimensions. 



Three dimensions 

Vector 

A (A A fJ , A 2 ) 

Scalar pioduct 

A * B — A X B X + A v By + A Z B 

Vector operator 

V - (d/dx, d/dy, d/dz) 

Gradient 

v4 , = ( d lMM) 

\dx dy dz / 

Divergence 

dA 8 A, dA z 

v A = + 

Laplacian and 
D'Alembertian 

v • v = + *1 + ii 

dx 2 ^ dy* dz 2 


Four dimensions 
= (a ti a x ,a u ,a z ) = ( a t ,a ) 


ciixbfi = a t b t — ajb x — ajy, — a z b z = atb ( — a ■ h 
(d/d/, — d/dx , — d/dy , — d/dz) — (d/d/, — V) 



25-4 Electrodynamics in four-dimensional notation 

We have already encountered the D’Alembertian operatoi, without giving it 
that name, in Section 18-6, the differential equations we found there for the po- 
tentials can be written in the new notations as: 

□ 2 0 = — , □ 2 A = J- (25.21) 

£() 

The four quantities on the right-hand side of the two equations in (25.21) are 
p, Ac, Jm Jz> divided by e 0 , which is a universal constant which will be the same 
in all coordinate systems if the same unit of charge is used in all frames. So the four 
quantities p/e 0 , //A 0 , /,,/e 0 , / 2 /e 0 also transform as a four-vector We can write 
them as y M /e () The D’Alembertian doesn’t change when the coordinate system 
is changed, so the quantities <£, A x , A, f , A z must also transform like a four-vector — 
which means that they are the components of a four-vector. In short. 

A - (& A) 

is a four-vector. What we call the scalar and vector potentials are really different 
aspects of the same physical thing. They belong together And if they are kept 
together the relativistic invariance of the world is obvious We call A M the four- 
potential 
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In the four-vector notation Eqs. (25.21) become simply 

□ % = (25.22) 

e 0 

The physics of this equation is just the same as Maxwell's equations. But there is 
some pleasure in being able to rewrite them in an elegant form. The pretty form 
is also meaningful, it shows directly the invariance of electrodynamics under the 
Lorentz transformation. 

Remember that Eqs (25.21) could be deduced from Maxwell’s equations only 
if we imposed the gauge condition 

^ + V • A = 0, (25.23) 

Ot 

which just says = 0; the gauge condition says that the divergence of the 
four-vector A u is zero. This condition is called the Lorentz condition. It is very 
convenient because it is an invariant condition and therefore Maxwell’s equations 
stay in the form of Eq. (25.22) for all frames. 


25-5 The four-potential of a moving charge 

Although it is implicit in what we have already said, let us write down the 
transformation laws which give <j> and A in a moving system in terms of 0 and A 
in a stationary system. Since Ay = (0, A) is a four-vector, the equations must 
look just like Eqs. (25.1), except that t is replaced by 0, and x is replaced by A 
Thus, 

^ <t> — vA x 

0 — — .... » 


A x v<j> 

s 

V 1 — v 2 


Ay — Ay, 


A z . 


(25.24) 


This assumes that the primed coordinate system is moving with speed v in the 
positive ^-direction, as measured in the unpnmed coordinate system. 

We will consider one example of the usefulness of the idea of the four-potential 
What are the vector and scalar potentials of a charge q moving with speed ?j along 
the x-axis 9 The problem is easy in a coordinate system moving with the charge, 
since in this system the charge is standing still. Let’s say that the charge is at the 
origin of the S'-frame, as shown in Fig. 25-2. The scalar potential in the moving 
system is then given by 


0 ' = 


g 

47re 0 r' ’ 


(25.25) 



Fig. 25-2. The frame S' moves with 
velocity v (in the x-direction) wifh respect 
to S. A charge at rest at the origin of S' 
is at x = vt in S. The potentials at P can 
be computed in either frame. 


r' being the distance from q to the field point, as measured m the moving system 
The vector potential A f is, of course, zero. 

Now it is straightforward to find 0 and A , the potentials as measured in the 
stationary coordinates. The inverse relations to Eqs. (25 24) are 


0' + vA' x ^ 

\/T — v 2 


A x 


A x + v<p f 

VT - 5 


A 2 = A’ 


Using the <p' given by Eq. (25.25), and A f — 0, we get 

0 = _ JL 1 

47reo v 2 

^ _q 1 

47r€ 0 


(25.26) 
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This gives us the scalar potential <£ we would see in *S, but, unfortunately, expressed 
in terms of the S' coordinates. We can get things in terms of /, ;c, y, z by substituting 
for x\ y\ and z', using (25.1). We get 


47T60 ^j [(x _ vt)/Vr —, ]2 + y2 


(25.27) 


+ 


Following the same procedure for the components of A , you can show that 


A = v<t>. (25.28) 

These are the same formulas we derived by a different method in Chapter 21. 


25-6 The invariance of the equations of electrodynamics 

We have found that the potentials <t> and A taken together form a four-vector 
which we call A M , and that the wave equations — the full equations which determine 
the A^ in terms of the j M — can be written as in Eq. (25 22). This equation, together 
with the conservation of charge, Eq. (25 19), gives us the fundamental law of the 
electromagnetic field: 

= VJ, = 0. (25.29) 

There, in one tiny space on the page, are all of the Maxwell equations — beautiful 
and simple. Did we learn anything from writing the equations this way, besides 
that they are beautiful and simple? In the first place, is it anything different from 
what we had before when we wrote everything out in all the various components? 
Can we from this equation deduce something that could not be deduced from the 
wave equations for the potentials in terms of the charges and currents? The answer 
is definitely no. The only thing we have been doing is changing the names of things 
— using a new notation. We have written a square symbol to represent the de- 
rivatives, but it still means nothing more nor less than the second derivative with 
respect to /, minus the second derivative with respect to x, minus the second 
derivative with respect to y, minus the second derivative with respect to z. And the 
fji means that we have four equations, one each for p ~ /, x, y, or z. What then is 
the significance of the fact that the equations can be written in this simple form? 
From the point of view of deducing anything directly, it doesn't mean anything. 
Perhaps, though, the simplicity of the equations means that nature also has a 
certain simplicity. 

Let us show you something interesting that we have recently discovered: AH 
of the laws of physics can be contained in one equation . That equation is 

U = 0. (25.30) 

What a simple equation! Of course, it is necessary to know what the symbol 
means. U is a physical quantity which we will call the “unworldliness" of the 
situation. And we have a formula for it. Here is how you calculate the unworld- 
liness. You take all of the known physical laws and write them in a special form. 
For example, suppose you take the law of mechanics, F = ma, and rewrite it as 
F — ma ~ 0 Then you can call (F — ma) — which should, of course, be zero — 
the “mismatch," of mechanics. Next, you take the square of this mismatch and 
call it U i, which can be called the “unworldliness of mechanical effects." In other 
words, you take 

U x = (F - ma) 2 . (25.31) 

Now you write another physical law, say, V • E = p/e 0 and define 

u ‘ 

which you might call “the gaussian unworldliness of electricity." You continue 

to write u 3 . u 4 , and so on — one for every physical law there is 

25-10 



Finally you call the total unworldliness U of the world the sum of the various 
unworldhnesses U* from all the subphenomena that are involved; that is, U = 
ZU t Then the great “law of nature 1 ' is 


U = 0. 


(25 32) 


This “law 11 means, of course, that the sum of the squares of all the individual 
mismatches is zero, and the only way the sum of a lot of squares can be zero is for 
each one of the terms to be zero 

So the “beautifully simple 11 law in Eq. (25.32) is equivalent to the whole series 
of equations that you originally wrote down It is therefore absolutely obvious 
that a simple notation that just hides the complexity in the definitions of symbols 
is not real simplicity. It is just a trick. The beauty that appears in Eq. (25 32) — 
just from the fact that several equations are hidden within it — is no more than a 
trick. When you unwrap the whole thing, you get back where you were before 

However, there is more to the simplicity of the laws of electromagnetism 
written in the form of Eq. (25.29). It means more, just as a theory of vector 
analysis means more. The fact that the electromagnetic equations can be written 
in a very particular notation which was designed for the four-dimensional geometry 
of the Lorentz transformations — in other words, as a vector equation in the four- 
space — means that it is invariant under the Lorentz transformations. It is because 
the Maxwell equations are invariant under those transformations that they can 
be written in a beautiful form. 

It is no accident that the equations of electrodynamics can be written in the 
beautifully elegant form of Eq. (25 29). The theory of relativity was developed 
because it was found experimentally that the phenomena predicted by Maxwell’s 
equations were the same in all inertial systems. And it was precisely by studying 
the transformation properties of Maxwell’s equations that Lorentz discovered 
his transformation as the one which left the equations invariant. 

There is, however, another reason for writing our equations this way. It has 
been discovered — after Einstein guessed that it might be so — that all of the laws 
of physics are invariant under the Lorentz transformation. That is the principle 
of relativity. Therefore, if we invent a notation which shows immediately when a 
law is written down whether it is invariant or not, we can be sure that in trying 
to make new theories we will write only equations which are consistent with the 
principle of relativity. 

The fact that the Maxwell equations are simple in this particular notation is 
not a miracle, because the notation was invented with them in mind. But the 
interesting physical thing is that every law of physics — the propagation of meson 
waves or the behavior of neutrinos in beta decay, and so forth — must have this 
same invariance under the same transformation Then when you are moving at a 
uniform velocity in a spaceship, all of the laws of nature transform together in 
such a way that no new phenomenon will show up. It is because the principle of 
relativity is a fact of nature that in the notation of four-dimensional vectors the 
equations of the world will look simple. 
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Lorentz Transformations of the Fields 


26-1 The four-potential of a moving charge 


We saw in the last chapter that the potential = (</>, A) is a four-vector. 
The time component is the scalar potential </>, and the three space components are 
the vector potential A. We also worked out the potentials of a particle moving with 
uniform speed on a straight line by using the Lorentz transformation (We had 
already found them by another method in Chapter 21.) For a point charge whose 
position at the time t is {vt, 0, 0), the potentials at the point (x, y, z) are 


<t> = 


47re 0 \/l — v 2 


'<* “ P,r ‘ + / + z* 


A x = 


1 - v 2 

qv 


1/2 


47re 0 \/l — v 2 


(x - vty + + z2 


1 - */ 2 


1/2 


Ay A% — 0 


(26.1) 


Equations (26 1) give the potentials at x, y, and z at the time /, for a charge 
whose “present” position (by which we mean the position at the time t) is at x = vt 
Notice that the equations are in terms of (x — vt), y , and z, which are the coordi- 
nates measured from the current position P of the moving charge (see Fig. 26-1) 
The actual influence we know really travels at the speed c, so it is the behavior of 
the charge back at the retarded position P r that really counts, f The point P f is at 
x = vf (where, t' = t — r'/c is the retarded time). But we said that the charge was 
moving with uniform velocity in a straight line, so naturally the behavior at P f and 
the current position are directly related. In fact, if we make the added assumption 
that the potentials depend only upon the position and the velocity at the retarded 
moment, we have in equations (26.1) a complete formula for the potentials for a 
charge moving any way. It works this way. Suppose that you have a charge 
moving in some arbitrary fashion, say with the trajectory in Fig. 26-2, and you 
are trying to find the potentials at the point (x, y, z). First, you find the retarded 
position P f and the velocity v' at that point. Then you imgaine that the charge 
would keep on moving with this velocity during the delay time (/' — /), so that 
it would then appear at an imaginary position P vrin , which we can call the “pro- 
jected position,” and would arrive there with the velocity ?/. (Of course, it doesn't 
do that; its real position at t is at P.) Then the potentials at (x, y, z) are just what 
equations (26 1) would give for the imaginary charge at the projected position 
P proj . What we are saying is that since the potentials depend only on what the 
charge is doing at the retarded time, the potentials will be the same whether the 
charge continued moving at a constant velocity or whether it changed its velocity 
after t ' — that is, after the potentials that were going to appear at (x, y, z) at the 
time t were already determined. 

You know, of course, that the moment that we have the formula for the po- 
tentials from a charge moving in any manner whatsoever, we have the complete 
electrodynamics; we can get the potentials of any charge distribution by super- 


f The primes used here to indicate the retarded positions and times should not be confused 
with the primes referring to a Lorentz-transformed frame in the preceding chapter. 


26-1 The four-potential of a 
moving charge 

26-2 The fields of a point charge 
with a constant velocity 

26-3 Relativistic transformation 
of the fields 

26-4 The equations of motion in 
relativistic notation 


In this chapter : c = 1 


Review: Chapter 20, Vo I. II, Solution 
of Max well v Equation s in 
Free Space 



Fig. 26-1. Finding the fields at P due 
to a charge q moving along the x-axis 
with the constant speed v. The field 
“now” at the point (x, y, z) can be ex- 
pressed in terms of the “present” position 
P, as well as in terms of P f , the “retarded” 
position (at t f = t — r'/c). 
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Cx,y,z) 



Fig. 26-2. A charge moves on an 
arbitrary trajectory. The potentials at 
( x,y,z ) at the time t are determined by 
the position P' and velocity v' at the 
retarded time t ' — r / c. They are con- 
veniently expressed in terms of the co- 
ordinates from the “projected” position 
P pr0[ . (The actual position at t is P.) 


position. Therefore we can summarize all the phenomena of electrodynamics 
either by writing Maxwell’s equations or by the following series of remarks. 
(Remember them in case you are ever on a desert island. From them, all can be 
reconstructed. You will, of course, know the Lorentz transformation; you will 
never forget that on a desert island or anywhere else ) 

First , Ap is a four-vector. Second , the Coulomb potential for a stationary 
charge is q/4irc {) r. Third , the potentials produced by a charge moving in any way 
depend only upon the velocity and position at the retarded time With those 
three facts we have everything From the fact that A y is a four-vector, we transform 
the Coulomb potential, which we know, and get the potentials for a constant 
velocity. Then, by the last statement that potentials depend only upon the past 
velocity at the retarded time, we can use the projected position game to find them. 
It is not a particularly useful way of doing things, but it is interesting to show that 
the laws of physics can be put in so many different ways 

It is sometimes said, by people who are careless, that all of electrodynamics 
can be deduced solely from the Lorentz transformation and Coulomb’s law. Of 
course, that is completely false. First, we have to suppose that there is a scalar 
potential and a vector potential that together make a four-vector That tells us 
how the potentials transform Then why is it that the effects at the retarded 
time are the only things that count? Better yet, why is it that the potentials depend 
only on the position and the velocity and not, for instance, on the acceleration 7 
The fields E and B do depend on the acceleration. If you try to make the same 
kind of an argument with respect to them, you would say that they depend only 
upon the position and velocity at the retarded time But then the fields from an 
accelerating charge would be the same as the fields from a charge at the projected 
position — which is false. The fields depend not only on the position and the velocity 
along the path but also on the acceleration. So there are several additional tacit 
assumptions in this great statement that everything can be deduced from the 
Lorentz transformation (Whenever you see a sweeping statement that a tremen- 
dous amount can come from a very small number of assumptions, you always 
find that it is false. There are usually a large number of implied assumptions that 
are far from obvious if you think about them sufficiently carefully.) 


26-2 The fields of a point charge with a constant velocity 

Now that we have the potentials from a point charge moving at constant 
velocity, we ought to find the fields — for practical reasons There are many cases 
where we have uniformly moving particles — for instance, cosmic rays going through 
a cloud chamber, or even slow-moving electrons m a wire. So let’s at least see 
what the fields actually do look like for any speed — even for speeds nearly that 
of light — assuming only that there is no acceleration. It is an interesting question. 

We get the fields from the potentials by the usual rules* 


First, for E z 


E = 


■ V<f> 


dA 

dt 


B = V X A. 


F d</> BA Z 

z - dz dt 


But A z is zero; so differentiating <j> in equations (26 1), we get 


E z = 


q 


47reo\/l — 


4^4 + / + 

1 — V 2 


Similarly, for E y , 


Ey — 


47re 0 


4^4P + / + ** 

1 — V 2 


' 3/ 2 


3/2 


(26.2) 


(26.3) 


The x-component is a little more work. The derivative of <p is more complicated 
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and A x is not zero. First, 


d<j> 

dx 


q 

47 re 0 \/ 1 — v 2 


(x - Vl )/( 1 -_y 2 )_ 

^ 

1 — V 2 


(26.4) 


Then, differentiating A x with respect to t , we find 

q —v 2 (x — v)t/( 1 — v 2 ) 


dA x 

dt 


47reoVl — v 2 
And finally, taking the sum. 


(x " Vt) 2 ,2,2 

+ y + z 


|3/2 


1 — V 2 


x — vt 


Ai re 0 \/r^ 


<* - ^ 


13/2 


1 — V 2 


(26 5) 


(26.6) 


We’ll look at the physics of E in a minute, let's first find B For the z-compo- 
nent, 

B - Odk __ 

dx dy 

Since A v is zero, we have just one derivative to get. Notice, however, that A r 
is just v<j> , and d/dy of is just ~~vE y . So 


Similarly, 


and 


By ~ dz 


B z 

dA x dA 


I'Ey. 


dX 

B u = -vE z . 


d(j> 

dz 


(26.7) 
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POSITION 


Fig. 26-3. For a charge moving with 
constant speed, the electric field points 
radially from the “present” position of 
the charge. 


Finally, B x is zero, since A y and A z are both zero. We can write the magnetic field 
simply as 

B = v X E (26.9) 

Now let’s see what the fields look like. We will try to draw a picture of the 
field at various positions around the present position of the charge. It is true that 
the influence of the charge comes, in a certain sense, from the retarded position, 
but because the motion is exactly specified, the retarded position is uniquely given 
in terms of the present position For uniform velocities, it’s nicer to relate the 
fields to the current position, because the field components at (x, j\ z) depend 
only on (x — vO, y* and z — which are the components of the displacements 
rp from the present position to (x, y, z) (see Fig. 26-3). 

Consider first a point with z = 0. Then E has only x- and ^-components. 
From Eqs. (26.3) and (26.6), the ratio of these components is just equal to the 
ratio of the x- and ^-components of the displacement. That means that E is in 
the same direction as rp, as shown in Fig. 26-3. Since E z is also proportional to z, 
it is clear that this result holds in three dimensions. In short, the electric field is 
radial from the charge, and the field lines radiate directly out of the charge, just 
as they do for a stationary charge. Of course, the field isn't exactly the same as 
for the stationary charge, because of all the extra factors of (1 — v 2 ) But we 
can show something rather interesting. The difference is just what you would get 
if you were to draw the Coulomb field with a peculiar set of coordinates in which 
the scale of x was squashed up by the factor y/Y — ?;2. If you do that, the field 
lines will be spread out ahead and behind the charge and will be squeezed together 
around the sides, as shown in Fig. 26-4. 

If we relate the strength of E to the density of the field lines in the conventional 
way, we see a stronger field at the sides and a weaker field ahead and behind, 
which is just what the equations say. First, if we look at the strength of the field 
at right angles to the line of motion, that is, for (x — vt) = 0, the distance from 



Fig. 26-4. The electric field of a 
charge moving with the constant speed 
v = 0.9c, part (b), compared with the 
field of a charge at rest, part (a). 
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the charge is (y 2 + z 2 ). Here the total field strength is \JE\ + E 2 , which is 


E = 


q 1 

47re 0 \/l — v 2 y 2 + z2 


(26.10) 


The field is proportional to the inverse square of the distance — just like the Cou- 
lomb field except increased by the constant, extra factor l/\/l — v 2 , which is 
always greater than one. So at the sides of a moving charge, the electric field is 
stronger than you get from the Coulomb law. In fact, the field in the sidewise 
direction is bigger than the Coulomb potential by the ratio of the energy of the 
particle to its rest mass. 

Ahead of the charge (and behind), y and z are zero and 


= g(l - V 2 ) 

47T€ 0 (^ — Vt) 2 


(26.11) 



Fig. 26-5. The magnetic field near 
a moving charge is v X E. (Compare 
with Fig. 26-4.) 


The field again varies as the inverse square of the distance from the charge but is 
now reduced by the factor (1 — v 2 ), in agreement with the picture of the field lines. 
If v/c is small, v 2 /c 2 is still smaller, and the effect of the (1 — v 2 ) terms is very 
small; we get back to Coulomb's law. But if a particle is moving very close to 
the speed of light, the field in the forward direction is enormously reduced, and 
the field in the sidewise direction is enormously increased. 

Our results for the electric field of a charge can be put this way: Suppose 
you were to draw on a piece of paper the field lines for a charge at rest, and then 
set the picture to travelling with the speed v. Then, of course, the whole picture 
would be compressed by the Lorentz contraction; that is, the carbon granules 
on the paper would appear in different places The miracle of it is that the picture 
you would see as the page flies by would still represent the field lines of the point 
charge. The contraction moves them closer together at the sides and spreads them 
out ahead and behind, just in the right way to give the correct line densities. We 
have emphasized before that field lines are not real but are only one way of repre- 
senting the field. However, here they almost seem to be real. In this particular 
case, if you make the mistake of thinking that the field lines are somehow really 
there in space, and transform them, you get the correct field. That doesn’t, however, 
make the field lines any more real All you need do to remind yourself that they 
aren’t real is to think about the electric fields produced by a charge together with 
a magnet; when the magnet moves, new electric fields are produced, and destroy 
the beautiful picture So the neat idea of the contracting picture doesn't work in 
general. It is, however, a handy way to remember what the fields from a fast- 
moving charge are like. 

The magnetic field is v X E [from Eq. (26.9)]. If you take the velocity crossed 
into a radial JE’-field, you get a B which circles around the line of motion, as shown 
in Fig. 26-5. If we put back the c's, you will see that it’s the same result we had 
for low-velocity charges. A good way to see where the c’s must go is to refer back 
to the force law, 

F = q(E + v X B). 


You see that a velocity times the magnetic field has the same dimensions as an 
electric field. So the right-hand side of Eq (26.9) must have a factor 1/c 2 : 

(26 12) 


For a slow-moving charge (v « c), we can take for E the Coulomb field; then 

R = q__ v JX_r 

4t r€„c* r< (26.13) 

This formula corresponds exactly to equations for the magnetic field of a current 
that we found in Section 14-7. 
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We would like to point out, in passing, something interesting for you to think 
about. (We will come back to discuss it again later.) Imagine two electrons with 
velocities at right angles, so that one will cross over the path of the other, but in 
front of it, so they don’t collide. At some instant, their relative positions will be 
as in Fig 26-6(a). We look at the force on q\ due to q 2 and vice versa. On q 2 
there is only the electric force from q u since q\ makes no magnetic field along its 
line of motion. On q u however, there is again the electric force but, in addition, 
a magnetic force, since it is moving in a 5-field made by q 2 . The forces are as drawn 
in Fig. 26-6(b). The electric forces on q x and q 2 are equal and opposite. However, 
there is a sidewise (magnetic) force on qi and no sidewise force on q 2 . Does action 
not equal reaction? We leave it for you to worry about. 


26-3 Relativistic transformation of the fields 

In the last section we calculated the electric and magnetic fields from the 
transformed potentials. The fields are important, of course, in spite of the argu- 
ments given earlier that there is physical meaning and reality to the potentials. 
The fields, too, are real. It would be convenient for many purposes to have a way 
to compute the fields in a moving system if you already know the fields in some 
“rest” system. We have the transformation laws for <f> and A, because A u is a 
four-vector. Now we would like to know the transformation laws of E and B . 
Given E and B in one frame, how do they look in another frame moving past? 
It is a convenient transformation to have. We could always work back through the 
potentials, but it is useful sometimes to be able to transform the fields directly. 
We will now see how that goes. 

How can we find the transformation laws of the fields? We know the trans- 
formation laws of the <f> and A , and we know how the fields are given in terms of 
(j> and A — it should be easy to find the transformation for the B and E. (You 
might think that with every vector there should be something to make it a four- 
vector, so with E there’s got to be something else we can use for the fourth com- 
ponent. And also for B. But it’s not so. It’s quite different from what you would 
expect.) To begin with, let’s take just a magnetic field 5, which is, of course 
V X A. Now we know that the vector potential with its x-, y-, and z-components 
is only a piece of something; there is also a /-component. Also we know that for 
derivatives like V, besides the x, y , z parts, there is also a derivative with respect to 
/. So let’s try to figure out what happens if we replace a “_y" by a “/”, or a “z" 
by a “/,” or something like that. 

First, notice the form of the terms in V X A when we write out the com- 
ponents: 


(a) 


F, V. x B , 


(b) 


q,E| ®B, 


q 2 E 2 = F 2 


Fig. 26-6. The forces between two 
moving charges are not always equal and 
opposite. It appears that “action” is not 
equal to “reaction.” 


dA z _ dA y __ dA x __ dA z _ dA y _ dA x 

dy dz v dz dx z dx dy 


(26.14) 


The x-component is equal to a couple of terms that involve only y- and z-com- 
ponents. Suppose we call this combination of derivatives and components a 
“z^-thing,” and give it a shorthand name, F zy . We simply mean that 


, — __ dAy 

zy ~"df~ ~df 


(26.15) 


Similarly. B y is equal to the same kind of “thing,” but this time it is an “xz-thing.” 
And B z is, of course, the corresponding “^x-thing.” We have 


B x = F zih B y = F xz , B z - F vx . (26.16) 


Now what happens if we simply try to concoct also some “/’’-type things 
like F xt and F tz (since nature should be nice and symmetric in x, y, z, and /)? For 
instance, what is F tz l It is, of course, 

dA t dA z 
dz dt 
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But remember that A f = <£, so it is also 


Table 26-1 
The components of F^v 

F, v = ~F Vfi 
= 0 

Ft, - -B z F xt = E x 

F,z = ~B X F !/t = E v 

F zx --- -B y F zt - E z 


d<f> dA z 
dz dt 


You’ve seen that before. It is the z-component of E . Well, almost — there is a 
sign wrong But we forgot that in the four-dimensional gradient the /-derivative 
comes with the opposite sign from x, y, and z So we should really have taken the 
more consistent extension of F tz as 


F 


tz 


dA t , dA* 
dz dt 


(26.17) 


Then it is exactly equal to —E z Trying also F n and F t , h we find that the three 
possibilities give 

F lx - -E xa F ly = -F v , F tz = ~E Z . (26.18) 

What happens if both subscripts are / 7 Or, for that matter, if both are a*? 


We get things like 

F dAt 

_ dAy 


11 dt 

dt 

and 

f - Me 

^ i 

1 


M - XX 

dx 

dx 

which give nothing but zero. 




We have then six of these F-things. There are six more which you get by 
reversing the subscripts, but they give nothing really new, since 


F rtf -- — F IJX . 

and so on. So, out of sixteen possible combinations of the four subscripts taken 
in pairs, we get only six different physical objects; and thev are the components 
of B and E. 

To represent the general term of F, we will use the general subscripts p and 
where each can stand for 0, 1,2, or 3— meaning in our usual four-vector notation 
/, a, y, and z Also, everything will be consistent with our four-vector notation if 
we define F^ v by 

F, v - V U A P - V y A M , (26 19) 


remembering that = (d/d/, —d/dx, — d/dv, — d/dz) and that A^ ^ (0, A r , A f/ , 
A A 

What we have found is that there are six quantities that belong together in 
nature— that are different aspects of the same thing. The electric and magnetic 
fields which we have considered as sepaiate vectors in our slow-moving world 
(where we don’t worry about the speed of light) are not vectors in four-space. 
They are parts of a new “thing.” Our physical “field” is really the six-component 
object F ilv . That is the way we must look at it for relativity We summanze our 
results on F yv in Table 26-1 

You see that what we have done here is to generalize the cross product We 
began with the curl operation, and the fact that the transformation properties of 
the curl are the same as the transformation properties of two vectors — the ordinary 
three-dimensional vector A and the gradient operator which we know also behaves 
like a vector Let’s look for a moment at an ordinary cross product in three di- 
mensions, for example, the angular momentum of a particle When an object is 
moving in a plane, the quantity (xv,, — yv x ) is important. For motion in three 
dimensions, there are three such important quantities, which we call the angular 
momentum; 


L tl/ - m( xr u - ycr), L tiZ = m(vr- - zr„), L Zi - m(zv t - xv z ) 

Then (although you may have forgotten by now) we discovered in Chapter 20 
of Vol I the miracle that these three quantities could be identified with the com- 
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ponents of a vector. In order to do so, we had to make an artificial rule with a 
right-hand convention It was just luck. It was luck because L %) (with i and j 
equal to x, y, or z) was an antisymmetric object 

LlJ " L ) ; , Is I X 0 

Of the nine possible quantities, there are only three independent numbers. And 
it just happens that when you change coordinate systems these three objects 
transform in exactly the same way as the components of a vector. 

The same thing lets us represent an element of surface as a vector A surface 
element has two parts— say dx and dy — which we can represent by the vector da 
normal to the suiface. But we can’t do that in four dimensions What is the 
“normal” to dx dyl Is it along z or along /? 

In short, for three dimensions it happens by luck that after you’ve taken a 
combination of two vectors like L ih you can represent it again by another vector 
because there are just three terms that happen to transform like the components 
of a vector But in four dimensions that is evidently impossible, because there are 
six independent terms, and you can’t represent six things by four things. 

Even in three dimensions it is possible to have combinations of vectors that 
can*t be represented by vectors. Suppose we take any two vectors a = (a x , c/ y , a z ) 
and b = (A r , by, b z ), and make the various possible combinations of components, 
like a x b r , a r b, r etc There would be nine possible quantities: 


Ctj.br, 

fljcbfj) 

eixb z . 

Cli/br, 

Cl,, by. 

Clyb Z , 

a z b x , 

cizby. 

a z b z . 

We might call these quantities T t] . 




If we now go to a rotated coordinate system (say rotated about the z-axis), 
the components of a and b are changed. In the new system, a x , for example, gets 
replaced by 

a' x = a x cos 9 + a y sin 9, 

and b u gets replaced by 

by = b u cos 0 — b x sin 9. 

And similarly for other components The nine components of the product quantity 
T n we have invented are all changed too, of course For instance, T XIJ = a r b,, 
gets changed to 

T xy = a x b u (cos 2 9) — a x b x ( cos 0 sin 9) + a y b y { sin 9 cos 9) — a u b x { sin 2 0), 
or 

T xy = T xtf cos 2 9 — T xx cos 9 sin 0 + T yy sin 6 cos 9 — T yx sin 2 0. 

Each component of T f u is a linear combination of the components of T u . 

So we discover that it is not only possible to have a “vector product” like 
a X b which has three components that transform like a vector, but we can — 
artificially— also make another kind of “product” of two vectors T l} with nine 
components that transform under a rotation by a complicated set of rules that 
we could figure out Such an object which has two indices to describe it, instead 
of one, is called a tensor. It is a tensor of the “second rank,” because you can 
play this game with three vectors too and get a tensor of the third rank, — or with 
four, to get a tensor of the fourth rank, and so on. A tensor of the first rank is a 
vector 

The point of all this is that our electromagnetic quantity F yv is also a tensor 
of the second rank, because it has two indices in it. It is, however, a tensor m 
four dimensions. It transforms in a special way which we will work out in a mo- 
ment — it is just the way a product of vectors transforms. For F yv it happens 
that if you change the indices around, F^ w changes sign. That’s a special case — it is 
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an antisymmetric tensor . So we say: the electric and magnetic fields are both part 
of an antisymmetric tensor of the second rank in four dimensions. 

You’ve come a long way. Remember way back when we defined what a 
velocity meant? Now we are talking about “an antisymmetric tensor of the 
second rank in four dimensions.” 

Now we have to find the law of the transformation of Fy V . It isn’t at all 
difficult to do; it’s just laborious — the brains involved are nil, but the work is not. 
What we want is the Lorentz transformation of V^A v — V„A M . Since V M is just a 
special case of a vector, we will work with the general antisymmetric vector com- 
bination. which we can call G fJLV : 


Gn V Myb v a v by. 


(26.20) 


(For our purposes, will eventually be replaced by V M and by will be replaced by 
the potential Ay.) The components of a M and by transform by the Lorentz formulas, 
which are 


a’t = 

Mf “ 

va x 

A' 

b t - 

■ nb x 

Vi“ 

w’ 

u t 

vr 

- V 

n r ~ 

a x - 

va t 

b f 

b x - 

■ vb t 

u x 

Vl- 

- v 2 

Ux 

vr 

— V 2 

a'y = 

Uy, 


K 



a'z = 

a 2 . 


V, 

= b z . 



(26.21) 


Now let’s transform the components of GV We start with G 

G tx — afix @xbt 

' a t ~ »a x \ f b x - vb t \ ( a x - va t \(b t - vb x 


,VT^r/\Vi - if 

— a t b x — a x b t . 

But that is just G tx ; so we have the simple result 

G'tx — Gtr- 

We will do one more 
a t 


Vl - v 2 /\vT 


<?tv = 


Vi 


va x b t — vb x ( a t b„ — a u b t ) — v(a r b„ — a„b x ) 

& u My 


So we get that 


Vi - w" 2 


nt — ^ty 

i h — 


VI - V 2 


vGj 


And, of course, in the same way, 

GU = 


vr - v 

G tz — vG X2 

VT^V 


It is clear how the rest will go. Let’s make a table of all six terms, only now we 
may as well write them for F V 


Fix = F tX9 


F f 

r xy 


F X y - vF ti 


\/l — V 2 


FI 


ty 


FL = 


tv 


vF x , 


vrv 


F'y, = F yz 


( 26 . 22 ) 


Ft, 


vF x 


Vl — v 2 

Of course, we still have F’ yV = —F(. y and F' m 
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F' = 

1 ZX 


Fzx — vF zf 


Vl" — v 2 


0 . 



So we have the transformation of the electric and magnetic fields. All we have 
to do is look at Table 26-1 to find out what our grand notation in terms of F^ v 
means in terms of E and B. It’s just a matter of substitution. So that we can see 
how it looks in the ordinary symbols, we'll rewrite our transformation of the 
field components in Table 26-2. 


Table 26-2 

The Lorentz transformation of the electric and magnetic fields (Note: c = 1) 


E' x 

= E x 

BL 

= B x 

E'u 

1! 

ise 

1 

bo 

B'y 

By + vE z 


y/j — V 2 

< 

i 

to 

£' 

E z T vB y 

B' z 

B z — vE„ 

^ z 

Vi - f 2 

\/l — V 2 


The equations in Table 26-2 tell us how E and B change if we go from one inertial 
frame to another. If we know E and B in one system, we can find what they are 
in another that moves by with the speed v. 

We can write these equations in a form that is easier to remember if we notice 
that since v is in the ^-direction, all the terms with v are components of the cross 
products v X E and v X B. So we can rewrite the transformations as shown in 
Table 26-3 


Table 26-3 

An alternative form for the field transformations (Note: c = 1) 


E' x = 

E x 

B' x 

= 

( E -f- v X B) y 

By 

E'. = 

(E + v X B) z 
\/l - V* 

B' z 


( B v X E)y 


\/l — v 2 
(B — v X E) z 


Vl“- v* 


It is now easier to remember which components go where In fact, the transforma- 
tion can be written even more simply if we define the field components along 
as the “parallel" components E u and B\\ (because they are parallel to the relative 
velocity of S and S' ), and the total transverse components — the vector sums of 
the y- and z-components — as the “perpendicular" components E± and B± Then 
we get the equations in Table 26-4. (We have also put back the c’s, so it will be 
more convenient when we want to refer back later ) 


Table 26-4 

Still another form for the Lorentz transformation of E and B 


E \ i 


Ei 


(E + v X B ) j 


V 1 


v 2 /c 2 


B[ i = B 


B'j_ 


(* - ^ 


vT 


v’Er'2 


The field transformations give us another way of solving some problems we 
have done before — for instance, for finding the fields of a moving point charge. 
We have worked out the fields before by differentiating the potentials. But we 
could now do it by transforming the Coulomb field. If we have a point charge 
at rest in the *S-frame, then there is only the simple radial ZT-field In the S'-frame 
we will see a point charge moving with the velocity i/, if the S'-frame moves by the 
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Fig. 26-7. The coordinate frame S' 
moving through a static electric field. 


5-frame with the speed v = —u. We will let you show that the transformations 
of Tables 26-3 and 26-4 give the same electric and magnetic fields we got in Section 
26-2. 

The transformation of Table 26-2 gives us an interesting and simple answer 
for what we see if we move past any system of fixed charges. For example, suppose 
we want to know the fields in our frame 5' if we are moving along between the 
plates of a condenser, as shown in Fig. 26-7. (It is, of course, the same thing if 
we say that a charged condenser is moving past us.) What do we see 9 The trans- 
formation is easy in this case because the 5-field in the original system is zero. 
Suppose, first, that our motion is perpendicular to E , then we will see an E f — 
E /\/ 1 — v~ 2 ~fc 2 which is still completely transverse. We will see, in addition, a 
magnetic field B f = — v X E f /c ? ‘. (The y/Y — v 2 doesn’t appear in our formula 
for B' because we wrote it in terms of E r rather than E ; but it’s the same thing.) 
So when we move along perpendicular to a static electric field, we see a reduced 
E and an added transverse B. If our motion is not perpendicular to E , we break 
E into E \\ and E±. The parallel part is unchanged, £[j = E\\, and the perpendicular 
component does as just described. 

Let’s take the opposite case, and imagine we are moving through a pure 
static magnetic field. This time we would see an electric field E f equal to v X B ', 
and the magnetic field changed by the factor 1 /a/1 — v 2 /c 2 (assuming it is trans- 
verse). So long as v is much less than c, we can neglect the change in the magnetic 
field, and the main effect is that an electric field appears. As one example of this 
effect, consider this once famous problem of determining the speed of an airplane. 
It’s no longer famous, since radar can now be used to determine the air speed 
from ground reflections, but for many years it was very hard to find the speed of 
an airplane m bad weather. You could not see the ground and you didn’t know 
which way was up, and so on. Yet it was important to know how fast you were 
moving relative to the earth. How can this be done without seeing the earth? 
Many who knew the transformation formulas thought of the idea of using the fact 
that the airplane moves in the magnetic field of the earth Suppose that an airplane 
is flying where there is a magnetic field more or less known. Let’s just take the 
simple case where the magnetic field is vertical. If we were flying through it with 
a horizontal velocity v, then, according to our formula, we should see an electric 
field which is v X B, i.e , perpendicular to the line of motion If we hang an 
insulated wire across the airplane, this electric field will induce charges on the ends 
of the wire. That is nothing new. From the point of view of someone on the ground, 
we are moving a wire through a field, and the v X B force causes charges to move 
to the ends of the wire The transformation equations just say the same thing in 
a different way. (The fact that we can say the thing more than one way doesn’t 
mean that one way is better than another We are getting so many different 
methods and tools that we can usually get the same result in 65 different ways*) 

So to measure y, all we have to do is measure the voltage between the ends of 
the wire. We can’t do it with a voltmeter because the same fields will act on the 
wires in the voltmeter, but there are ways of measuring such fields. We talked 
about some of them when we discussed atmospheric electricity in Chapter 9 So 
it should be possible to measure the speed of the airplane. 

This important problem was, however, never solved this way. The reason is 
that the electric field that is developed is of the order of millivolts per meter. It 
is possible ro measure such fields, but the trouble is that these fields are, unfortun- 
ately, not any different from any other electric fields. The field that is produced 
by motion through the magnetic field can’t be distinguished from some electric 
field that was already in the air from another cause, say from electrostatic charges 
in the air, or on the clouds We described in Chapter 9 that there are, typically, 
electric fields above the surface of the earth with strengths of about 100 volts per 
meter But they are quite irregular. So as the airplane flies through the air, it 
sees fluctuations of atmospheric electric fields which are enormous in comparison 
to the tiny fields produced by the v X B term, and it turns out for practical reasons 
to be impossible to measure speeds of an airplane by its motion through the earth’s 
magnetic field. 
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26-4 The equations of motion in relativistic notation* 

It doesn't do much good to find electric and magnetic fields from Maxwell's 
equations unless we know what the fields do when we have them. You may re- 
member that the fields are required to find the forces on charges, and that those 
forces determine the motion of the charge. So, of course, part of the theory of 
electrodynamics is the relation between the motion of charges and the forces. 

For a single charge in the fields E and B , the force is 

F = q(E + v X B). (26 23) 

This force is equal to the mass times the acceleration for low velocities, but the 
correct law for any velocity is that the force is equal to dp/dt. Writing p = 
niQv/Vl — v~/c z , we find that the relati vistically correct equation of motion is 


d / m {) v 
dt l ■ yj\ — v 2 /c 2 


- F = q(E + v X B). 


(26.24) 


We would like now to discuss this equation from the point of view of relativity. 
Since we have put our Maxwell equations in relativistic form, it would be interesting 
to see what the equations of motion would look like in relativistic form Let’s see 
whether we can rewrite the equation in a four-vector notation. 

We know that the momentum is part of a four-vector p^ whose time com- 
ponent is the energy /77 0 /\/l — r 2 /c 2 . So we might think to replace the left-hand 
side of Eq. (26 24) by dpjdt Then we need only find a fourth component to go 
with F. This fourth component must equal the rate-of-change of the energy, or the 
rate of doing work, which is F ■ v. We would then like to write the right-hand 
side of Eq (26.24) as a four-vector like (F ■ v, F x , F, r F z ). But this does not make 
a four-vector. 

The time derivative of a four-vector is no longer a four-vector, because the 
d/dt requires the choice of some special frame for measuring t. We got into that 
trouble before when we tried to make v into a four-vector. Our first guess was 
that the time component would be cdt/dt = c. But the quantities 


/ dx dy dz\ 
\ s dt dt ’ dt) 


(c, v) 


(26.25) 


are not the components of a four-vector We found that they could be made into 
one by multiplying each component by l/\/l — v 2 /c 2 . The “four-velocity" 
My, is the four-vector 


m 2 / />2 


Vi - 


(26.26) 


So it appears that the trick is to multiply d/dt by l/\/T 
derivatives to make a four-vector. 

Our second guess then is that 

1 d 

Vl - W dt W 


v 2 /c 2 , if we want the 


(26.27) 


should be a four-vector. But what is v? It is the velocity of the particle — not of a 
coordinate frame! Then the quantity/^ defined by 


u = 


F ■ v F 

y/\ — v 2 /c 2 v 1 — v 2 Jc 2 


(26.28) 


is the extension into four dimensions of a force — we can call it the “four-force.” 
It is indeed a four-vector, and its space components are not the components of 
F but of F/\/i — v 2 J7 2 . 


* In this section we will put back all of the c's. 
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The question is — why is / M a four-vector 9 It would be nice to get a little under- 
standing of that l/\ 7 1 — v 2 /c 2 factor Since it has come up twice now, it is time 
to see why the d/dt can always be fixed by the same factor. The answer is in the 
following: When we take the time derivative of some function x, we compute the 
increment Ax in a small interval At in the variable t. But in another frame, the 
interval At might correspond to a change in both t f and x', so if we vary only t\ 
the change in x will be different. We have to find a variable for our differentiation 
that is a measure of an “interval" in space-time , which will then be the same in 
all coordinate systems. When we take Ax for that interval, it will be the same for 
all coordinate frames. When a particle “moves” in four-space, there are the changes 
At , Ax, A y, Az. Can we make an invariant interval out of them? Well, they are 
the components of the four-vector Xp = ( ct , x, y, z) so if we define a quantity 
As by 

(Ax) 2 = ~ Ax,. Ax„ = ~ ( c 2 At 2 - Ax 2 - Ay 2 - Az 2 ) (26.29) 


— which is a four-dimensional dot product — we then have a good four-scalar to 
use as a measure of a four-dimensional interval. From As — or its limit ds — we 
can define a parameter s = jds. And a derivative with respect to s, d/ds, is a 
nice four-dimensional operation, because it is invariant with respect to a Lorentz 
transformation. 

It is easy to relate ds to dt for a moving particle. For a moving point particle, 

dx = v x dt , dy = v y dt , dz = v z dt , (26 30) 

and 

* = \(dt 2 /c 2 ){c 2 - v 2 x - V 2 - v 2 z ) = AVI - »- 2 / c 2 - (26.31) 


So the operator 


1 d 



is an invariant operator. If we operate on any four-vector with it, we get another 
four-vector. For instance, if we operate on ( ct , x, y , z), we get the four-velocity Up\ 


dxp 

ds 




We see now why the factor \/l — v 2 /c 2 fixes things up. 

The invariant variable s is a useful physical quantity. It is called the “proper 
time” along the path of a particle, because ds is always an interval of time in a 
frame that is moving with the particle at any particluar instant. (Then, Ax = 
Ay = Az = 0, and As = At.) If you can imagine some “clock” whose rate 
doesn’t depend on the acceleration, such a clock carried along with the particle 
would show the time s 

We can now go back and write Newton’s law (as corrected by Einstein) in 
the neat form 

^ = fu, (26.32) 


where / M is given in Eq. (26.28). Also, the momentum p^ can be written as 


dx u 

Pp = m 0 Up = m o > 


(26.33) 


where the coordinates x M = ( ct , x, y, z) now describe the trajectory of the particle. 
Finally, the four-dimensional notation gives us this very simple form of the equa- 
tions of motion: 


u = 



(26.34) 


which is reminiscent of F — ma It is important to notice that Eq. (26.34) is not 
the same as F — ma , because the four-vector formula Eq. (26.34) has in it the 
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relativistic mechanics which are different from Newton’s law for high velocities. 
It is unlike the case of Maxwell’s equations, where we were able to rewrite the 
equations in the relativistic form without any change in the meaning at all — but with 
just a change of notation. 

Now let’s return to Eq. (26.24) and see how we can write the right-hand side 
in four-vector notation. The three components — when divided by \/l — v 2 /c 2 — 
are the components ofy^, so 

f = + v X B) x = ! E x + v y B z v z B y 

\/j — V 2 /c 2 x/T — v 2 fd 2 \/\ — V^Jc 2 \/l " V 2 j C 2 

(26.35) 

Now we must put all quantities in their relativistic notation. First, c/\/l — v 2 /c 1 
and Vy/^l — v 2 /c 2 and v z /\/\ — v 2 /c 2 are the y-, and z-components of the 
four-velocity // M And the components of E and B are components of the second- 
rank tensor of the fields F yv . Looking back in Table 26-1 for the components of 
Fy V that correspond to E x , B z , and B y , we get 

fx q{u t F xt UyF X y w z F xz)*i 

which begins to look interesting. Every term has the subscript .x, which is reason- 
able, since we're finding an x-component. Then all the others appear in pairs: 
tu yy , — except that the xx-term is missing. So we just stick it in, and write 

fx ~ qiMtF x t u x F xx MyF xy u z F xz ). (26 36) 

We haven’t changed anything because F^ v is antisymmetric, and F rx is zero. The 
reason for wanting to put in the xx-term is so that we can write Eq. (26.36) in 
the short-hand form 

U = WvFy V . (26.37) 

This equation is the same as Eq (26.36) if we make the rule that whenever any 
subscript occurs twice (as v does here), you automatically sum over terms in the 
same way as for the scalar product, using the same convention for the signs. 

You can easily believe that (26.37) works equally well for ju = y or ji = z. 
but what about ji = tl Let’s see, for fun, what it says: 


ft = q{u t F t t ~ u x F tx - u t/ F(y - u z F iz ) 

Now we have to translate back to £”s and Z?’s. We get 


ft — q ( 0 + - x p i E x 4 v — E u 4 — 

\ \/T — v 2 /c 2 \/T — v 2 /c 2 vT 


'S/C* ) 


or 


ft = 


qv • E 


x/l — ~F 2 jc 2 


(26.38) 


But from Eq. (26.28), f t is supposed to be 

F • v _ q(E + v X B) ■ v 

\/l — v 2 /c 2 y/\ — v 2 1 F 2 

This is the same thing as Eq. (26.38), since (u X B) * v is zero. So everything comes 
out all right. 

Summarizing, our equation of motion can be written in the elegant form 

w ° = h = quJv ( 2 6 ’ 39 ) 


Although it is nice to see that the equations can be written that way, this form 
is not particularly useful It’s usually more convenient to solve for particle motions 
by using the original equations (26.24), and that's what we will usually do. 
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27 


Field Energy and Field Momentum 


27-1 Local conservation 


It is clear that the energy of matter is not conserved. When an object radiates 
light it loses energy. However, the energy lost is possibly describable in some other 
form, say in the light. Therefore the theory of the conservation of energy is 
incomplete without a consideration of the energy which is associated with the light 
or, in general, with the electromagnetic field. We take up now the law of conserva- 
tion of energy and, also, of momentum for the fields. Certainly, we cannot treat 
one without the other, because in the relativity theory they are different aspects of 
the same four-vector. 

Very early in Volume I, we discussed the conservation of energy; we said 
then merely that the total energy in the world is constant. Now we want to extend 
the idea of the energy conservation law in an important way — in a way that says 
something in detail about how energy is conserved. The new law will say that if 
energy goes away from a region, it is because it flows away through the boundaries 
of that region. It is a somewhat stronger law than the conservation of energy 
without such a restriction. 

To see what the statement means, let’s look at how the law of the conservation 
of charge works. We described the conservation of charge by saying that there is 
a current density j and a charge density p, and that when the charge decreases at 
some place there must be a flow of charge away from that place. We call that the 
conservation of charge. The mathematical form of the conservation law is 


V j = 


dp 

dt 


(27.1) 


This law has the consequence that the total charge in the world is always constant — 
there is never any net gain or loss of charge. However, the total charge in the 
world could be constant in another way. Suppose that there is some charge Q\ 
near some point (1) while there is no charge near some point (2) some distance 
away (Fig. 27-1). Now suppose that, as time goes on, the charge Q i were to 
gradually fade away and that simultaneously with the decrease of Q \ some charge 
Q 2 would appear near point (2), and in such a way that at every instant the sum of 
Qx and Q 2 was a constant. In other words, at any intermediate state the amount 
of charge lost by Q\ would be added to Q 2 . Then the total amount of charge in 
the world would be conserved. That’s a “world-wide” conservation, but not what 
we will call a “local” conservation, because in order for the charge to get from 
(1) to (2). it didn’t have to appear anywhere in the space between point (1) and 
point (2). Locally, the charge was just “lost.” 

There is a difficulty with such a “world-wide” conservation law in the theory 
of relativity. The concept of “simultaneous moments” at distant points is one which 
is not equivalent in different systems. Two events that are simultaneous in one 
system are not simultaneous for another system moving past. For “world-wide” 
conservation of the kind described, it is necessary that the charge lost from Q\ 
should appear simultaneously in Q 2 . Otherwise there would be some moments 
when the charge was not conserved. There seems to be no way to make the 
law of charge conservation relativistically invariant without making it a “local” 
conservation law. As a matter of fact, the requirement of the Lorentz relativistic 
invariance seems to restrict the possible laws of nature in surprising ways. In 
modern quantum field theory, for example, people have often wanted to alter the 
theory by allowing what we call a “nonlocal” interaction — where something here 

27-1 


27-1 Local conservation 

27-2 Energy conservation and 
electromagnetism 

27-3 Energy density and energy 
flow in the electromagnetic 
field 

27-4 The ambiguity of the field 
energy 

27-5 Examples of energy flow 

27-6 Field momentum 



(b) 


Fig. 27-1. Two ways to conserve 
charge: (a) Qi + Q 2 is constant; (b) 
dOi/c/t = fj-nda = —dQ 2 /dt. 



has a direct effect on something there — but we get in trouble with the relativity 
principle. 

“Local” conservation involves another idea. It says that a charge can get 
from one place to another only if there is something happening in the space between. 
To describe the law we need not only the density of charge, p, but also another 
kind of quantity, namely 7 , a vector giving the rate of flow of charge across a 
surface. Then the flow is related to the rate of change of the density by Eq. (27.1). 
This is the more extreme kind of a conservation law. It says that charge is con- 
served in a special way — conserved “locally.” 

It turns out that energy conservation is also a local process. There is not only 
an energy density in a given region of space but also a vector to represent the rate 
of flow of the energy through a surface. For example, when a light source radiates, 
we can find the light energy moving out from the source. If we imagine some mathe- 
matical surface surrounding the light source, the energy lost from inside the surface 
is equal to the energy that flows out through the surface. 


27-2 Energy conservation and electromagnetism 

We want now to write quantitatively the conservation of energy for electro- 
magnetism. To do that, we have to describe how much energy there is in any 
volume element of space, and also the rate of energy flow. Suppose we think first 
only of the electromagnetic field energy. We will let u represent the energy density 
in the field (that is, the amount of energy per unit volume in space) and let the 
vector S represent the energy flux of the field (that is, the flow of energy per unit 
time across a unit area perpendicular to the flow). Then, in perfect analogy with 
the conservation of charge, Eq (27 1), we can write the “local” law of energy 
conservation in the field as 

~ = — V - 5. (27.2) 

ot 

Of course, this law is not true in general; it is not true that the field energy is 
conserved. Suppose you are in a dark room and then turn on the light switch. All 
of a sudden the room is full of light, so there is energy in the field, although there 
wasn’t any energy there before. Equation (27.2) is not the complete conservation 
law, because the field energy alone is not conserved, only the total energy in the 
world — there is also the energy of matter. The field energy will change if there is 
some work being done by matter on the field or by the field on matter 

However, if there is matter inside the volume of interest, we know how much 
energy it has: Each particle has the energy m () c 2 /\/ 1 — v 2 /c 2 . The total energy 
of the matter is just the sum of all the particle energies, and the flow of this energy 
through a surface is just the sum of the energy carried by each particle that crosses 
the surface We want now to talk only about the energy of the electromagnetic 
field. So we must write an equation which says that the total field energy in a given 
volume decreases either because field energy flows out of the volume or because 
the field loses energy to matter (or gains energy, which is just a negative loss). 
The field energy inside a volume V is 


/ u dV , 

J v 

and its rate of decrease is minus the time derivative of this integral. The flow of 
field energy out of the volume V is the integral of the normal component of S over 
the surface S that encloses V, 


So 


S * n da. 


d_ 

dt 


udV = 

V 


S • it da + (work done on matter inside V). 


(27.3) 
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We have seen before that the field does work on each unit volume of matter 
at the rate E •/. [The force on a particle is F ~ q{E + v X B ), and the rate of 
doing work is F * v ~ qE * v. If there are TV particles per unit volume, the rate of 
doing work per unit volume is NqE * u, but Nqv ~ j ] So the quantity E • j must 
be equal to the Joss of energy per unit time and per unit volume by the field. 
Equation (27.3) then becomes 

udv = / S-nda+ / EjdV. (27.4) 

otJv Jz JV 

This is our conservation law for energy in the field. We can convert it into a 
differential equation like Eq. (27.2) if we can change the second term to a volume 
integral That is easy to do with Gauss’ theorem. The surface integral of the 
normal component of S is the integral of its divergence over the volume inside. 
So Eq. (27.3) is equivalent to 

~ ^rdV= V • S dV + / E jdV, 

J v at Jv Jv 

where we have put the time derivative of the first term inside the integral. Since 
this equation is true for any volume, we can take away the integrals and we have 
the energy equation for the electromagnetic fields: 

-f t =V-S+E-j. (27.5) 

Now this equation doesn’t do us a bit of good unless we know what u and S 
are. Perhaps we should just tell you what they are in terms of E and B , because 
all we really want is the result. However, we would rather show you the kind of 
argument that was used by Poynting in 1884 to obtain formulas for S and m, so 
you can see where they come from. (You won’t, however, need to learn this de- 
rivation for our later work.) 

27-3 Energy density and energy flow in the electromagnetic field 

The idea is to suppose that there is a field energy density u and a flux S that 
depend only upon the fields E and B . (For example, we know that in electrostatics, 
at least, the energy density can be written Je 0 E ■ E .) Of course, the u and S might 
depend on the potentials or something else, but let’s see what we can work out 
We can try to rewrite the quantity E j in such a way that it becomes the sum of 
two terms, one that is the time derivative of one quantity and another that is the 
divergence of a second quantity. The first quantity would then be u and the second 
would be 5 (with suitable signs). Both quantities must be written in terms of the 
fields only; that is, we want to write our equality as 

E-j = - V-S. (27.6) 

The left-hand side must first be expressed in terms of the fields only. How 
can we do that 7 By using Maxwell’s equations, of course. From Maxwell’s 
equation for the curl of B , 

j = e 0 c 2 V X B - e 0 ~- 

Substituting this in (27 6) we will have only 2Ts and BJ s: 

E j = e 0 c 2 E .(V X B) - eoE~- (27.7) 

We are already partly finished. The last term is a time derivative — it is 
(d/df)(%e 0 E * E). So ^e 0 E • E is at least one part of u. It’s the same thing we 
found in electrostatics. Now, all we have to do is to make the other term into the 
divergence of something. 
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Notice that the first term on the right-hand side of (27.7) is the same as 


(V X B) * E. (27.8) 

And, as you know from vector algebra, (a X b) c is the same as a * (h X c); 
so our term is also the same as 


V • (B X E) 


(27.9) 


and we have the divergence of “something, " just as we wanted. Only that's 
wrong 1 We warned you before that V is “like" a vector, but not “exactly" the 
same. The reason it is not is because there is an additional convention from cal- 
culus: when a derivative operator is in front of a product, it works on everything 
to the right. In Eq (27.7), the V operates only on B, not on E But in the form 
(27 9), the normal convention would say that V operates on both B and E So 
it’s not the same thing In fact, if we work out the components of V * (B X E) 
we can see that it is equal to E * (V X B) plus some other terms. It's like what 
happens when we take a derivative of a product in algebra For instance. 


d 

dx 


( jg ) 


df 

dx 


g + J 


dg 

dx 


Rather than working out all the components of V ■ (B X E ), we would like 
to show you a trick that is very useful for this kind of problem. It is a trick that 
allows you to use all the rules of vector algebra on expressions with the V operator, 
without getting into trouble The trick is to throw out— for a while at least — the 
rule of the calculus notation about what the derivative operator works on You 
see, ordinarily, the order of terms is used for two separate purposes. One is for 
calculus: f(d/dx)g is not the same as g(d/dx)f ; and the other is for vectors: 
a X b is different from b X a. We can, if we want, choose to abandon momentarily 
the calculus rule. Instead of saying that a derivative operates on everything to the 
right, we make a new rule that doesn't depend on the order in which terms are writ- 
ten down Then we can juggle terms around without worrying 

Here is our new convention, we show, by a subscript, what a differential op- 
erator works on; the order has no meaning. Suppose we let the operator D stand 
for d/dx. Then D f means that only the derivative of the variable quantity / is 
taken. Then 


But if we have D f fg , it means 

"'A -(!)*• 

But notice now that according to our new rule,/ D t g means the same thing We 
can write the same thing any which way. 

Dffg = g D f f = / D f g ~ fg D f . 

You see, the Df can even come after everything. (It's surprising that such a handy 
notation is never taught in books on mathematics or physics.) 

You may wonder: What if I want to write the derivative of fgl I want the 
derivative of both terms. That’s easy, you just say so; you write Dffg) + Dffg). 
That is just g(df/dx) + / (dg/dx), which is what you mean in the old notation by 
d(fg)/dx. 

You will see that it is now going to be very easy to work out a new expression 
for V * (B X E). We start by changing to the new notation; we write 

V • (B X E) = • (B X E) + ■ (B X E). (27.10) 

The moment we do that we don’t have to keep the order straight any more We 
always know that operates on E only, and V/; operates on B only In these 
circumstances, we can use V as though it were an ordinary vector. (Of course, 
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when we are finished, we will want to return to the “standard” notation that 
everybody usually uses ) So now we can do the various things like interchanging 
dots and crosses and making other kinds of rearrangements of the terms. For 
instance, the middle term of Eq. (27.10) can be rewritten as E - Vb X B. (You 
remember that abXc = bcX a.) And the last term is the same as BE X 
Ve- It looks freakish, but it is all right. Now if we try to go back to the ordinary 
convention, we have to arrange that the V operates only on its “own” variable. 
The first one is already that way, so we can just leave off the subscript. The second 
one needs some rearranging to put the V in front of the E , which we can do by 
reversing the cross product and changing sign: 

B - (E X V E ) = ~B ( V E X E ). 

Now it is in a conventional order, so we can return to the usual notation. Equation 
(27.10) is equivalent to 

V • (B X E) - E • (V X B) - B • (V X E). (27.1 1) 

(A quicker way would have been to use components in this special case, but it 
was worth taking the time to show you the mathematical trick. You probably 
won’t see it anywhere else, and it is very good for unlocking vector algebra from 
the rules about the order of terms with derivatives.) 

We now return to our energy conservation discussion and use our new result, 
Eq. (27.11), to transform the V X B term of Eq. (27.7). That energy equation 
becomes 

E j = e 0 c 2 V • (B X E) + e 0 c 2 B • (V X E) - ^ (%6 0 E • E) (27.12) 

Now you see we’re almost finished. We have one term which is a nice derivative 
with respect to t to use for u and another that is a beautiful divergence to represent 
S. Unfortunately, there is the center term left over, which is neither a divergence 
nor a derivative with respect to t . So we almost made it, but not quite. After 
some thought, we look back at the differential equations of Maxwell and discover 
that V X E is, fortunately, equal to — dB/dt , which means that we can turn the 
extra term into something that is a pure time derivative: 

8( . X£) . 

Now we have exactly what we want. Our energy equation reads 

E j = V • (e 0 c 2 B X E) - ~ (ff B-B+^E-E), (27.13) 

which is exactly like Eq. (27.6), if we make the definitions 

u = ~ E ■ E -\- B • B (27.14) 

and 

S = 6 () c 2 £ X B. (27.15) 

(Reversing the cross product makes the signs come out right.) 

Our program was successful. We have an expression for the energy density 
that is the sum of an “electric” energy density and a “magnetic” energy density, 
whose forms are just like the ones we found in statics when we worked out the 
energy in terms of the fields . Also, we have found a formula for the energy flow 
vector of the electromagnetic field. This new vector, S = e 0 c 2 E X B , is called 
“Poynting’s vector,” after its discoverer. It tells us the rate at which the field 
energy moves around m space. The energy which flows through a small area da 
per second is S • n da , where n is the unit vector perpendicular to da. (Now that 
we have our formulas for u and S , you can forget the derivations if you want.) 
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27-4 The ambiguity of the field energy 

Before we take up some applications of the Poynting formulas [Eqs. (27.14) 
and (27.15)], we would like to say that we have not really “proved” them. All 
we did was to find a possible “w” and a possible “5'.” How do we know that by 
juggling the terms around some more we couldn't find another formula for “w” 
and another formula for “5”? The new S and the new u would be different, but 
they would still satisfy Eq. (27.6). It’s possible. It can be done, but the forms that 
have been found always involve various derivatives of the field (and always with 
second-order terms like a second derivative or the square of a first derivative). 
There are, in fact, an infinite number of different possibilities for u and 5, and 
so far no one has thought of an experimental way to tell which one is right! People 
have guessed that the simplest one is probably the correct one, but we must say 
that we do not know for certain what is the actual location in space of the electro- 
magnetic field energy. So we too will take the easy way out and say that the field 
energy is given by Eq. (27.14). Then the flow vector 5 must be given by Eq. (27.15). 

It is interesting that there seems to be no unique way to resolve the indefinite- 
ness in the location of the field energy. It is sometimes claimed that this problem 
can be resolved by using the theory of gravitation in the following argument. 
In the theory of gravity, all energy is the source of gravitational attraction. There- 
fore the energy density of electricity must be located properly if we are to know in 
which direction the gravity force acts. As yet, however, no one has done such a 
delicate experiment that the precise location of the gravitational influence of 
electromagnetic fields could be determined. That electromagnetic fields alone can 
be the source of gravitational force is an idea it is hard to do without. It has, in 
fact, been observed that light is deflected as it passes near the sun — we could 
say that the sun pulls the light down toward it. Do you not want to allow that the 
light pulls equally on the sun? Anyway, everyone always accepts the simple 
expressions we have found for the location of electromagnetic energy and its flow. 
And although sometimes the results obtained from using them seem strange, 
noboby has ever found anything wrong with them — that is, no disagreement with 
experiment. So we will follow the rest of the world — besides, we believe that it is 
probably perfectly right. 

We should make one further remark about the energy formula. In the first 
place, the energy per unit volume in the field is very simple: It is the electrostatic 
energy plus the magnetic energy, if we write the electrostatic energy in terms of 
E 2 and the magnetic energy as B 2 . We found two such expressions as possible 
expressions for the energy when we were doing static problems. We also found a 
number of other formulas for the energy in the electrostatic field, such as p0, 
which is equal to the integral of E • E in the electrostatic case However, in an 
electrodynamic field the equality failed, and there was no obvious choice as to 
which was the right one. Now we know which is the right one. Similarly, we have 
found the formula for the magnetic energy that is correct in general The right 
formula for the energy density of dynamic fields is Eq (27.14) 



Fig. 27-2. The vectors E, B, and S 
for a light wave. 


27-5 Examples of energy flow 

Our formula for the energy flow vector S is something quite new. We want 
now to see how it works in some special cases and also to see whether it checks 
out with anything that we knew before. The first example we will take is light. 
In a light wave we have an E vector and a B vector at right angles to each other 
and to the direction of the wave propagation. (See Fig 27-2.) In an electromag- 
netic wave, the magnitude of B is equal to 1 /c times the magnitude of E , and since 
they are at right angles, 

\E X B\ = £ 

€ 

Therefore, for light, the flow of energy per unit area per second is 
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5 = e 0 cE*. 


(27.16) 



For a light wave in which E = F 0 cos to(/ — x/c), average rate of energy 
flow per unit area, (S) av — which is called the “intensity” of the light — is the mean 
value of the square of the electric field times e 0 c : 

Intensity = (S) av = e 0 c(£ 2 ) av . (27.17) 

Believe it or not, we have already derived this result in Section 31-3 of Vol. I, 
when we were studying light. We can believe that it is right because it also checks 
against something else When we have a light beam, there is an energy density in 
space given by Eq. (27.14). Using cB = E for a light wave, we get that 

But E varies in space, so the average energy density is 

<«> av = e 0 (E 2 ) av . (27.18) 


Now the wave travels at the speed c\ so we should think that the energy that goes 
through a square meter in a second is c times the amount of energy in one cubic 
meter. So we would say that 

(S) av = e 0 c(E 2 ). iV . 

And it’s right; it is the same as Eq. (27.17). 

Now we take another example. Here is a rather curious one. We look at the 
energy flow in a capacitor that we are charging slowly. (We don’t want frequencies 
so high that the capacitor is beginning to look like a resonant cavity, but we don't 
want DC either.) Suppose we use a circular parallel plate capacitor of our usual 
kind, as shown in Fig. 27-3. There is a nearly uniform electric field inside which is 
changing with time. At any instant the total electromagnetic energy inside is u 
times the volume. If the plates have a radius a and a separation A, the total energy 
between the plates is 

V = E^j(Ta 2 h). (27.19) 

This energy changes when E changes. When the capacitor is being charged, the 
volume between the plates is receiving energy at the rate 

^ = e 0 ircfhEE. (27.20) 

at 



Fig. 27-3. Near a charging capaci- 
tor, the Poynting vector S points inward 
toward the axis. 


So there must be a flow of energy into that volume from somewhere. Of course 
you know that it must come in on the charging wires — not at all! It can’t enter 
the space between the plates from that direction, because E is perpendicular to 
the plates; E X B must be parallel to the plates. 

You remember, of course, that there is a magnetic field that circles around 
the axis when the capacitor is charging. We discussed that in Chapter 23. Using 
the last of Maxwell’s equations, we found that the magnetic field at the edge of the 
capacitor is given by 

2tt ac 2 B ~ E • ira 2 , 
or 


Its direction is shown in Fig. 27-3. So there is an energy flow proportional 
to E X B that comes in all around the edges, as shown in the figure. The 
energy isn’t actually coming down the wires, but from the space surrounding the 
capacitor. 

Let’s check whether or not the total amount of flow through the whole surface 
between the edges of the plates checks with the rate of change of the energy inside — 
it had better; we went through all that work proving Eq. (27.15) to make sure, 
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but let's see. The area of the surface is lirah, and S = e 0 c 2 E X B is in magnitude 



Fig. 27-4. The fields outside a capacitor 
when it is being charged by bringing two 
charges from a large distance. 



Fig. 27-5 The Poynting vectorS near 
a wire carrying a current. 



Fig. 27-6. A charge and a magnet 
produce a Poynting vector that circulates 
in closed loops. 


£ ) ■ 

so the total flux of energy is 

7ra 2 he 0 EE. 

It does check with Eq. (27.20). But it tells us a peculiar thing: that when we are 
charging a capacitor, the energy is not coming down the wires; it is coming in 
through the edges of the gap. That’s what this theory says! 

How can that be? That’s not an easy question, but here is one way of thinking 
about it. Suppose that we had some charges above and below the capacitor and 
far away. When the charges are far away, there is a weak but enormously spread- 
out field that surrounds the capacitor. (See Fig. 27-4.) Then, as the charges 
come together, the field gets stronger nearer to the capacitor. So the field energy 
which is way out moves toward the capacitor and eventually ends up between the 
plates. 

As another example, we ask what happens in a piece of resistance wire when it 
is carrying a current. Since the wire has resistance, there is an electric field along it, 
driving the current. Because there is a potential drop along the wire, there is also 
an electric field just outside the wire, parallel to the surface. (See Fig. 27-5.) 
There is, in addition, a magnetic field which goes around the wire because of the 
current. The E and B are at right angles; therefore there is a Poynting vector 
directed radially inward, as shown in the figure. There is a flow of energy into the 
wire all around. It is, of course, equal to the energy being lost in the wire in the 
form of heat. So our “crazy” theory says that the electrons are getting their 
energy to generate heat because of the energy flowing into the wire from the field 
outside. Intuition would seem to tell us that the electrons get their energy from 
being pushed along the wire, so the energy should be flowing down (or up) along 
the wire. But the theory says that the electrons are really being pushed by an electric 
field, which has come from some charges very far away, and that the electrons get 
their energy for generating heat from these fields. The energy somehow flows 
from the distant charges into a wide area of space and then inward to the wire. 

Finally, in order to really convince you that this theory is obviously nuts, 
we will take one more example — an example in which an electric charge and a 
magnet are at rest near each other — both sitting quite still. Suppose we take the 
example of a point charge sitting near the center of a bar magnet, as shown in 
Fig. 27-6 Everything is at rest, so the energy is not changing with time. Also, 
E and B are quite static. But the Poynting vector says that there is a flow of energy, 
because there is an E X B that is not zero. If you look at the energy flow, you find 
that it just circulates around and around. There isn’t any change in the energy 
anywhere — everything which flows into one volume flows out again It is like 
incompressible water flowing around. So there is a circulation of energy in this 
so-called static condition. How absurd it gets! 

Perhaps it isn’t so terribly puzzling, though, when you remember that what 
we called a “static” magnet is really a circulating permanent current. In a perma- 
nent magnet the electrons are spinning permanently inside. So maybe a circulation 
of the energy outside isn’t so queer after all. 

You no doubt begin to get the impression that the Poynting theory at least 
partially violates your intuition as to where energy is located in an electromagnetic 
field. You might believe that you must revamp all your intuitions, and, therefore 
have a lot of things to study here. But it seems really not necessary You don’t 
need to feel that you will be in great trouble if you forget once in a while that the 
energy in a wire is flowing into the wire from the outside, rather than along the 
wire. It seems to be only rarely of value, when using the idea of energy conserva- 
tion, to notice in detail what path the energy is taking. The circulation of energy 
around a magnet and a charge seems, in most circumstances, to be quite unimpor- 
tant. It is not a vital detail, but it is clear that our ordinary intuitions are quite 
wrong. 
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27-6 Field momentum 

Next we would like to talk about the momentum in the electromagnetic field. 
Just as the field has energy, it will have a certain momentum per unit volume. 
Let us call that momentum density g . Of course, momentum has various possible 
directions, so that g must be a vector. Let’s talk about one component at a time; 
first, we take the ^-component. Since each component of momentum is conserved 
we should be able to write down a law that looks something like this: 

d / moment um\ _ dg x / moment um\ 
dt \ of matter ) x dt \ outflow ) x 

The left side is easy. The rate-of-change of the momentum of matter is just the 
force on it. For a particle, it is F = q(E + v X B) ; for a distribution of charges, 
the force per unit volume is ( pE + j X B). The “momentum outflow” term, 
however, is strange. It cannot be the divergence of a vector because it is not a 
scalar; it is, rather, an x-component of some vector. Anyway, it should probably 
look something like 

da db dc 

dx dy dz 5 

because the x-momentum could be flowing in any one of the three directions. 
In any case, whatever a , b, and c are, the combination is supposed to equal the 
outflow of the x-momentum. 

Now the game would be to write pE + j X B in terms only of E and B — 
eliminating p an j by using Maxwell’s equations — and then to juggle terms and make 
substitutions to get it into a form that looks like 

dg x da db dc 

dt “ h dx^~ dy dz' 

Then, by identifying terms, we would have expressions for g x , a, b , and c . It’s a 
lot of work, and we are not going to do it. Instead, we are only going to find an 
expression for g , the momentum density — and by a different route. 

There is an important theorem in mechanics which is this: whenever there is 
a flow of energy in any circumstance at all (field energy or any other kind of energy), 
the energy flowing through a unit area per unit Fme, when multiplied by 1/c 2 , is 
equal to the momentum per unit volume in the space In the special case of elec- 
trodynamics, this theorem gives the result that g is 1 /c 2 times the Poynting vector 


g = 



(27.21) 


So the Poynting vector gives not only energy flow but, if you divide by c 2 , also the 
momentum density. The same result would come out of the other analysis we 
suggested, but it is more interesting to notice this more general result. We will 
now give a number of interesting examples and arguments to convince you that 
the general theorem is true. 

First example: Suppose that we have a lot of particles in a box — let’s say N 
per cubic meter — and that they are moving along with some velocity v. Now let’s 
consider an imaginary plane surface perpendicular to v. The energy flow through 
a unit area of this surface per second is equal to Nv , the number which flow through 
the surface per second, times the energy carried by each one. The energy in each 
particle is m 0 c 2 /\/ 1 — v 2 /c 2 . So the energy flow per second is 




But the momentum of each particle is m^v/y/l — v 2 /c 2 , so the density of mo- 
mentum is 

N ^ , 

VT — v 2 /c 7l 
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which is just 1/c 2 times the energy flow — as the theorem says. So the theorem is 
true for a bunch of particles. 

It is also true for light. When we studied light in Volume I, we saw that when 
the energy is absorbed from a light beam, a certain amount of momentum is de- 
livered to the absorber. We have, in fact, shown in Chapter 36 of Vol. I that the 
momentum is 1/c times the energy absorbed [Eq. (36.24) of Vol. I]. If we let U 0 
be the energy arriving at a unit area per second, then the momentum arriving at a 
unit area per second is U 0 /c. But the momentum is travelling at the speed c, so its 
density in front of the absorber must be Uo/c 2 . So again the theorem is right. 

Finally we will give an argument due to Einstein which demonstrates the 
same thing once more. Suppose that we have a railroad car on wheels (assumed 
frictionless) with a certain big mass M. At one end there is a device which will 
shoot out some particles or light (or anything, it doesn’t make any difference what 
it is), which are then stopped at the opposite end of the car. There was some 
energy originally at one end — say the energy U indicated in Fig. 27-7 (a) — and then 
later it is at the opposite end, as shown in Fig. 27-7(c). The energy U has been 
displaced the distance L , the length of the car. Now the energy U has the mass 
U/c 2 , so if the car stayed still, the center of gravity of the car would be moved. 
Einstein didn’t like the idea that the center of gravity of an object could be moved 
by fooling around only on the inside, so he assumed that it is impossible to move 
the center of gravity by doing anything inside. But if that is the case, when we 
moved the energy U from one end to the other, the whole car must have recoiled 
some distance x, as shown in part (c) of the figure. You can see, in fact, that the 
total mass of the car, times x, must equal the mass of the energy moved, U/c 2 
times L (assuming that U/c 2 is much less than M): 

Mx = ^L. (27.22) 

c z 

Let’s now look at the special case of the energy being carried by a light flash. 
(The argument would work as well for particles, but we will follow Einstein, who 
was interested in the problem of light ) What causes the car to be moved 9 Einstein 
argued as follows: When the light is emitted there must be a recoil, some unknown 
recoil with momentum p . It is this recoil which makes the car roll backward. 
The recoil velocity v of the car will be this momentum divided by the mass of the 
car: 



Fig. 27-7. The energy U in motion at 
the speed c carries the momentum U/c. 


The car moves with this velocity until the light energy U gets to the opposite end. 
Then, when it hits, it gives back its momentum and stops the car. If x is small, 
then the time the car moves is nearly equal to L/c; so we have that 


x — vt 


L 

v — 

c 


A k 

M c * 


Putting this x in Eq. (27.22), we get that 

U 


Again we have the relation of energy and momentum for light. Dividing by c to 
get the momentum density g = p/c , we get once more that 

g = /z- (27.23) 


You may well wonder: What is so important about the center-of-gravity 
theorem? Maybe it is wrong. Perhaps, but then we would also lose the conserva- 
tion of angular momentum. Suppose that our boxcar is moving along a track at 
some speed v and that we shoot some light energy from the top to the bottom of 
the car — say, from A to B in Fig. 27-8. Now we look at the angular momentum of 
the system about the point P Before the energy U leaves A , it has the mass 
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m = U 2 /c and the velocity r, so it has the angular momentum mvr a When it 
arrives at B , it has the same mass and, if the linear momentum of the whole boxcar 
is not to change, it must still have the velocity v. It’s angular momentum about P 
is then mvr$. The angular momentum will be changed unless the right recoil 
momentum was given to the car when the light was emitted — that is, unless the 
light carries the momentum U/c. It turns out that the angular momentum con- 
servation and the theorem of center-of-gravity are closely related in the relativity 
theory. So the conservation of angular momentum would also be destroyed if our 
theorem were not true At any rate, it does turn out to be a true general law, and 
in the case of electrodynamics we can use it to get the momentum in the field. 

We will mention two further examples of momentum in the electromagnetic 
field. We pointed out in Section 26-2 the failure of the law of action and reaction 
when two charged particles were moving on orthogonal trajectories. The forces 
on the two particles don’t balance out, so the action and reaction are not equal, 
therefore the net momentum of the matter must be changing. It is not conserved 
But the momentum in the field is also changing in such a situation. If you work 
out the amount of momentum given by the Poynting vector, it is not constant. 
However, the change of the particle momenta is just made up by the field momen- 
tum, so the total momentum of particles plus field is conserved. 

Finally, another example is the situation with the magnet and the charge, 
shown in Fig. 27-6. We were unhappy to find that energy was flowing around in 
circles, but now, since we know that energy flow and momentum are proportional, 
we know also that there is momentum circulating in the space. But a circulating 
momentum means that there is angular momentum. So there is angular momentum 
in the field. Do you remember the paradox we described in Section 17-4 about a 
solenoid and some charges mounted on a disc? It seemed that when the current 
turned off, the whole disc should start to turn The puzzle was: Where did the 
angular momentum come from ? The answer is that if you have a magnetic field and 
some charges, there will be some angular momentum in the field. It must have 
been put there when the field was built up. When the field is turned off, the angular 
momentum is given back. So the disc in the paradox would start rotating. 
This mystic circulating flow of energy, which at first seemed so ridiculous, is ab- 
solutely necessary. There is really a momentum flow. It is needed to maintain the 
conservation of angular momentum in the whole world. 
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Fig. 27-8. The energy U must carry 
the momentum l/ c if the angular mo- 
mentum about P is to be conserved. 
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Electromagnetic Mass 


28-1 The field energy of a point charge 

In bringing together relativity and Maxwell’s equations, we have finished our 
main work on the theory of electromagnetism. There are, of course, some details 
we have skipped over and one large area that we will be concerned with m the future 
— the interaction of electromagnetic fields with matter. But we want to stop for a 
moment to show you that this tremendous edifice, which is such a beautiful 
success in explaining so many phenomena, ultimately falls on its face. When 
you follow any of our physics too far, you find that it always gets into some kind 
of trouble. Now we want to discuss a serious trouble — the failure of the classical 
electromagnetic theory. You can appreciate that there is a failure of all classical 
physics because of the quantum-mechanical effects. Classical mechanics is a mathe- 
matically consistent theory; it just doesn’t agree with experience. It is interesting, 
though, that the classical theory of electromagnetism is an unsatisfactory theory 
all by itself. There are difficulties associated with the ideas of Maxwell’s theory 
which are not solved by and not directly associated with quantum mechanics. 
You may say, “Perhaps there’s no use worrying about these difficulties. Since the 
quantum mechanics is going to change the laws of electrodynamics, we should 
wait to see what difficulties there are after the modification.” However, when 
electromagnetism is joined to quantum mechanics, the difficulties remain. So it 
will not be a waste of our time now to look at what these difficulties are. Also, 
they are of great historical importance. Furthermore, you may get some feeling 
of accomplishment from being able to go far enough with the theory to see every- 
thing — including all of its troubles. 

The difficulty we speak of is associated with the concepts of electromagnetic 
momentum and energy, when applied to the electron or any charged particle. 
The concepts of simple charged particles and the electromagnetic field are in some 
way inconsistent. To describe the difficulty, we begin by doing some exercises 
with our energy and momentum concepts. 

First, we compute the energy of a charged particle. Suppose we take a simple 
model of an electron in which all of its charge q is uniformly distributed on the 
surface of a sphere of radius a , which we may take to be zero for the special case of 
a point charge. Now let’s calculate the energy in the electromagnetic field. If 
the charge is standing still, there is no magnetic field, and the energy per unit 
volume is proportional to the square of the electric field. The magnitude of the 
electric field is q/Aire^r 2 , and the energy density is 
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To get the total energy, we must integrate this density over all space. Using the 
volume element 47rr 2 dr , the total energy, which we will call U 0 i,. c , is 
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This is readily integrated. The lower limit is a , and the upper limit is oo, so 
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If we use the electronic charge q e for q and the symbol e 2 for <7?/47 t€ 0 , then 


It is all fine until we set a equal to zero for a point charge — there’s the great 
difficulty. Because the energy of the field varies inversely as the fourth power of 
the distance from the center, its volume integral is infinite. There is an infinite 
amount of energy in the field surrounding a point charge. 

What’s wrong with an infinite energy? If the energy can’t get out, but must 
stay there forever, is there any real difficulty with an infinite energy? Of course, a 
quantity that comes out infinite may be annoying, but what really matters is only 
whether there are any observable physical effects. To answer that question, we 
must turn to something else besides the energy. Suppose we ask how the energy 
changes when we move the charge. Then, if the changes are infinite, we will be 
in trouble. 



Fig. 28-1. The fields E and B and the 
momentum density g for a positive elec- 
tron. For a negative electron, E and B 
are reversed but g is not. 
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28-2 The field momentum of a moving charge 

Suppose an electron is moving at a uniform velocity through space, assuming 
for a moment that the velocity is low compared with the speed of light. Associated 
with this moving electron there is a momentum — even if the electron had no mass 
before it was charged — because of the momentum in the electromagnetic field. 
We can show that the field momentum is in the direction of the velocity v of the 
charge and is, for small velocities, proportional to v. For a point P at the distance 
r from the center of the charge and at the angle 0 with respect to the line of motion 
(see Fig. 28-1) the electric field is radial and, as we have seen, the magnetic field 
is v X E/c 2 . The momentum density, Eq. (27.21), is 

g = e 0 EX B. 

It is directed obliquely toward the line of motion, as shown in the figure, and has 
the magnitude 


The fields are symmetric about the line of motion, so when we integrate over 
space, the transverse components will sum to zero, giving a resultant momentum 
parallel to v. The component of g m this direction is g sin 0 , which we must inte- 
grate over all space. We take as our volume element a ring with its plane per- 
pendicular to v, as shown in Fig. 28-2. Its volume is 27rr 2 sin 0 dd dr. The total 
momentum is then 

p — J E 2 sin 2 0 lirr 2 sin 0 dd dr. 


Since E is independent of 6 (for v « c), we can immediately integrate over 0; the 
integral is 



(1 — cos 2 6) d( cos 6) = —cos 6 -| j 


Fig. 28-2. The volume element 
27 rr 2 sin 6 dd dr used for calculating the 
field momentum. 


The limits of d are 0 and i r, so the 0-integral gives merely a factor of 4/3, and 
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The integral (for v « c ) is the one we have just evaluated to find the energy; it is 
q 2 /\6ir 2 ^a, and 

_ 2 q 2 v 
^ 3 47reo ac 2 ’ 

or 


P = 


3 ac 2 
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The momentum in the field — the electromagnetic momentum — is proportional to 
v. It is just what we should have for a particle with the mass equal to the coefficient 
of i>. We can, therefore, call this coefficient the electromagnetic mass , m 0 \ vc , and 
write it as 
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28-3 Electromagnetic mass 

Where does the mass come from? In our laws of mechanics we have supposed 
that every object “carries” a thing we call the mass — which also means that it 
“carries” a momentum proportional to its velocity. Now we discover that it is 
understandable that a charged particle carries a momentum proportional to its 
velocity. It might, in fact, be that the mass is just the effect of electrodynamics. 
The origin of mass has until now been unexplained. We have at last in the theory 
of electrodynamics a grand opportunity to understand something that we never 
understood before. It comes out of the blue — or rather, from Maxwell and 
Poynting — that any charged particle will have a momentum proportional to its 
velocity just from electromagnetic influences. 

Let’s be conservative and say, for a moment, that there are two kinds of mass — 
that the total momentum of an object could be the sum of a mechanical momentum 
and the electromagnetic momentum. The mechanical momentum is the “mechan- 
ical” mass, m lnvv u, times v. In experiments where we measure the mass of a particle 
by seeing how much momentum it has, or how it swings around in an orbit, we 
are measuring the total mass. We say generally that the momentum is the total 
mass (m m(iC n + times the velocity. So the observed mass can consist of two 

pieces (or possibly more if we include other fields): a mechanical piece plus an 
electromagnetic piece. We know that there is definitely an electromagnetic piece, 
and we have a formula for it. And there is the thrilling possibility that the me- 
chanical piece is not there at all — that the mass is all electromagnetic. 

Let’s see what size the electron must have if there is to be no mechanical mass. 
We can find out by setting the electromagnetic mass of Eq. (28.4) equal to the 
observed mass m e of an electron. We find 

2 e 2 

a = * ^ 

3 m e c 2 

The quantity 


is called the “classical electron radius”; it has the numerical value 2.82 X 10~ 13 
cm, about one one-hundred-thousandth of the diameter of an atom. 

Why is r 0 called the electron radius, rather than our a ? Because we could 
equally well do the same calculation with other assumed distributions of charges — 
the charge might be spread uniformly through the volume of a sphere or it might 
be smeared out like a fuzzy ball. For any particular assumption the factor 2/3 
would change to some other fraction. For instance, for a charge uniformly dis- 
tributed throughout the volume of a sphere, the 2/3 gets replaced by 4/5. Rather 
than to argue over which distribution is correct, it was decided to define r 0 as the 
“nominal” radius. Then different theories could supply their pet coefficients. 

Let’s pursue our electromagnetic theory of mass. Our calculation was for 
v « c; what happens if we go to high velocities? Early attempts led to a certain 
amount of confusion, but Lorentz realized that the charged sphere would contract 
into a ellipsoid at high velocities and that the fields would change in accordance 
with the formulas (26 6) and (26.7) we derived for the relativistic case in Chapter 26. 
If you carry through the integrals for p in that case, you find that for an arbitrary 
velocity v , the momentum is altered by the factor \/s/\ — v%/c 2 : 
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In other words, the electromagnetic mass rises with velocity inversely as 
v 7 1 — v 2 /c 2 — a discovery that was made before the theory of relativity. 

Early experiments were proposed to measure the changes with velocity in the 
observed mass of a particle in order to determine how much of the mass was 
mechanical and how much was electrical. It was believed at the time that the elec- 
trical part would vary with velocity, whereas the mechanical part would not . But 
while the experiments were being done, the theorists were also at work. Soon the 
theory of relativity was developed, which proposed that no matter what the origin 
of the mass, it all should vary as m 0 /s / 1 — v 2 /c 2 . Equation (28.7) was the 
beginning of the theory that mass depended on velocity. 

Let’s now go back to our calculation of the energy in the field, which led to 
Eq. (28.2). According to the theory of relativity, the energy U will have the mass 
U/c 2 ; Eq. (28.2) then says that the field of the electron should have the mass 
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which is not the same as the electromagnetic mass, m v \ vv , of Eq. (28.4). In fact, if 
we just combine Eqs. (28.2) and (28.4), we would write 

jj _ 3 2 

t^elec * 

This formula was discovered before relativity, and when Einstein and others began 
to realize that it must always be that U — me 2 , there was great confusion. 


28-4 The force of an electron on itself 

The discrepancy between the two formulas for the electromagnetic mass is 
especially annoying, because we have carefully proved that the theory of electro- 
dynamics is consistent with the principle of relativity. Yet the theory of relativity 
implies without question that the momentum must be the same as the energy 
times v/c 2 . So we are in some kind of trouble; we must have made a mistake. 
We did not make an algebraic mistake in our calculations, but we have left some- 
thing out. 

In deriving our equations for energy and momentum, we assumed the con- 
servation laws. We assumed that all forces were taken into account and that any 
work done and any momentum carried by other “nonelectrical” machinery was 
included. Now if we have a sphere of charge, the electrical forces are all repulsive 
and an electron would tend to fly apart. Because the system has unbalanced forces, 
we can get all kinds of errors in the laws relating energy and momentum. To get a 
consistent picture, we must imagine that something holds the electron together. 
The charges must be held to the sphere by some kind of rubber bands — something 
that keeps the charges from flying off. It was first pointed out by Poincare that the 
rubber bands — or whatever it is that holds the electron together — must be included 
m the energy and momentum calculations. For this reason the extra nonelectrical 
forces are also known by the more elegant name “the Poincare stresses.” If the 
extra forces are included in the calculations, the masses obtained in two ways are 
changed (in a way that depends on the detailed assumptions). And the results are 
consistent with relativity; i.e., the mass that comes out from the momentum cal- 
culation is the same as the one that comes from the energy calculation. However, 
both of them contain two contributions: an electromagnetic mass and contribution 
from the Poincare stresses. Only when the two are added together do we get a 
consistent theory. 

It is therefore impossible to get all the mass to be electromagnetic in the way 
we hoped. It is not a legal theory if we have nothing but electrodynamics. Some- 
thing else has to be added. Whatever you call them — “rubber bands,” or “Poincare 
stresses,” or something else — there have to be other forces in nature to make a 
consistent theory of this kind. 
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Clearly, as soon as we have to put forces on the inside of the electron, the 
beauty of the whole idea begins to disappear. Things get very complicated. You 
would want to ask: How strong are the stresses? How does the electron shake? 
Does it oscillate? What are all its internal properties? And so on. It might be 
possible that an electron does have some complicated internal properties. If we 
made a theory of the electron along these lines, it would predict odd properties, 
like modes of oscillation, which haven’t apparently been observed. We say “ap- 
parently” because we observe a lot of things in nature that still do not make sense. 
We may someday find out that one of the things we don’t understand today (for 
example, the muon) can, in fact, be explained as an oscillation of the Poincare 
stresses. It doesn’t seem likely, but no one can say for sure. There are so many 
things about fundamental particles that we still don’t understand. Anyway, the 
complex structure implied by this theory is undesirable, and the attempt to explain 
all mass in terms of electromagnetism — at least in the way we have described — has 
led to a blind alley. 

We would like to think a little more about why we say we have a mass when 
the momentum in the field is proportional to the velocity. Easy! The mass is the 
coefficient between momentum and velocity. But we can look at the mass in another 
way: a particle has mass if you have to exert a force in order to accelerate it. So 
it may help our understanding if we look a little more closely at where the forces 
come from. How do we know that there has to be a force? Because we have 
proved the law of the conservation of momentum for the fields. If we have a 
charged particle and push on it for awhile, there will be some momentum in the 
electromagnetic field. Momentum must have been poured into the field somehow. 
Therefore there must have been a force pushing on the electron in order to get it 
going — a force in addition to that required by its mechanical inertia, a force due 
to its electromagnetic interaction. And there must be a corresponding force back 
on the “pusher.” But where does that force come from? 





Fig. 28-3. The self-force on an accelerating electron is not zero because of the 
retardation. (By dF we mean the force on a surface element da; by c/ 2 F we mean the 
force on the surface element da a from the charge on the surface element dap.) 


The picture is something like this. We can think of the electron as a charged 
sphere. When it is at rest, each piece of charge repels electrically each other piece, 
but the forces all balance in pairs, so that there is no net force. [See Fig. 28~3(a).] 
However, when the electron is being accelerated, the forces will no longer be in 
balance because of the fact that the electromagnetic influences take time to go 
from one piece to another. For instance, the force on the piece a in Fig. 28-3(b) 
from a piece on the opposite side depends on the position of fi at an earlier time, 
as shown. Both the magnitude and direction of the force depend on the motion 
of the charge. If the charge is accelerating, the forces on various parts of the 
electron might be as shown in Fig. 28-3(c). When all these forces are added up, 
they don’t cancel out. They would cancel for a uniform velocity, even though 
it looks at first glance as though the retardation would give an unbalanced force 
even for a uniform velocity. But it turns out that there is no net force unless the 
electron is being accelerated. With acceleration, if we look at the forces between 
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the various parts of the electron, action and reaction are not exactly equal, and 
the electron exerts a force on itself that tries to hold back the acceleration. It holds 
itself back by its own bootstraps. 

It is possible, but difficult, to calculate this self-reaction force; however, we 
don’t want to go into such an elaborate calculation here. We will tell you what 
the result is for the special case of relatively uncomplicated motion in one dimension, 
say x. Then, the self-force can be written in a series. The first term in the series 
depends on the acceleration x, the next term is proportional to x, and so on.* 
The result is 

2 ^ 2 2 
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where a and y are numerical coefficients of the order of 1. The coefficient a of 
the x term depends on what charge distribution is assumed; if the charge is dis- 
tributed uniformly on a sphere, then a = 2/3. So there is a term, proportional 
to the acceleration, which varies inversely as the radius a of the electron and agrees 
exactly with the value we got in Eq. (28.4) for m c i cc . If the charge distribution is 
chosen to be different, so that a is changed, the fraction 2/3 in Eq. (28.4) would 
be changed in the same way The term in x is independent of the assumed radius 
a, and also of the assumed distribution of the charge; its coefficient is always 2/3. 
The next term is proportional to the radius a, and its coefficient y depends on the 
charge distribution. You will notice that if we let the electron radius a go to zero, 
the last term (and all higher terms) will go to zero; the second term remains con- 
stant, but the first term — the electromagnetic mass — goes to infinity. And we can 
see that the infinity arises because of the force of one part of the electron on another 
— because we have allowed what is perhaps a silly thing, the possibility of the 
“point” electron acting on itself. 


28-5 Attempts to modify the Maxwell theory 

We would like now to discuss how it might be possible to modify Maxwell’s 
theory of electrodynamics so that the idea of an electron as a simple point charge 
could be maintained. Many attempts have been made, and some of the theories 
were even able to arrange things so that all the electron mass was electromagnetic. 
But all of these theories have died. It is still interesting to discuss some of the 
possibilities that have been suggested — to see the struggles of the human mind. 

We started out our theory of electricity by talking about the interaction of 
one charge with another. Then we made up a theory of these interacting charges 
and ended up with a field theory. We believe it so much that we allow it to tell 
us about the force of one part of an electron on another. Perhaps the entire diffi- 
culty is that electrons do not act on themselves; perhaps we are making too great 
an extrapolation from the interaction of separate electrons to the idea that an 
electron interacts with itself. Therefore some theories have been proposed in which 
the possibility that an electron acts on itself is ruled out. Then there is no longer 
the infinity due to the self-action. Also, there is no longer any electromagnetic 
mass associated with the particle; all the mass is back to being mechanical, but 
there are new difficulties in the theory. 

We must say immediately that such theories require a modification of the 
idea of the electromagnetic field. You remember we said at the start that the force 
on a particle at any point was determined by just two quantities — E and B. If 
we abandon the “self-force” this can no longer be true, because if there is an elec- 
tron in a certain place, the force isn’t given by the total E and B , but by only those 
parts due to other charges. So we have to keep track always of how much of E 
and B is due to the charge on which you are calculating the force and how much 
is due to the other charges. This makes the theory much more elaborate, but it 
gets rid of the difficulty of the infinity. 


* We are using the notation: x = dx/dt , x = d 2 x/dt 2 , x = d 3 x/dt 3 , etc. 
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So we can, if we want to, say that there is no such thing as the electron acting 
upon itself, and throw away the whole set of forces in Eq. (28.9). However, we 
have then thrown away the baby with the bath! Because the second term in Eq. 
(28.9), the term in x, is needed. That force does something very definite. If you 
throw it away, you’re in trouble again. When we accelerate a charge, it radiates 
electromagnetic waves, so it loses energy. Therefore, to accelerate a charge, we 
must require more force than is required to accelerate a neutral object of the same 
mass; otherwise energy wouldn’t be conserved. The rate at which we do work on 
an accelerating charge must be equal to the rate of loss of energy per second by 
radiation. We have talked about this effect before — it is called the radiation re- 
sistance. We still have to answer the question: Where does the extra force, against 
which we must do this work, come from? When a big antenna is radiating, the 
forces come from the influence of one part of the antenna current on another. 
For a single accelerating electron radiating into otherwise empty space, there 
would seem to be only one place the force could come from — the action of one 
part of the electron on another part. 

We found back in Chapter 32 of VoL I that an oscillating charge radiates 
energy at the rate 


dW = 2 e 2 (x ) 2 
dt 3 c 3 


(28.10) 


Let’s see what we get for the rate of doing work on an electron against the boot- 
strap force of Eq. (28.9). The rate of work is the force times the velocity, or F x : 

dW e 2 2 e 2 

= c + ■ ■ • (28.11) 
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The first term is proportional to dx 2 /dt , and therefore just corresponds to the rate 
of change of the kinetic energy \ mv 2 associated with the electromagnetic mass. 
The second term should correspond to the radiated power in Eq. (28.10). But it 
is different. The discrepancy comes from the fact that the term in Eq. (28.11) is 
generally true, whereas Eq. (28.10) is right only for an oscillating charge. We can 
show that the two are equivalent if the motion of the charge is periodic. To do 
that, we rewrite the second term of Eq. (28.11) as 


2e^d_ 
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which is just an algebraic transformation. If the motion of the electron is periodic, 
the quantity xx returns periodically to the same value, so that if we take the 
average of its time derivative, we get zero. The second term, however, is always 
positive (it's a square), so its average is also positive This term gives the net work 
done and is just equal to Eq. (28.10). 

The term in x of the bootstrap force is required in order to have energy 
conservation in radiating systems, and we can’t throw it away. It was, in fact, one 
of the triumphs of Lorentz to show that there is such a force and that it comes from 
the action of the electron on itself. We must believe in the idea of the action of the 
electron on itself, and we need the term in x. The problem is how we can get that 
term without getting the first term in Eq. (28.9), which gives all the trouble. We 
don’t know how. You see that the classical electron theory has pushed itself 
into a tight corner. 

There have been several other attempts to modify the laws in order to straighten 
the thing out. One way, proposed by Born and Infeld, is to change the Maxwell 
equations in a complicated way so that they are no longer linear. Then the electro- 
magnetic energy and momentum can be made to come out finite. But the laws 
they suggest predict phenomena which have never been observed. Their theory 
also suffers from another difficulty we will come to later, which is common to all 
the attempts to avoid the troubles we have described. 

The following peculiar possibility was suggested by Dirac. He said: Let’s 
admit that an electron acts on itself through the second term in Eq. (28.9) but not 
through the first. He then had an ingenious idea for getting rid of one but not the 
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other. Look, he said, we made a special assumption when we took only the 
retarded wave solutions of Maxwell’s equations; if we were to take the advanced 
waves instead, we would get something different. The formula for the self-force 
would be 

2 t 2 2 
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ac 2 3 c 2 c 4 

This equation is just like Eq. (28.9) except for the sign of the second term — and 
some higher terms— of the series [Changing from retarded to advanced waves 
is just changing the sign of the delay which, it is not hard to see, is equivalent to 
changing the sign of / everywhere. The only effect on Eq. (28.9) is to change the 
sign of all the odd time derivatives.] So, Dirac said, let’s make the new rule that 
an electron acts on itself by one-half the difference of the retarded and advanced 
fields which it produces. The difference of Eqs. (28.9) and (28.12), divided by two, 
is then 

2 e 2 

F = — ^ — x + higher terms. 

In all the higher terms, the radius a appears to some positive power in the numera- 
tor. Therefore, when we go to the limit of a point charge, we get only the one 
term — just what is needed. In this way, Dirac got the radiation resistance force 
and none of the inertial forces. There is no electromagnetic mass, and the classical 
theory is saved — but at the expense of an arbitrary assumption about the self-force. 

The arbitrariness of the extra assumption of Dirac was removed, to some ex- 
tent at least, by Wheeler and Feynman, who proposed a still stranger theory 
They suggest that point charges interact only with other charges, but that the inter- 
action is half through the advanced and half through the retarded waves. It turns 
out, most surprisingly, that in most situations you won’t see any effects of the 
advanced waves, but they do have the effect of producing just the radiation re- 
action force. The radiation resistance is not due to the electron acting on itself, 
but from the following peculiar effect. When an electron is accelerated at the 
rime /, it shakes all the other charges in the world at a later time t' = t + r/c 
(where r is the distance to the other charge), because of the retarded waves. But 
then these other charges react back on the original electron through their advanced 
waves, which will arrive at the time t", equal to f minus r/c , which is, of course, 
just t. (They also react back with their retarded waves too, but that just corre- 
sponds to the normal “reflected” waves.) The combination of the advanced and 
retarded waves means that at the instant it is accelerated an oscillating charge 
feels a force from all the charges that are “going to” absorb its radiated waves. 
You see what tight knots people have gotten into in trying to get a theory of the 
electron! 

We’ll describe now still another kind of theory, to show the kind of things 
that people think of when they are stuck. This is another modification of the laws 
of electrodynamics, proposed by Bopp. You realize that once you decide to change 
the equations of electromagnetism you can start anywhere you want. You can 
change the force law for an electron, or you can change the Maxwell equations 
(as we saw in the examples we have described), or you can make a change some- 
where else. One possibility is to change the formulas that give the potentials in 
terms of the charges and currents. One of our formulas has been that the potentials 
at some point are given by the current density (or charge) at each other point at an 
earlier time Using our four-vector notation for the potentials, we write 

*<'• 0 - sb <» is) 

Bopp’s beautifully simple idea is that: Maybe the trouble is in the 1 fr factor in 
the integral. Suppose we were to start out by assuming only that the potential at 
one point depends on the charge density at any other point as some function of 
the distance between the points, say as f{r\f). The total potential at point (1) 
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will then be given by the integral of j j* times this function over all space: 

4.(1) = f M2) f(r 12 ) dV 2 . 

That’s all. No differential equation, nothing else. Well, one more thing. We also 
ask that the result should be relativistically invariant. So by “distance” we should 
take the invariant “distance” between two points in space-time. This distance 
squared (within a sign which doesn’t matter) is 

4 = c\h - i 2 ) 2 - r 2 12 

= c 2 (/i - t 2 ) 2 - ( X ! - x 2 ) 2 - (Ji - y 2 ) 2 - (Zi - Z 2 ) 2 . (28.14) 


So, for a relativistically invariant theory, we should take some function of the 
magnitude of s x2 , or what is the same thing, some function of s\ 2 . So Bopp’s 
theory is that 

4(1. h) = / M2, t 2 )F(s? 2 ) dv 2 dt 2 . (28.15) 

(The integral must, of course, be over the four-dimensional volume di 2 dx 2 dy 2 dz 2 ) 
All that remains is to choose a suitable function for F. We assume only one 
thing about F — that it is very small except when its argument is near zero — so that a 
graph of F would be a curve like the one in Fig. 28-4. It is a narrow spike with a 
finite area centered at s 2 = 0, and with a width which we can say is roughly a 2 . 
We can say, crudely, that when we calculate the potential at point (1), only those 
points (2) produce any appreciable effect if s\ 2 = c 2 (t 2 — 1 1) 2 — r\ 2 is within 
of zero. We can indicate this by saying that Fis important only for 

s\ 2 = c 2 (t 1 — t 2 ) 2 — r \ 2 « (28.16) 


You can make it more mathematical if you want to, but that’s the idea. 

Now suppose that a is very small in comparison with the size of ordinary 
objects like motors, generators, and the like so that for normal problems r 12 » a. 
Then Eq. (28.16) says that charges contribute to the integral of Eq. (28.15) only 
when t x — t 2 is in the small range 




Since a 2 /r\ 2 « 1, the square root can be approximated by 1 =*= a 2 /lr\ 2 , so 



a 2 \ r 12 ^ a 2 

2r2 !2 1 C 2F\ 2 C 


Fig. 28-4. The function F(s 2 ) used in 
the nonlocal theory of Bopp. 


What is the significance? This result says that the only times t 2 that are im- 
portant in the integral of A M are those which differ from the time t x , at which we 
want the potential, by the delay r 12 /c — with a negligible correction so long as 
r X2 a. In other words, this theory of Bopp approaches the Maxwell theory — so 
long as we are far away from any particular charge — in the sense that it gives the 
retarded wave effects. 

We can, in fact, see approximately what the integral of Eq. (28.15) is going 
to give. If we integrate first over t 2 from —00 to +00 — keeping r X2 fixed — then 
^2 is also going to go from — 00 to + 00 . The integral will all come from / 2 ’ s in 
a small interval of width At 2 ~ 2 X a 2 /2rx 2 c, centered at t x — rx 2 /c. Say 
that the function F(s 2 ) has the value K at s 2 = 0; then the integral over t 2 gives 
approximately Kj^At^ or 

K# A. 

c r 12 

We should, of course, take the value of j j, at t 2 = t x — r\ 2 /c , so that Eq. (28.15) 
becomes 

^ [ AR' i r _ '■;/< : > dVt . 

c J r l2 

28-9 



If we pick K = q 2 c/4TT€ 0 a 2 , we are right back to the retarded potential solution 
of Maxwell’s equations — including automatically the 1 /r dependence! And it 
all came out of the simple proposition that the potential at one point in space- 
time depends on the current density at all other points in space-time, but with 
a weighting factor that is some narrow function of the four-dimensional distance 
between the two points. This theory again predicts a finite electromagnetic mass 
for the electron, and the energy and mass have the right relation for the relativity 
theory They must, because the theory is rela tivistically invariant from the start, 
and everything seems to be all right 

There is, however, one fundamental objection to this theory and to all the 
other theories we have described. All particles we know obey the laws of quantum 
mechanics, so a quantum-mechanical modification of electrodynamics has to be 
made Light behaves like photons It isn’t 100 percent like the Maxwell theory. 
So the electrodynamic theory has to be changed. We have already mentioned that 
it might be a waste of time to work so hard to straighten out the classical theory, 
because it could turn out that in quantum electrodynamics the dilliculties will 
disappear or may be resolved in some other fashion. But the difficulties do not 
disappear in quantum electrodynamics. That is one of the reasons that people 
have spent so much effort trying to straighten out the classical difficulties, hoping 
that if they could straighten out the classical difficulty and then make the quantum 
modifications, everything would be straightened out The Maxwell theory still 
has the difficulties after the quantum mechanics modifications are made. 

The quantum effects do make some changes — the formula for the mass is 
modified, and Planck’s constant h appears — but the answer still comes out infinite 
unless you cut off an integration somehow — just as we had to stop the classical 
integrals at r = a. And the answers depend on how you stop the integrals. We 
cannot, unfortunately, demonstrate for you here that the difficulties are really 
basically the same, because we have developed so little of the theory of quantum 
mechanics and even less of quantum electrodynamics. So you must just take our 
word that the quantized theory of Maxwell’s electrodynamics gives an infinite 
mass for a point electron. 

It turns out, however, that nobody has ever succeeded in making a self-consistent 
quantum theory out of any of the modified theories. Born and Infeld’s ideas have 
never been satisfactorily made into a quantum theory The theories with the ad- 
vanced and retarded waves of Dirac, or of Wheeler and Feynman, have never 
been made into a satisfactory quantum theory The theoiy of Bopp has never been 
made into a satisfactory quantum theory. So today, there is no known solution 
to this problem. We do not know how to make a consistent theory — including 
the quantum mechanics — which does not produce an infinity for the self-energy of 
an electron, or any point charge. And at the same time, there is no satisfactory 
theory that describes a non-point charge. It’s an unsolved problem 

In case you are deciding to rush off to make a theory in which the action of an 
electron on itself is completely removed, so that electromagnetic mass is no longer 
meaningful, and then to make a quantum theory of it, you should be warned that 
you are certain to be in trouble. There is definite experimental evidence of the 
existence of electromagnetic mertia^-there is evidence that some of the mass of 
charged particles is electromagnetic in origin 

It used to be said in the older books that since Nature will obviously not pre- 
sent us with two particles — one neutral and the other charged, but otherwise the 
same — we will never be able to tell how much of the mass is electromagnetic and 
how much is mechanical But it turns out that Nature has been kind enough to 
present us with just such objects, so that by comparing the observed mass of the 
charged one with the observed mass of the neutral one, we can tell whether there 
is any electromagnetic mass. For example, there are the neutrons and protons. 
They interact with tremendous forces — the nuclear forces—whose origin is un- 
known However, as we have already described, the nuclear forces have one re- 
markable property. So far as they are concerned, the neutron and proton are 
exactly the same The nuclear forces between neutron and neutron, neutron and 
proton, and proton and proton are all identical as far as we can tell Only the little 
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electromagnetic forces are different; electrically the proton and neutron are as 
different as night and day. This is just what we wanted There are two particles, 
identical from the point of view of the strong interactions, but different electrically. 
And they have a small difference in mass. The mass difference between the proton 
and the neutron — expressed as the difference in the rest-energy me 2 in units of 
Mev — is about 1.3 Mev, which is about 2.6 times the electron mass. The classical 
theory would then predict a radius of about ^ to \ the classical electron radius, 
or about 10“ ll * cm. Of course, one should really use the quantum theory, but by 
some strange accident, all the constants — 27 t's and K s, etc. — come out so that the 
quantum theory gives roughly the same radius as the classical theory. The only 
trouble is that the sign is wrong! The neutron is heavier than the proton. 


Table 28-1 
Particle Masses 


Particle 

Charge 

(electronic) 

Mass 

(Mev) 

Am* 

(Mev) 

n (neutron) 

0 

939.5 


p (proton) 

+i 

938.2 

-1.3 

x (x-meson) 

0 

135.0 



±1 

139.6 

+4.6 

K (K-meson) 

0 

497.8 



±1 

493.9 

-3.9 

2 (sigma) 

0 

1191.5 



+1 

1189.4 

-2.1 


-1 

1196.0 

+4.5 


* Am — (mass of charged) — (mass of neutral). 


Nature has also given us several other pairs — or triplets — of particles which 
appear to be exactly the same except for their electrical charge. They interact with 
protons and neutrons, through the so-called “strong” interactions of the nuclear 
forces. In such interactions, the particles of a given kind — say the 7 r-mesons — 
behave in every way like one object except for their electrical charge. In Table 
28-1 we give a list of such particles, together with their measured masses. The 
charged 7r-mesons — positive or negative — have a mass of 139.6 Mev, but the 
neutral x-meson is 4 6 Mev lighter. We believe that this mass difference is electro- 
magnetic; it would correspond to a particle radius of 3 to 4 X 10“ 14 cm. You will 
see from the table that the mass differences of the other particles are usually of the 
same general size. 

Now the size of these particles can be determined by other methods, for in- 
stance by the diameters they appear to have in high-energy collisions. So the 
electromagnetic mass seems to be in general agreement with electromagnetic 
theory, if we stop our integrals of the field energy at the same radius obtained by 
these other methods. That’s why we believe that the differences do represent 
electromagnetic mass. 

You are no doubt worried about the different signs of the mass differences in 
the table. It is easy to see why the charged ones should be heavier than the neutral 
ones. But what about those pairs like the proton and the neutron, where the mea- 
sured mass comes out the other way? Well, it turns out that these particles are 
complicated, and the computation of the electromagnetic mass must be more 
elaborate for them. For instance, although the neutron has no net charge, it does 
have a charge distribution inside it — it is only the net charge that is zero In fact, 
we believe that the neutron looks — at least sometimes — like a proton with a nega- 
tive x-meson in a “cloud” around it, as shown in Fig. 28-5. Although the neutron 
is “neutral,” because its total charge is zero, there are still electromagnetic energies 


Negative 

x-meson 



PROTON 


Fig. 28-5. A neutron may exist, at 
times, as a proton surrounded by a 
negative x-meson. 
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(for example, it has a magnetic moment), so it’s not easy to tell the sign of the 
electromagnetic mass difference without a detailed theory of the internal structure. 

We only wish to emphasize here the following points* (1) the electromagnetic 
theory predicts the existence of an electromagnetic mass, but it also falls on its 
face in doing so, because it does not produce a consistent theory — and the same is 
true with the quantum modifications; (2) there is experimental evidence for the 
existence of electromagnetic mass; and (3) all these masses are roughly the same 
as the mass of an electron. So we come back again to the original idea of Lorentz — 
maybe all the mass of an electron is purely electromagnetic, maybe the whole 
0 511 Mev is due to electrodynamics Is it or isn't it? We haven't got a theory, so 
we cannot say. 

We must mention one more piece of information, which is the most annoying 
There is another particle in the world called a muon — or ^u-meson — which, so far 
as we can tell, differs in no way whatsoever from an electron except for its mass. It 
acts in every way like an electron: it interacts with neutrinos and with the electro- 
magnetic field, and it has no nuclear forces. It does nothing different from what 
an electron does— at least, nothing which cannot be understood as merely a con- 
sequence of its higher mass (206 77 times the electron mass). Theiefore, whenever 
someone finally gets the explanation of the mass of an electron, he will then have 
the puzzle of where a muon gets its mass. Why? Because whatever the electron 
does, the muon does the same — so the mass ought to come out the same There 
are those who believe faithfully in the idea that the muon and the electron are the 
same particle and that, in the final theory of the mass, the formula for the mass 
will be a quadratic equation with two roots — one for each particle There are also 
those who propose it will be a transcendental equation with an infinite number of 
roots, and who are engaged in guessing what the masses of the other particles in 
the series must be, and why these particles haven't been discovered yet 


28-6 The nuclear force field 

We would like to make some further remarks about the part of the mass of 
nuclear particles that is not electromagnetic Where does this other large fraction 
come from 9 There are other forces besides electrodynamics — like nuclear forces — 
that have their own field theories, although no one knows whether the current 
theories are right These theories also predict a field energy which gives the 
nuclear particles a mass term analogous to electromagnetic mass; we could call 
it the “7r-mesic-field-mass." It is presumably very large, because the forces are 
great, and it is the possible origin of the mass of the heavy particles. But the 
meson field theories are still in a most rudimentary state Even with the well- 
developed theory of electromagnetism, we found it impossible to get beyond first 
base in explaining the electron mass. With the theory of the mesons, we strike out 
We may take a moment to outline the theory of the mesons, because of its 
interesting connection with electrodynamics. In electrodynamics, the field can be 
described in terms of a four-potential that satisfies the equation 

□ 2 -- sources. 

Now we have seen that pieces of the field can be radiated away so that they exist 
separated from the sources. These are the photons of light, and they are described 
by a differential equation without sources: 

□ % = o. 

People have argued that the field of nuclear forces ought also to have its own 
“photons" — they would presumably be the 7r-mesons — and that they should be 
described by an analogous differential equation. (Because of the weakness of the 
human brain, we can’t think of something really new, so we argue by analogy 
with what we know.) So the meson equation might be 

□ 2 0 - 0, 
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where d> could be a different four-vector or perhaps a scalar. It turns out that the 
pion has no polarization, so <£ should be a scalar. With the simple equation 
□ 2 0 = 0, the meson field would vary with distance from a source as 1/r 2 , just 
as the electric field does. But we know that nuclear forces have much shorter dis- 
tances of action, so the simple equation won’t work. There is one way we can 
change things without disrupting the relativistic invariance: we can add or subtract 
from the D’AIembertian a constant, times (f > . So Yukawa suggested that the free 
quanta of the nuclear force field might obey the equation 

□ 2 0 - h 2 4> = 0 , (28 17 ) 

where p 2 is a constant — that is, an invariant scalar. (Since □ 2 is a scalar differ- 
ential operator in four dimensions, its invariance is unchanged if we add another 
scalar to it.) 

Let’s see what Eq. (28.17) gives for the nuclear force when things are not 
changing with time. We want a spherically symmetric solution of 

V 2 <f> — fJL 2 (f> ~ 0 


around some point source at, say, the origin. If <f> depends only on r , we know that 


So we have the equation 


or 


v2 * = % <"»• 

\ ^2 00) - M 2 0 = 0 

£ (rtf) = mV)- 


Thinking of as our dependent variable, this is an equation we have seen many 
times. It’s solution is 


r<t> = Ke ±tir . 


Clearly, <j> cannot become infinite for large r, so the + sign in the exponent is 
ruled out. The solution is 


<j> = K 


e 


— fir 


r 


(28.18) 


This function is called the Yukawa potential . For an attractive force, K is a negative 
number whose magnitude must be adjusted to fit the experimentally observed 
strength of the forces. 

The Yukawa potential of the nuclear forces dies off more rapidly than 1/r 
by the exponential factor. The potential — and therefore the force — falls to zero 
much more rapidly than 1/r for distances beyond l//x, as shown in Fig. 28-6 
The “range” of nuclear forces is much less than the “range” of electrostatic forces 
It is found experimentally that the nuclear forces do not extend beyond about 
10“ 1 cm, so p ~ 1 0 1 5 m “ 1 

Finally, let’s look at the free-wave solution of Eq (28 17) If we substitute 



Fig. 28-6. The Yukawa 
e~» r /r, compared with the 
potential 1/r. 


into Eq. (28 17), we get that 


0 = 4> 0 e l(ut ~ kz) 


co 
c 2 


k 2 - M 2 = 0. 


Relating frequency to energy and wave number to momentum, as we did at the 
end of Chapter 36 of Vol. i, we get that 



which says that the Yukawa “photon” has a mass equal to ph/c. If we use for p 
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potential 

Coulomb 



the estimate 10 15 m _l , which gives the observed range of the nuclear forces, the 
mass comes out to 3 X 10~ 25 gm, or 170 Mev, which is roughly the observed mass 
of the 7r-meson. So, by an analogy with electrodynamics, we would say that the 
7r-meson is the “photon” of the nuclear force field But now we have pushed the 
ideas of electrodynamics into regions where they may not really be valid — we have 
gone beyond electrodynamics to the problem of the nuclear forces. 
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29 


The Motion of Charges in Electric 
and Magnetic Fields 


29-1 Motion in a uniform electric or magnetic field 

We want now to describe — mainly in a qualitative way — the motions of 
charges in various circumstances. Most of the interesting phenomena in which 
charges are moving in fields occur in very complicated situations, with many, 
many charges all interacting with each other For instance, when an electromagne- 
tic wave goes through a block of material or a plasma, billions and billions of 
charges are interacting with the wave and with each other. We will come to such 
problems later, but now we just want to discuss the much simpler problem of the 
motions of a single charge in a given field. We can then disregard all other charges 
— except, of course, those charges and currents which exist somewhere to produce 
the fields we will assume. 

We should probably ask first about the motion of a particle m a uniform elec- 
tric field At low velocities, the motion is not particularly interesting — it is just a 
uniform acceleration in the direction of the field. However, if the particle picks 
up enough energy to become relativistic, then the motion gets more complicated. 
But we will leave the solution for that case for you to play with 

Next, we consider the motion in a uniform magnetic field with zero electric 
field. We have already solved this problem — one solution is that the particle goes 
in a circle The magnetic force qv X B is always at right angles to the motion, 
so dp/d t is perpendicular to p and has the magnitude vp/R, where R is the radius 
of the circle. 

F= qi)B = V -l. 

The radius of the circular orbit is then 



That is only one possibility. If the particle has a component of its motion 
along the field direction, that motion is constant, since there can be no component 
of the magnetic force in the direction of the field. The general motion of a particle 
in a uniform magnetic field is a constant velocity parallel to B and a circular motion 
at right angles to B — the trajectory is a cylindrical helix (Fig. 29-1). The radius 
of the helix is given by Eq (29 1) if we replace p by p±, the component of mo- 
mentum at right angles to the field- 


29-2 Momentum analysis 

A uniform magnetic field is often used in making a ‘‘momentum analyzer," 
or “momentum spectrometer/’ for high-energy charged particles. Suppose that 
charged particles are shot into a uniform magnetic field at the point A in Fig. 
29-2(a), the magnetic field being perpendicular to the plane of the drawing. Each 
particle will go into an orbit which is a circle whose radius is proportional to its 
momentum. If all the particles enter perpendicular to the edge of the field, they 
will leave the field at a distance x (from A) which is proportional to their momentum 
p . A counter placed at some point such as C will detect only those particles whose 
momentum is in an interval Ap near the momentum p = qBx/2 

It is, of course, not necessary that the particles go through 180° before they 
are counted, but the so-called “180° spectrometer” has a special property It is not 
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Fig. 29-1. Motion of a particle in a 
uniform magnetic field. 
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(b) 


Fig. 29-2. A uniform-field, momen- 
tum spectrometer with 1 80° focusing: 
(a) different momenta; (b) different 
angles. (The magnetic field is directed 
perpendicular to the plane of the figure.) 



D 


Fig. 29-3. An axial-field spectrom- 
eter. 



Fig. 29-4. An ellipsoidal coil with 
equal currents in each axial interval Ax 
produces a uniform magnetic field inside. 


necessary that all the particles enter at right angles to the field edge. Figure 29-2(b) 
shows the trajectories of three particles, all with the same momentum but entering 
the field at different angles. You see that they take different trajectories, but all 
leave the field very close to the point C. We say that there is a “focus.” Such a 
focusing property has the advantage that larger angles can be accepted at A — 
although some limit is usually imposed, as shown in the figure. A larger angular 
acceptance usually means that more particles are counted in a given time, decreasing 
the time required for a given measurement. 

By varying the magnetic field, or moving the counter along in x, or by using 
many counters to cover a range of x, the “spectrum” of momenta in the incoming 
beam can be measured. [By the “momentum spectrum”/ (/?), we mean that the 
number of particles with momenta between p and (p + dp) is f(p)dp.] Such 
measurements have been made, for example, to determine the distribution of 
energies in the d-decay of various nuclei. 

There are many other forms of momentum spectrometers, but we will describe 
just one more, which has an especially large solid angle of acceptance. It is based 
on the helical orbits in a uniform field, like the one shown in Fig. 29-1. Let’s 
think of a cylindrical coordinate system — p, 0 , z — set up with the z-axis along the 
direction of the field. If a particle is emitted from the origin at some angle a 
with respect to the z-axis, it will move along a spiral whose equation is 

p = a sin kz , 0 = bz , 

where a, b , and k are parameters you can easily work out in terms of p, a, and the 
magnetic field B. If we plot the distance p from the axis as a function of z for a 
given momentum, but for several starting angles, we will get curves like the solid 
ones drawn in Fig. 29-3. (Remember that this is just a kind of projection of a 
helical trajectory.) When the angle between the axis and the starting direction 
is larger, the peak value of p is large but the longitudinal velocity is less, so the 
trajectories for different angles tend to come to a kind of “focus” near the point 
A in the figure. If we put a narrow aperture of A , particles with a range of initial 
angles can still get through and pass on to the axis, where they can be counted by 
the long detector D. 

Particles which leave the source at the origin with a higher momentum but 
at the same angles, follow the paths shown by the broken lines and do not get 
through the aperture at A. So the apparatus selects a small interval of momenta 
The advantage over the first spectrometer described is that the aperture A — and 
the aperture A f — can be an annulus, so that particles which leave the source in a 
rather large solid angle are accepted. A large fraction of the particles from the 
source are used — an important advantage for weak sources or for very precise 
measurements. 

One pays a price for this advantage, however, because a large volume of 
uniform magnetic field is required, and this is usually only practical for low-energy 
particles One way of making a uniform field, you remember, is to wind a coil on 
a sphere, with a surface current density proportional to the sine of the angle 
You can also show that the same thing is true for an ellipsoid of rotation. So such 
spectrometers are often made by winding an elliptical coil on a wooden (or alumi- 
num) frame. All that is required is that the current in each interval of axial distance 
Ax be the same, as shown in Fig 29-4 


29-3 An electrostatic lens 

Particle focusing has many applications. For instance, the electrons that leave 
the cathode in a TV picture tube are brought to a focus at the screen — to make a 
fine spot. In this case, one wants to take electrons all of the same energy but with 
different initial angles and bring them together in a small spot. The problem is 
like focusing light with a lens, and devices which do the corresponding job for 
particles are also called lenses. 
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Fig. 29-5. An electrostatic lens. The field lines shown are “lines of 
force/ 1 that is, of q£. 


One example of an electron lens is sketched in Fig 29-5. It is an “electro- 
static” lens whose operation depends on the electric field between two adjacent 
electrodes. Its operation can be understood by considering what happens to a 
parallel beam that enters from the left. When the electrons arrive at the region a, 
they feel a force with a sidewise component and get a certain impulse that bends them 
toward the axis You might think that they would get an equal and opposite im- 
pulse in the region b , but that is not so. By the time the electrons reach b they have 
gained energy and so spend less tune in the region b . The forces are the same, but 
the time is shorter, so the impulse is less. In going through the regions a and 6, 
there is a net axial impulse, and the electrons are bent toward a common point 
In leaving the high-voltage region, the particles get another kick toward the axis 
The force is outward in region c and inward in region d, but the particles stay longer 
in the latter region, so there is again a net impulse For distances not too far from 
the axis, the total impulse through the lens is proportional to the distance from the 
axis (Can you see why 9 ), and this is just the condition necessary for lens-type 
focusing. 

You can use the same arguments to show that there is focusing if the 
potential of the middle electrode is either positive or negative with respect to the 
other two. Electrostatic lenses of this type are commonly used in cathode-ray 
tubes and in some electron microscopes. 



29-4 A magnetic lens 

Another kind of lens — often found in electron microscopes — is the magnetic 
lens sketched schematically in Fig. 29-6. A cyhndrically symmetric electromagnet 
has very sharp circular pole tips which produce a strong, nonuniform field in a 
small region. Electrons which travel vertically through this region are focused 
You can understand the mechanism by looking at the magnified view of the pole-tip 
region drawn in Fig. 29-7. Consider two electrons a and b that leave the source 
S at some angle with respect to the axis. As electron a reaches the beginning of the 
field, it is deflected away from you by the horizontal component of the field But 
then it will have a lateral velocity, so that when it passes through the strong vertical 
field, it will get an impulse toward the axis. Its lateral motion is taken out by the 
magnetic force as it leaves the field, so the net effect is an impulse toward the 
axis, plus a “rotation" about the axis. All the forces on particle b are opposite, 
so it also is deflected toward the axis. In the figure, the divergent electrons are 
brought into parallel paths. The action is like a lens with an object at the focal 
point. Another similar lens upstream can be used to focus the electrons back to a 
single point, making an image of the source S. 


29-5 The electron microscope 



Fig. 29-7. Electron motion in the 
magnetic lens. 


You know that electron microscopes can “see” objects too small to be seen 
by optical microscopes. We discussed in Chapter 30 of Vol. I the basic limitations 
of any optical system due to diffraction of the lens opening If a lens opening sub- 
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Fig, 29-8. The resolution of a micro- 
scope is limited by the angle subtended 
from the source. 



Fig. 29-9. Spherical aberration of 
a lens. 


tends the angle 2 6 from a source (see Fig. 29-8), two neighboring spots at the source 
cannot be seen as separate if they are closer than about 

sin 6 

where X is the wavelength of the light. With the best optical microscope, 6 ap- 
proaches the theoretical limit of 90°, so 6 is about equal to X, or approximately 
5000 angstroms. 

The same limitation would also apply to an electron microscope, but there 
the wavelength is — for 50-kilovolt electrons — about 0.05 angstrom. If one could 
use a lens opening of near 30°, it would be possible to see objects only ^ of an 
angstrom apart. Since the atoms m molecules are typically 1 or 2 angstroms apart, 
we could get photographs of molecules. Biology would be easy; we would have 
a photograph of the DNA structure. What a tremendous thing that would be! 
Most of present-day research in molecular biology is an attempt to figure out the 
shapes of complex organic molecules. If we could only see them! 

Unfortunately, the best resolving power that has been achieved in an electron 
microscope is more like 20 angstroms. The reason is that no one has yet designed 
a lens with a large opening. All lenses have “spherical aberration,” which means 
that rays at large angles from the axis have a different point of focus than the rays 
nearer the axis, as shown in Fig. 29-9 By special techniques, optical microscope 
lenses can be made with a negligible spherical aberration, but no one has yet 
been able to make an electron lens which avoids spherical aberration. 

In fact, one can show that any electrostatic or magnetic lens of the types we 
have described must have an irreducible amount of spherical aberration. This 
aberration — together with diffraction — limits the resolving power of electron 
microscopes to their present value. 

The limitation we have mentioned does not apply to electric and magnetic 
fields which are not axially symmetric or which are not constant in time. Perhaps 
some day someone will think of a new kind of electron lens that will overcome the 
inherent aberration of the simple electron lens. Then we will be able to photograph 
atoms directly. Perhaps one day chemical compounds will be analyzed by looking 
at the positions of the atoms rather than by looking at the color of some pre- 
cipitate! 



Fig. 29-10. Particle motion in a 
slightly nonuniform field. 


29-6 Accelerator guide fields 

Magnetic fields are also used to produce special particle trajectories in high- 
energy particle accelerators. Machines like the cyclotron and synchrotron bring 
particles to high energies by passing the particles repeatedly through a strong 
electric field. The particles are held in their cyclic orbits by a magnetic field. 

We have seen that a particle in a uniform magnetic field will go in a circular 
orbit. This, however, is true only for a perfectly uniform field. Imagine a field 
B which is nearly uniform over a large area but which is slightly stronger in one 
region than in another. If we put a particle of momentum p in this field, it will go 
in a nearly circular orbit with the radius R = p/qB. The radius of curvature will, 
however, be slightly smaller in the region where the field is stronger. The orbit is 
not a closed circle but will “walk” through the field, as shown in Fig. 29-10. 
We can, if we wish, consider that the slight “error” in the field produces an extra 
angular kick which sends the particle off on a new track. If the particles are to make 
millions of revolutions in an accelerator, some kind of “radial focusing” is needed 
which will tend to keep the trajectories close to some design orbit. 

Another difficulty with a uniform field is that the particles do not remain in a 
plane If they start out with the slightest angle — or are given a slight angle by 
any small error in the field — they will go in a helical path that will eventually take 
them into the magnet pole or the ceiling or floor of the vacuum tank Some 
arrangement must be made to inhibit such vertical drifts; the field must provide 
“vertical focusing” as well as radial focusing. 
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Fig. 29-11. Radial motion of a par- Fig. 29-12. Radial motion of a par- Fig. 29-13. Radial motion of a par- 
ticle in a magnetic field with a large tide in a magnetic field with a small tide in a magnetic field with a large 
positive slope. negative slope. negative slope. 


One would, at first, guess that radial focusing could be provided by making a 
magnetic field which increases with increasing distance from the center of the design 
path Then if a particle goes out to a large radius, it will be in a stronger field which 
will bend it back toward the correct radius. If it goes to too small a radius, the 
bending will be less, and it will be returned toward the design radius. If a particle 
is once started at some angle with respect to the ideal circle, it will oscillate about 
the ideal circular orbit, as shown in Fig. 29-11. The radial focusing would keep the 
particles near the circular path. 

Actually there is still some radial focusing even with the opposite field slope 
This can happen if the radius of curvature of the trajectory does not increase more 
rapidly than the increase in the distance of the particle from the center of the field. 
The particle orbits will be as drawn in Fig 29-12. If the gradient of the field is too 
large, however, the orbits will not return to the design radius but will spiral inward 
or outward, as shown in Fig. 29-13. 

We usually describe the slope of the field in terms of the “relative gradient” 
or field index , n : 


n 


dB/B 

dr/r 


(29.2) 


A guide field gives radial focusing if this relative gradient is greater than — 1. 

A radial field gradient will also produce vertical forces on the particles 
Suppose we have a field that is stronger nearer to the center of the orbit and weaker 
at the outside. A vertical cross section of the magnet at right angles to the orbit 
might be as shown in Fig. 29-14. (For protons the orbits would be coming out of 
the page ) If the field is to be stronger to the left and weaker to the right, the lines 
of the magnetic field must be curved as shown . We can see that this must be so 
by using the law that the circulation of B is zero in free space. If we take coordinates 
as shown in the figure, then 


(V X B)y = 


or 


3B X __ dB z 
dz dx 


(29.3) 



Fig. 29-14. A vertical guide field as 
seen in a cross section perpendicular to 
the orbits. 


Since we assume that dBjdx is negative, there must be an equal negative dB x /dz . 
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If the “nominal” plane of the orbit is a plane of symmetry where B x = 0, then the 
radial component B x will be negative above the plane and positive below The lines 
must be curved as shown. 

Such a field will have vertical focusing properties. Imagine a proton that is 
travelling more or less parallel to the central orbit but above it. The horizontal 
component of B will exert a downward force on it. If the proton is below the central 
orbit, the force is reversed. So there is an effective “restoring force” toward the 
central orbit. From our arguments there will be vertical focusing, provided that 
the vertical field decreases with increasing radius ; but if the field gradient is positive, 
there will be “vertical defocusing.” So for vertical focusing, the field index n must 
be less than zero. We found above that for radial focusing n had to be greater 
than — 1. The two conditions together give the condition that 

— 1 < n < 0 

if the particles are to be kept in stable orbits. In cyclotrons, values very near zero 
are used; in betatrons and synchrotrons, the value n = —0.6 is typically used. 


29-7 Alternating-gradient focusing 

Such small values of n give rather “weak” focusing. It is clear that much more 
effective radial focusing would be given by a large positive gradient (n » 1), but 
then the vertical forces would be strongly defocusing Similarly, large negative 
slopes (n « —1) would give stronger vertical forces but would cause radial de- 
focusing. It was realized about 10 years ago, however, that a force that alternates 
between strong focusing and strong defocusing can still have a net focusing force 

To explain how alternating-gradient focusing works, we will first describe the 
operation of a quadrupole lens, which is based on the same principle. Imagine that 
a uniform negative magnetic field is added to the field of Fig 29-14, with the 
strength adjusted to make zero field at the orbit. The resulting field— for small 
displacements from the neutral point — would be like the field shown in Fig 29-15 
Such a four-pole magnet is called a “quadrupole lens.” A positive particle that 
enters (from the reader) to the right or left of the center is pushed back toward 
the center. If the particle enters above or below, it is pushed away from the center. 
This is a horizontal focusing lens If the horizontal gradient is reversed — as can 
be done by reversing all the polarities — the signs of all the forces are reversed 
and we have a vertical focusing lens, as in Fig. 29-16 For such lenses, the field 
strength — and therefore the focusing forces — increase linearly with the distance 
of the lens from the axis. 



Fig. 29-15. A horizontal focusing 
quadrupole lens. 
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(b) 


Fig. 29-17. Horizontal and vertical focusing with a pair of quadrupole lenses. 


Now imagine that two such lenses are placed in series. If a particle enters with 
some horizontal displacement from the axis, as shown in Fig. 29-17(a), it will be 
deflected toward the axis in the first lens. When it arrives at the second lens it is 
closer to the axis, so the force outward is less and the outward deflection is less 
There is a net bending toward the axis; the average effect is horizontally focusing 
On the other hand, if we look at a particle which enters off the axis in the vertical 
direction, the path will be as shown in Fig. 29-1 7(b). The particle is first deflected 
away from the axis, but then it arrives at the second lens with a larger displacement, 
feels a stronger force, and so is bent toward the axis. Again the net effect is focusing. 
Thus a pair of quadrupole lenses acts independently for horizontal and vertical 
motion — very much like an optical lens. Quadrupole lenses are used to form and 
control beams of particles in much the same way that optical lenses are used for 
light beams. 

We should point out that an alternating-gradient system does not always 
produce focusing. If the gradients are too large (in relation to the particle momen- 
tum or to the spacing between the lenses), the net effect can be a defocusing one. 
You can see how that could happen if you imagine that the spacing between the 
two lenses of Fig. 29-17 were increased, say, by a factor of three or four. 

Let’s return now to the synchrotron guide magnet. We can consider that it 
consists of an alternating sequence of “positive” and “negative” lenses with a 
superimposed uniform field. The uniform field serves to bend the particles, on the 
average, in a horizontal circle (with no effect on the vertical motion), and the 
alternating lenses act on any particles that might tend to go astray — pushing them 
always toward the central orbit (on the average). 

There is a nice mechanical analog which demonstrates that a force which 
alternates between a “focusing” force and a “defocusing” force can have a net 
“focusing" effect. Imagine a mechanical “pendulum” which consists of a solid 
rod with a weight on the end, suspended from a pivot which is arranged to be moved 
rapidly up and down by a motor driven crank. Such a pendulum has two equili- 
brium positions. Besides the normal, downward-hanging position, the pendulum 
is also in equilibrium “hanging upward” — with its “bob” above the pivot! Such a 
pendulum is drawn in Fig. 29-18. 

By the following argument you can see that the vertical pivot motion is 
equivalent to an alternating focusing force. When the pivot is accelerated down- 
ward, the “bob” tends to move inward, as indicated in Fig. 29-19. When the 
pivot is accelerated upward, the effect is reversed. The force restoring the “bob” 
toward the axis alternates, but the average effect is a force toward the axis. So the 
pendulum will swing back and forth about a neutral position which is just opposite 
the normal one. 

There is, of course, a much easier way of keeping a pendulum upside down, 
and that is by balancing it on your finger* But try to balance two independent 
sticks on the same finger\ Or one stick with your eyes closed! Balancing involves 
making a correction for what is going wrong. And this is not possible, in general, 
if there are several things going wrong at once. In a synchrotron there are billions 
of particles going around together, each one of which may start out with a different 
“error.” The kind of focusing we have been describing works on them all. 



Fig. 29-18. A pendulum with an 
oscillating pivot can have a stable posi- 
tion with the bob above the pivot. 



Fig. 29-19. A downward accelera- 
tion of the pivot causes the pendulum to 
move toward the vertical. 
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Fig. 29-20. Path of a particle in 
crossed electric and magnetic fields. 


29-8 Motion in crossed electric and magnetic fields 

So far we have talked about particles in electric fields only or in magnetic 
fields only. There are some interesting effects when there are both kinds of fields 
at the same time. Suppose we have a uniform magnetic field B and an electric 
field E at right angles. Particles that start out perpendicular to B will move in a 
curve like the one in Fig 29-20 (The figure is a plane curve, not a helix!) We can 
understand this motion qualitatively. When the particle (assumed positive) moves 
in the direction of E , it picks up speed, and so it is bent less by the magnetic field. 
When it is going against the F'-field, it loses speed and is continually bent more by 
the magnetic field. The net effect is that it has an average “drift” in the direction 
of E X B. 

We can. in fact, show that the motion is a uniform circular motion super- 
imposed on a uniform sidewise motion at the speed v d = E/B — the trajectory in 
Fig. 29-20 is a cycloid. Imagine an observer who is moving to the right at a con- 
stant speed. In his frame our magnetic field gets transformed to a new magnetic 
field plus an electric field in the downward direction. If he has just the right speed, 
his total electric field will be zero, and he will see the electron going in a circle. So 
the motion we see is a circular motion, plus a translation at the drift speed 
v,i = E/B The motion of electrons in crossed electric and magnetic fields is the 
basis of the magnetron tubes, l e., oscillators used for generating microwave energy. 

There are many other interesting examples of particle motions in electric and 
magnetic fields — such as the orbits of the electrons and protons trapped in the 
Van Allen belts — but we do not, unfortunately, have the time to deal with them here 
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30 


The Internal Geometry of Crystals 


30-1 The internal geometry of crystals 

We have finished the study of the basic laws of electricity and magnetism, and 
we are now going to study the electromagnetic properties of matter. We begin 
by describing solids — that is, crystals. When the atoms of matter are not moving 
around very much, they get stuck together and arrange themselves in a configura- 
tion with as low an energy as possible. If the atoms in a certain place have found a 
pattern which seems to be of low energy, then the atoms somewhere else will 
probably make the same arrangement. For these reasons, we have in a solid ma- 
terial a repetitive pattern of atoms 

In other words, the conditions in a crystal are this way: The environment of a 
particular atom in a crystal has a certain arrangement, and if you look at the same 
kind of an atom at another place farther along, you will find one whose surround- 
ings are exactly the same. If you pick an atom farther along by the same distance, 
you will find the conditions exactly the same once more. The pattern is repeated 
over and over again — and, of course, in three dimensions. 

Imagine the problem of designing a wallpaper — or a cloth, or some geometric 
design for a plane area — in which you are supposed to have a design element which 
repeats and repeats and repeats, so that you can make the area as large as you want. 
This is the two-dimensional analog of a problem which a crystal solves in three 
dimensions. For example, Fig. 30-1 (a) shows a common kind of wallpapei design. 
There is a single element repeated in a pattern that can go on forever. The geometric 
characteristics of this wallpaper design, considering only its repetition properties 
and not worrying about the geometry of the flower itself or its artistic merit, are 
contained in Fig. 30-1 (b). If you start at any point, you can find the corresponding 
point by moving the distance a along the direction of arrow 1 You can also get 
to a corresponding point if you move the distance b in the direction of the other 
arrow. There are, of course, many other directions. You can go, for example, 
from point a to point f3 and reach a corresponding position, but such a step 
can be considered as a combination of a step along direction 1, followed by a step 
along direction 2. One of the basic properties of the pattern can be described by 
the two shortest steps to nearby equal positions. By “equal” positions we mean that 
if you were to stand in any one of them and look around you, you would see exactly 
the same thing as if you were to stand in another one. That’s the fundamental 
property of a crystal. The only difference is that a crystal is a three-dimensional 
arrangement instead of a two-dimensional arrangement; and naturally, instead of 
flowers, each element of the lattice is some kind of an arrangement of atoms — 
perhaps six hydrogen atoms and two carbon atoms — in some kind of pattern 
The pattern of atoms in a crystal can be found out experimentally by x-ray diffrac- 
tion. We have mentioned this method briefly before, and won’t say any more now 
except that the precise arrangement of the atoms in space has been worked out for 
most simple crystals and also for some fairly complex ones. 

The internal pattern of a crystal shows up in several ways. First, the binding 
strength of the atoms in certain directions is usually stronger than in other direc- 
tions. This means that there are certain planes through the crystal where it is more 
easily broken than others. They are called the cleavage planes. If you crack a 
crystal with a knife blade it will often split apart along such a plane. Second, the 
internal structure often appears at the surface because of the way the crystal was 
formed. Imagine a crystal being deposited out of a solution. There are the atoms 
floating around in the solution and finally settling down when they find a position 
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Fig. 30-1. A repeating pattern in 
two dimensions. 
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Fig. 30-2. Natural crystals: (a) 

quartz, (b) sodium chloride, (c) mica. 



Fig. 30-3. The lattice of a molecular 
crystal. 


of lowest energy. (It’s as if the wallpaper got made by flowers drifting around 
until one drifted accidentally into place and got stuck, and then the next, and the 
next so that the pattern gradually grows.) You can appreciate that there will be 
certain directions in which it will grow at a different speed than in other directions, 
thereby growing into some kind of geometrical shape. Because of such effects, the 
outside surfaces of many crystals show some of the character of the internal 
arrangement of the atoms 

For example, Fig. 30-2(a) shows the shape of a typical quartz crystal whose 
internal pattern is hexagonal. If you look closely at such a crystal, you will notice 
that the outside does not make a very good hexagon because the sides are not all 
of equal length — they are, in fact, often very unequal. But in one respect it is a 
very good hexagon: the angles between the faces are exactly 120°. Clearly, the size 
of any particular face is an accident of the growth, but the angles are a representa- 
tion of the internal geometry So every crystal of quartz has a different shape, 
even though the angles between corresponding faces are always the same. 

The internal geometry of a crystal of sodium chloride is also evident from its 
external shape Figure 30-2(b) shows the shape of a typical grain of salt. Again 
the crystal is not a perfect cube, but the faces are exactly at right angles to one 
another 

A more complicated crystal is mica, which has the shape shown m Fig 30-2(c) 
It is a highly anisotropic crystal, as is easily seen from the fact that it is very tough 
if you try to pull it apart in one direction (horizontally in the figure), but very easy 
to split by pulling apart in the other direction (vertically) It has commonly been 
used to obtain very tough, thin sheets Mica and quartz are two examples of 
natural minerals containing silica. A third example of a mineral with silica is 
asbestos, which has the interesting property that it is easily pulled apart in two 
directions but not in the third. It appears to be made of very strong, linear fibers. 

30-2 Chemical bonds in crystals 

The mechanical properties of crystals clearly depend on the kind of chemical 
bindings between the atoms. The strikingly different strength of mica along differ- 
ent directions depends on the kinds of interatomic binding in the different directions. 
You have already learned in chemistry, no doubt, about the different kinds of 
chemical bonds First, there are ionic bonds, as we have already discussed for 
sodium chloride Roughly speaking, the sodium atoms have lost an electron and 
become positive ions, the chlorine atoms have gained an electron and become 
negative ions. The positive and negative ions are arranged in a three-dimensional 
checkerboard and are held together by electrical forces. 

The covalent bond — in which electrons are shared between two atoms — is 
more common and is usually very strong. In a diamond, for example, the carbon 
atoms have covalent bonds in all four directions to the nearest neighbors, so the 
crystal is very hard indeed. There is also covalent bonding between silicon and 
oxygen in a quartz crystal, but there the bond is really only partially covalent. 
Because there is not complete sharing of the electrons, the atoms are partly charged, 
and the crystal is somewhat ionic Nature is not as simple as we try to make it; 
there are really all possible gradations between covalent and ionic bonding 

A sugar crystal has still another kind of binding In it there are large molecules 
in which the atoms are held strongly together by covalent bonds, so that the mole- 
cule is a tough structure. But since the strong bonds are completely satisfied, there 
are only relatively weak attractions between the separate, individual molecules 
In such molecular crystals the molecules keep their individual identity, so to speak, 
and the internal arrangement might be as shown in Fig. 30-3. Since the molecules 
are not held strongly to each other, the crystals are easy to break They are quite 
different from something like diamond, which is really one giant molecule that 
cannot be broken anywhere without disrupting strong covalent bonds. Panffin 
is another example of a molecular crystal. 

An extreme example of a molecular crystal occurs in a substance like solid 
argon. There is very little attraction between the atoms — each atom is a completely 
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saturated monatomic molecule. But at very low temperatures, the thermal motion 
is very small, so the slight interatomic forces can cause the atoms to settle down into 
a regular array like a pile of closely packed spheres. 

The metals form a completely different class of substances The bonding is 
of an entirely different kind. In a metal the bonding is not between adjacent atoms 
but is a property of the whole crystal. The valence electrons are not attached to 
one atom or to a pair of atoms but are shared throughout the crystal. Each atom 
contributes an electron to a universal pool of electrons, and the atomic positive 
ions reside in the sea of negative electrons. The electron sea holds the ions together 
like some kind of glue. 

In the metals, since there are no special bonds in any particular direction, there 
is no strong directionality in the binding. They are still crystalline, however, be- 
cause the total energy is lowest when the atomic ions are arranged in some definite 
array — although the energy of the preferred arrangement is not usually much lower 
than other possible ones. To a first approximation, the atoms of many metals are 
like small spheres packed in as tightly as possible. 


30-3 The growth of crystals 

Try to imagine the natural formation of crystals in the earth. In the earth's 
surface there is a big mixture of all kinds of atoms. They are being continually 
churned about by volcanic action, by wind, and by water — continually being moved 
about and mixed. Yet, by some trick, silicon atoms gradually begin to find each 
other, and to find oxygen atoms, to make silica. One atom at a time is added to 
the others to build up a crystal — the mixture gets unmixed. And somewhere 
nearby, sodium and chlorine atoms are finding each other and building up a crystal 
of salt. 

How does it happen that once a crystal is started, it permits only a particular 
kind of atom to join on 9 It happens because the whole system is working toward 
the lowest possible energy. A growing crystal will accept a new atom if it is going 
to make the energy as low as possible. But how does it know that a silicon — or 
an oxygen — atom at some particular spot is going to result in the lowest possible 
energy? It does it by trial and error. In the liquid, all of the atoms are in perpetual 
motion. Each atom bounces against its neighbors about 10 times every second. 
If it hits against the right spot of growing crystal, it has a somewhat smaller chance 
of jumping off again if the energy is low. By continually testing over periods of 
millions of years at a rate of 10 13 tests per second, the atoms gradually build up 
at the places where they find their lowest energy. Eventually they grow into big 
crystals. 



30-4 Crystal lattices 

The arrangement of the atoms in a crystal — the crystal lattice — can take on 
many geometric forms. We would like to describe first the simplest lattices, which 
are characteristic of most of the metals and of the solid form of the inert gases. 
They are the cubic lattices which can occur in two forms: the body-centered cubic, 
shown in Fig. 30-4(a), and the face-centered cubic shown in Fig. 30-4(b). The 
drawings show, of course, only one cube of the lattice; you are to imagine that the 
pattern is repeated indefinitely in three dimensions. Also, to make the drawing 
clearer, only the “centers” of the atoms are shown. In an actual crystal, the atoms 
are more like spheres in contact with each other. The dark and light spheres in 
the drawings may, in general, stand for different kinds of atoms or may be the 
same kind. For instance, iron has a body-centered cubic lattice at low temperatures, 
but a face-centered cubic lattice at higher temperatures. The physical properties 
are quite different in the two crystalline forms. 

How do such forms come about? Imagine that you have the problem of 
packing spherical atoms together as tightly as possible. One way would be to start 
by making a layer in a “hexagonal close-packed array,” as shown in Fig. 30-5(a) 
Then you could build up a second layer like the first, but displaced horizontally, 



Fig. 30—4. The unit cell of cubic 
crystals: (a) body-centered, (b) face- 
centered. 
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Fig. 30-6. 
seen from one 


Is this a hexagon or a cube 
corner 2 


as shown in Fig. 30-5(b) Next, you can put on the third layer. But notice' 
There are two distinct ways of placing the third layer If you start the third layer 
by placing an atom at A in Fig 30-5(b), each atom in the third layer is directly 
above an atom of the bottom layer On the other hand, if you start the third layer 
by putting an atom at the position B , the atoms of the third layer will be centered 
at points exactly in the middle of a triangle formed by three atoms of the bottom 
layer. Any other starting place is equivalent to A or B , so there are only two ways 
of placing the third layer. 

If the third layer has an atom at point B , the crystal lattice is a face-centered 
cubic — but seen at an angle It seems funny that starting with hexagons you can 
end up with cubes. But notice that a cube looked at from a corner has a hexagonal 
outline For instance, Fig. 30-6 could represent a plane hexagon or a cube seen in 
perspective! 

If a third layer is added to Fig. 30-5(b) by starting with an atom at A , there is 
no cubical structure, and the lattice has instead only a hexagonal symmetry. It is 
clear that both possibilities we have described are equally close-packed 

Some metals — for example, copper and silver — choose the first alternative, 
the face-centered cubic. Others — for example, beryllium and magnesium — choose 
the other alternatives; they form hexagonal crystals. Clearly, which crystal lattice 
appears cannot depend only on the packing of little spheres, but must also be deter- 
mined in part by other factors In particular, it depends on the slight remaining 
angular dependence of the interatomic forces (or, in the case of the metals, on the 
energy of the electron pool) You will, no doubt, learn all about such things in 
your chemistry courses. 


30-5 Symmetries in two dimensions 

We would now like to discuss some of the properties of crystals from the point 
of view of their internal symmetries. The main feature of a crystal is that if you 
start at one atom and move to a corresponding atom one lattice unit away, you 
are again in the same kind of an environment. That’s the fundamental proposition. 
But if you were an atom, there would be another kind of change that could take 
you again to the same environment — that is, another possible ‘'symmetry.” 
Figure 30-7 (a) shows another possible “wallpaper-type” design (though one you 
have probably never seen). Suppose we compare the environments for points 
A and B You might, at first, think that they are the same — but not quite Points 
C and D are equivalent to A, but the environment of B is like that of A only if the 
surroundings are reversed, as in a mirror reflection. 
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Fig. 30—7. A pattern of high symmetry. 


There are other kinds of “equivalent” points in the pattern. For instance, 
the points E and F have the “same” environments except that one is rotated 90° 
with respect to the other. The pattern is quite special. A rotation of 90° — or any 
multiple of it — about a vertex such as A gives the same pattern all over again. A 
crystal with such a structure would have square corners on the outside, but inside 
it is more complicated than a simple cube. 

Now that we have described some special examples, let’s try to figure out all 
the possible symmetries a crystal can have. First, we consider what happens in a 
plane. A plane lattice can be defined by the two so-called primitive vectors that go 
from one point of the lattice to the two nearest equivalent points. The two vectors 
1 and 2 are the primitive vectors of the lattice of Fig. 30-1. The two vectors a and 
b of Fig 30-7 (a) are the primitive vectors of the pattern there We could, of course, 
equally well replace a by — a, or b by — b . Since a and b are equal in magnitude 
and at right angles, a rotation of 90° turns a into Z>, and b into — a, giving the same 
lattice once again. 

We see that there are lattices which have a “four-sided” symmetry. And we 
have described earlier a close-packed array based on a hexagon which could have 
a six-sided symmetry. A rotation of the array of circles in Fig. 30-5(a) by an angle 
of 60° about the center of any circle brings the pattern back to itself. 

What other kinds of rotational symmetry are there? Can we have, for example, 
a fivefold or an eightfold rotational symmetry 9 It is easy to see that they are 
impossible. The only symmetry with more sides than four is a stx-sided symmetry . 
First, let’s show that more than sixfold symmetry is impossible. Suppose we try to 
imagine a lattice with two equal primitive vectors with an enclosed angle less than 
60°, as in Fig. 30-8 (a). We are to suppose that points B and C are equivalent 
to A , and that a and b are the two shortest vectors from A to its equivalent neighbors 
But that is clearly wrong, because the distance between B and C is shorter than from 
either one to A. There must be a neighbor at D equivalent to A which is closer 
than B or C. We should have chosen b ' as one of our primitive vectors. So the 
angle between the two primitive vectors must be 60° or larger. Octagonal symmetry 
is not possible. 

What about fivefold symmetry? If we assume that the primitive vectors a 
and b have equal lengths and make an angle of 2tt/5 = 72°, as in Fig. 30-8(b), 
then there should also be an equivalent lattice point at D, at 72° from C. But the 
vector b f from E to D is then less than b, so b is not a primitive vector. There can 
be no fivefold symmetry. The only possibilities that do not get us into this kind 
of difficulty are 0 = 60°, 90°, or 120°. Zero or 180° are also clearly possible. 
One way of stating our result is that the pattern can be left unchanged by a rotation 
of one full turn (no change at all), one-half of a turn, one-third, one-fourth, or 
one-sixth of a turn And those are all the possible rotational symmetries in a 
plane — a total of five. If Q = lir/n, we speak of an ‘Vz-fold” symmetry. We say 


/ 

/ 

/ 


D / C 



to) 


C 



(b> 


Fig. 30-8. (a) Rotational symmetries 

greater than sixfold are not possible, 
(bl Fivefold rotational symmetry is not 
possible. 
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Fig. 30-9. Symmetry under inversion. Pattern (b) is unchanged if R — > —R, but pattern 
(a) is changed. In three dimensions pattern (d) is symmetric under an inversion but (c) is not. 


that a pattern with n equal to 4 or to 6 has a “higher symmetry” than one with 
n equal to 1 or to 2. 

Returning to Fig. 30-7(a), we see that the pattern has a fourfold rotational 
symmetry. We have drawn in Fig. 30-7(b) another design which has the same 
symmetry properties as part (a). The little comma-like figures arc asymmetric 
objects which serve to define the symmetry of the design inside of each square 
Notice that the commas are reversed in alternate squares, so that the unit cell is 
larger than one of the small squares. If there were no commas, the pattern would 
still have fourfold symmetry, but the unit cell would be smaller. The patterns of 
Fig. 30-7 also have other symmetry properties. For instance, a reflection about any 
of the broken lines R-R reproduces the same pattern 

The patterns of Fig. 30-7 have still another kind of symmetiy. If the pattern 
is reflected about the line Y-Y and shifted one square to the right (or left), we get 
back the original pattern The line Y-Y is called a "glide" line. 

These are all the possible symmetries in two dimensions There is one more 
spatial symmetry operation which is equivalent in two dimensions to a 180° rotation, 
but which is a quite distinct operation in three dimensions. It is inversion . By an 
inversion we mean that any point at the vector displacement R from some origin 
[for instance, the point A in Fig. 30-9(b)] is moved to the point at — R 

An inversion of pattern (a) of Fig. 30-9 produces a new pattern, but an in- 
version of pattern (b) reproduces the same pattern. Fot a two-dimensional pattern 
(as you can see from the figure), an inversion of the pattern (b) through the point 
A is equivalent to a rotation of 180° about the same point Suppose, however, 
we make the pattern in Fig. 30-9(b) three dimensional by imagining that the little 
6’s and 9’s each have an “arrow” pointing out of the page . After an inversion in 
three dimensions all the arrows will be reversed, so the pattern is not leproduced. 
If we indicate the heads and tails of the arrows by dots and crosses, respectively, 
we can make a three-dimensional pattern, as in Fig 30-9(c), which is not symmetric 
under an inversion, or we can make a pattern like the one shown in (d), which 
does have such a symmetry. Notice that it is not possible to imitate a three- 
dimensional inversion by any combination of rotations. 

If we characterize the “symmetry” of a pattern — or lattice — by the kinds of 
symmetry operations we have been describing, it turns out that for two dimensions 
17 distinct patterns are possible We have drawn one pattern of the lowest possible 
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symmetry in Fig. 30-1, and one of high symmetry in Fig. 30-7. We will leave you 
with the game of trying to figure out all of the 17 possible patterns. 

It is peculiar how few of the 17 possible patterns are used in making wall- 
paper and fabrics. One always sees the same three or four basic patterns. Is this 
because of a lack of imagination of designers, or because many of the possible 
patterns are not pleasing to the eye? 

30-6 Symmetries in three dimensions 

So far we have talked only about patterns in two dimensions. What we are 
really interested in, however, are patterns of atoms in three dimensions. First, 
it is clear that a three-dimensional crystal will have three primitive vectors. If 
we then ask about the possible symmetry operations in three dimensions, we find 
that there are 230 different possible symmetries! For some purposes, these 230 
types can be grouped into seven classes, which are drawn in Fig. 30-10. The lattice 
with the least symmetry is called the triclinic . Its unit cell is a parallelepiped. The 
primitive vectors are of different lengths, and no two of the angles between them are 
equal. There is no possibility of any rotational or reflection symmetry. There are, 
however, still two possible symmetries — the unit cell is, or is not, changed by an 
inversion through the vertex. (By an inversion in three dimensions, we again mean 
that spatial displacements R are replaced by — R — in other words, that (x, y, z) 
goes into ( — x, ~y, — z) So the triclinic lattice has only two possible symmetries, 
unless there is some special relation among the primitive vectors. For example, if 
all the vectors are equal and are separated by equal angles, one has the trigonal 
lattice shown in the figure. This figure can have an additional symmetry, it may 
be unchanged by a rotation about the long, body diagonal. 

If one of the primitive vectors, say c, is at right angles to the other two, we 
get a monoclinic unit cell. A new symmetry is possible — a rotation by 180° about c 
The hexagonal cell is a special case in which the vectors a and b are equal and the 
angle between them is 60°, so that a rotation of 60°, or 1 20°, or 1 80° about the vector 
c repeats the same lattice (for certain internal symmetries). 

If all three primitive vectors are at right angles, but of different lengths, we 
get the orthorhombic cell. The figure is symmetric for rotations of 180° about the 
three axes. Higher-order symmetries are possible with the iciragonal cell, which 
has all right angles and two equal primitive vectors Finally, there is the cubic 
cell, which is the most symmetric of all. 

The point of all this discussion about symmetries is that the internal symmetries 
of the crystals show up — sometimes in subtle ways — in the macroscopic physical 
properties of the crystal For instance, a crystal will, in general, have a tensor 
electric polarizability. If we describe the tensor in terms of the ellipsoid of polari- 
zation, we should expect that some of the crystal symmetries should show up 
also in the ellipsoid. For example, a cubic crystal is symmetric with respect to 
a rotation of 90° about any one of three orthogonal directions. Clearly, the 
only ellipsoid with this property is a sphere. A cubic crystal must be an isotropic 
dielectric . 

On the other hand, a tetragonal crystal has a fourfold rotational symmetry 
Its ellipsoid must have two of its principal axes equal, and the third must be 
parallel to the axis of the crystal. Similarly, since the orthorhombic crystal has 
twofold rotational symmetry about three orthogonal axes, its axes must coincide 
with the axes of the polarization ellipsoid. In a like manner, one of the axes of a 
monochmc crystal must be parallel to one of the principal axes of the ellipsoid, 
though we can’t say anything about the other axes. Since a triclinic crystal has no 
rotational symmetry, the ellipsoid can have any orientation at all. 

As you can see, we can make a big game of figuring out the possible sym- 
metries and relating them to the possible physical tensors. We have considered 
only the polarization tensor, but things get more complicated for others — for 
instance, for the tensor of elasticity. There is a branch of mathematics called 
“group theory” that deals with such subjects, but usually you can figure out what 
you want with common sense. 
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Fig. 30-1 1. Slippage of crystal planes. 



Fig. 30-12. A photograph of a small 
crystal of copper after stretching. [Cour- 
tesy of S. S. Brenner, Senior Scientist, 
United States Steel Research Center, 
Monroeville, Pa ] 
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30-7 The strength of metals 

We have said that metals usually have a simple cubic crystal structure; we 
want now to discuss their mechanical properties — which depend on this structure. 
Metals are, generally speaking, very “soft,” because it is easy to slide one layer 
of the crystal over the next. You may think: “That’s ridiculous; metals are strong.” 
Not so, a single crystal of a metal can be distorted very easily. 

Suppose we look at two layers of a crystal subjected to a shear force, as shown 
in the diagram of Fig. 30-1 1(a). You might at first think the whole layer would 
resist motion until the force was big enough to push the whole layer “over the 
hump,” so that it shifted one notch to the left. Although slipping does occur along 
a plane, it doesn’t happen that way (If it did, you would calculate that the metal 
is much stronger than it really is.) What happens is more like one atom going at a 
time; first the atom on the left makes its jump, then the next, and so on, as indicated 
in Fig. 30-1 1(b). In effect it is the vacant space between two atoms that quickly 
travels to the right, with the net result that the whole second layer has moved 
over one atomic spacing. The slipping goes this way because it takes much less 
energy to lift one atom at a time over the hump than to lift a whole row. Once 
the force is enough to start the process, it goes the rest of the way very fast 

It turns out that in a real crystal, slipping will occur repeatedly at one plane, 
then will stop there and start at some other plane. The details of why it starts and 
stops are quite mysterious. It is, in fact, quite strange that successive regions of 
slip are often fairly evenly spaced. Figure 30-12 shows a photograph of a tiny, 
thm copper crystal that has been stretched. You can see the various planes where 
slipping has occurred. 

The sudden slipping of individual crystal planes is quite apparent if you take 
a piece of tin wire that has large crystals in it and stretch it while holding it next 
to your ear. You can hear a rush of “ticks” as the planes snap to their new posi- 
tions, one after the other. 

The problem of having a “missing” atom in one row is somewhat more difficult 
than it might appear from Fig. 30-11. When there are more layers, the situation 
must be something like that shown in Fig. 30-13. Such an imperfection in a crystal 
is called a dislocation . It is presumed that such dislocations are either present 
when the crystal was formed or are generated at some notch or crack at the surface 
Once they are produced, they can move relatively freely through the crystal The 
gross distortions result from the motions of many of such dislocations. 

Dislocations can move freely— that is, they require little extia energy — so 
long as the rest of the crystal has a perfect lattice. But they may get “stuck" if they 
encounter some other kind of imperfection in the crystal. If it takes a lot of energy 
for them to pass the imperfection, they will be stopped. This is precisely the 
mechanism that gives strength to imperfect metal crystals. Pure iron crystals are 
quite soft, but a small concentration of impurity atoms may cause enough imper- 
fections to effectively immobilize the dislocations. As you know, steel, which is 
primarily iron, is very hard. To make steel, a small amount of carbon is dissolved 
in the iron melt; if the melt is cooled rapidly, the carbon precipitates out in little 
grains, making many microscopic distortions in the lattice. The dislocations can 
no longer move about, and the metal is hard. 

Pure copper is very soft, but can be “work-hardened." This is done by ham- 
mering on it or bending it back and forth. In this case, many new dislocations of 
various kinds are made which interfere with one another, cutting down their 
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mobility. Perhaps you’ve seen the trick of taking a bar of “dead soft” copper 
and gently bending it around someone’s wrist as a bracelet. In the process, it 
becomes work-hardened and cannot easily be unbent again 1 A work-hardened 
metal like copper can be made soft again by annealing at a high temperature. 
The thermal motion of the atoms “irons out” the dislocations and makes large 
single crystals again. We have, so far, described only the so-called slip dislocation. 
There are many other kinds, one of which is the screw dislocation shown in Fig. 
30-14. Such dislocations often play an important part in crystal growth. 



30-8 Dislocations and crystal growth 

One of the great puzzles for a long time was how crystals can possibly grow. 
We have described how it is that each atom might, by repeated testing, determine 
whether it was better to be in the crystal or not. But that means that each atom 
must find a place of low energy. However, an atom put on a new surface is only 
bound by one or two bonds from below, and doesn’t have the same energy it 
would have if it were placed in a corner, where it would have atoms on three sides 
Suppose we imagine a growing crystal as a stack of blocks, as shown in Fig. 30-15 
If we try a new block at, say, position A , it will have only one of the six neighbors 
it should ultimately get. With so many bonds lacking, its energy is not very low. 
It would be better off at position B, where it already has one-half of its quota of 
bonds. Crystals do indeed grow by attaching new atoms at places like B. 

What happens, though, when that line is finished? To start a new line, an 
atom must come to rest with only two sides attached, and that is again not very 
likely. Even if it did, what would happen when the layer was finished? How 
could a new layer get started? One answer is that the crystal prefers to grow at a 
dislocation, for instance around a screw dislocation like the one shown in Fig. 
30-14. As blocks are added to this crystal, there is always some place where there 
are three available bonds. The crystal prefers, therefore, to grow with a dislocation 
built in. Such a spiral pattern of growth is shown in Fig. 30-16, which is a photo- 
graph of a single crystal of paraffin. 


Fig. 30-14. A screw dislocation. 
[From Charles Kittel, Introduction to 
Solid State Physics , John Wiley and Sons, 
Inc., New York, 2nd ed., 1956.] 




Fig. 30-16. A paraffin crystal which 
has grown around a screw dislocation. 
[From Charles Kittel, fntroduct/on to So//d 
State Physics, John Wiley and Sons, Inc., 
New York, 2nd ed., 1956J 


30-9 The Bragg-Nye crystal model 

We cannot, of course, see what goes on with the individual atoms in a crystal. 
Also, as you realize by now, there are many complicated phenomena that are not 
easy to treat quantitatively. Sir Lawrence Bragg and J. F. Nye have devised a 
scheme for making a model of a metallic crystal which shows in a striking way 
many of the phenomena that are believed to occur in a real metal. In the following 
pages we have reproduced their original article, which describes their method and 
shows some of the results they obtained with it. (The article is reprinted from the 
Proceedings of the Royal Society of London , Vol. 190, September 1947, pp. 474-481 
— with the permission of the authors and of the Royal Society.) 
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A dynamical model of a crystal structure 

By Sir Lawrence Bragg, FRS. and J. F. Nye 
Cavendish Laboratory , University of Cambridge 

(Received 9 January 1947 — Read 19 June 1947) 

[Plates 8 to 21] 

The crystal structure of a metal is represented by an assemblage of bubbles, a millimetre or 
less in diameter, floating on the surface of a soap solution The bubbles are blown from a fine 
pipette beneath the surface with a constant air pressure, and are remarkably unif orm ui size 
They are held together by surface tension, either in a single layer on the surface or m a three- 
dimensional mass An assemblage may contain hundreds of thousands of bubbles and persists 
for an hour or more The assemblages show structures wmcti nave been supposed to exist 
m metals, and simulate effects which have been observed, such as gram boundaries, disloca- 
tions and other types of fault, slip, recrystallization, annealing, and strains due to ‘foreign’ 
atoms 

1 The bubble model 

Models of crystal structure have been described from time to time in which the 
atoms are represented by small floating or suspended magnets, or by circular HibItk 
floating on a water surface and held together by the forces of capillary attraction 
These models have certain disadvantages, for instance, m the case of floating objects 
in contact, frictional forces impede their free relative movement A more senous 
disadvantage is that the number of components is limited for a large number of 
components is required in order to approach the state of affairs in a real crystal 
The present paper describes the behaviour of a model m which the atoms are repre- 
sented by small bubbles from 2 0 to 0 1 mm m diameter floating on the surface of 
a soap solution These small bubbles are sufficiently persistent for experiments 
lasting an hour or more, they slide past each other without friction, and they can 
be produced in large numbers Some of the illustrations in this paper were taken 
from assemblages of bubbles numbering 100,000 or more The model most nearly 
represents the behaviour of a metal structure, because the bubbles are of one type 
only and are held together by a general capillary attraction which represents the 
binding force of the free electrons in the metal A bnef description of the model has 
been given in the Journal of Scientific Instruments (Bragg 19426) 



Flo u be 1 Apparatus for producing rafts of bubbles. 


2 Method of formation 

The bubbles are blown from a fine orifice, beneath the surface of a soap solution. 
We have had the best results with a solution the formula of which was given to us 
by Mr Green of the Royal Institution 15 2 c c of oleic acid (pure redistilled) is well 
shaken m 50 c c of distilled water This is mixed thoroughly with 73 c c of 10 % 
solution of tn-ethanolamine and the mixture made up to 200 c c To this is added 
164 c c of pure glycerine It is left to stand and the clear liquid is drawn off from 
below In some experiments this was diluted m three times its volume of water to 
Teduce viscosity The orifice of the jet is about 5 mm below the surface A constant 
air pressure of 50 to 200 cm of water is supplied by means of two Winchester 
flasks Normally the bubbles are remarkably uniform in size Occasionally they 
issue m an irregular manner, but this can be corrected by a change of jet or of pres- 
sure Unwanted bubbles can easdy be destroyed by playing a small flame over the 
surface Figure 1 shows the apparatus We have found it of advantage to blacken 
the bottom of the vessel, because details of structure, such as gram boundaries and 
dislocations, then show up more clearly 

Figure 2, plate 8, shows a portion of a raft ’ or two-dimensional crystal of bubbles 
Its regularity can be judged by looking at the figure in a glancmg direction The 
size of the bubbles vanes with the aperture, but does not appear to vary to any 
marked degree with the pressure or the depth of the orifice beneath the surface 
The main effect of increasing the pressure is to increase the rate of issue of the 
bubbles As an example, a thick-walled jet of 49 /1 bore with a pressure of 100 cm 
produced bubbles of 1*2 mm in diameter A thm-walled jet of 27 ft diameter and 
a pressure of 1 80 cm produced bubbles of 0 6 mm diameter It is convenient to 
refer to bubbles of 2 0 to 1 0 mm diameter as ‘large 1 bubbles, those from 0 8 to 
0*6 mm diameter as ‘medium’ bubbles, and those from 0*3 to 0*1 mm diameter 
as small ' bubbles, since their behaviour vanes with their size 



Figure 3 Apparatus for producing bubbles of small size 


With this apparatus we have not found it possible to reduce the size of the jet 
and so produce bubbles of smaller diameter than 0 6 mm As it was desired to expen- 
ment with very small bubbles, we had recourse to placmg the soap solution m a 
rotating vessel and introducing a fine jet as nearly as possible parallel to a stream 
line The bubbles are swept away as they form, and under steady conditions are 
reasonably uniform They issue at a rate of one thousand or more per second, givmg 
a high-pitched note The soap solution mounts up in a steep wall around the peri- 
meter of the vessel whde it is rotating but carries back most of the bubbles with it 
when rotation ceases With this device, illustrated in figure 3, bubbles down to 
0*12 mm m diameter can be obtained As an example, an orifice 38 y, across m a 
thm-walled jet, with a pressure of 190 cm of water, and a speed of the fluid of 
180 cm /sec past the orifice, produced bubbles of 0 14 mm diameter. In this case 
a dish of diameter 9*5 cm and speed of 6 rev /sec was used Figure 4, plate 8, is an 
enlarged picture of these ‘small’ bubbles and shows their degree of regularity, the 
pattern is not as perfect with a rotating as with a stationary vessel, the rows be in g 
seen to be slightly irregular when viewed m a glancmg direction 

These two-dimensional crystals show structures which have been supposed to 
exist in metals, and simulate effects which have been observed, such as gram 
boundaries, dislocations and other types of fault, slip, recrystalhzation, annealing, 
and strains due to ‘ foreign ’ atoms. 

3 Grain boundaries 

Figures 5 a, 56 and 5c, plates 9 and 10, show typical gram boundaries for bubbles 
of 1*87, 0 76 and 0*30 mm diameter respectively The width of the disturbed area 
at the boundary, where the bubbles have an irregular distribution, is m general 
greater the smaller the bubbles In figure 5a, which shows portions of several 
adjacent grams, bubbles at a boundary between two grams adhere definitely to one 
crystalline arrangement or the other In figure 5c there is a marked ‘Beilby layer’ 
between the two grams The small babbles, as will be seen, have a greater rigidity 
than the large ones, and this appears to give nse to more irregularity at the interface 

Separate gramB show up distinctly when photographs of polycrystalline rafts 
such as figures 5a to 5c, plates 9 and 10, and figures 12a to 12e, plates 14 to 16, 
are viewed obliquely. With suitable hghtmg, the floating raft of bubbles itself when 
viewed obliquely resembles a polished and etched metal in a remarkable way 

It often happens that some ‘impurity atoms’, or bubbles which are markedly 
larger or smaller than the average, are found in a polycrystalline raft, and when this 
is so a large proportion of them are situated at the gram boundaries It would be 
incorrect to say that the irregular babbles make their way to the boundaries, it is 
a defect of the model that no diffusion of bubbles through the structure can take 
place, mutual adjustments of neighbours alone being possible It appears that the 
boundaries tend to readjust themselves by the growth of one crystal at the expense 
of another till they pass through the irregular atoms 

4 Dislocations 

When a smgle crystal or polycrystalline raft is compressed, extended, or other- 
wise deformed it exhibits a behaviour very similar to that which has been pictured 
for metals subjected to strain Up to a certain limit the model is within its elastic 
range Beyond that point it yields by slip along one of the three equally inclined 
directions of closely packed rows Slip takes place by the bubbles in one row moving 
forward over those m the next row by an amount equal to the distaftce between 
neighbours. It is very interesting to watch this process taking place The 
movement is not simultaneous along the whole row but begins at one end with 
the appearance of a ‘dislocation’, where there is locally one more bubble in the 
rows on one side of the slip line as compared with those on the other This dis- 
location then runs along the shp line from one side of the crystal to the other, the 
final result bemg a shp by one ‘ mter-atomic ’ distance Such a process has been 
invoked by Orowan, by Polanyi and by Taylor to explam the small forces required 
to produce plastic gliding m metal structures The theory put forward by Taylor 
(1934) to explam the mechanism of plastic deformation of crystals considers the 
mutual action and equilibrium of such dislocations The bubbles afford a very 
sinking picture of what has been supposed to take place in the metal Sometimes 
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the dislocations run along quite slowly, taking a matter of seconds to cross a crystal, 
stationary dislocations also are to be seen m crystals which are not homogeneously 
strained They appear as short black lines, and can be seen in the series of photo- 
graphs, figures 12 a to 12 e, plates 14 to 16. When a poly crystal line raft is compressed, 
these dark lmes are seen to be dashing about in all directions across the crystals. 

Figures 6 a, 66 and 6 c, plates 10 and 11, show examples of dislocations In 
figure 6 a, where the diameter of the bubbles is 1 9 mm , the dislocation is very 
local, extending over about six bubbles In figure 66 (diameter 0 7 6 mm ) it extends 
over twelve bubbles, and in figure 6 c (diameter 0 30 mm ) its influence can be 
traced for a length of about fifty bubbles The greater rigidity of the small bubbles 
leads to longer dislocations. The study of any mass of bubbles shows, however, 
that there is not a standard length of dislocation for each size The length depends 
upon the nature of the strain m the crystal A boundary between two crystals with 
corresponding axes at approximately 30° (the maximum angle which can occur) 
may be regarded as a senes of dislocations in alternate rows, and in this case the 
dislocations are very short As the angle between the neighbouring crystals decreases, 
the dislocations occur at wider intervals and at the same time become longer, till 
one finally has single dislocations in a large body of perfect structure as shown m 
figures 6 a, 66 and 6 c 

Figure 7 , plate 11 , shows three parallel dislocations If we call them positive and 
negative (following Taylor) they are positive, negative, positive, reading from left 
to right The strip between the last two has three bubbles in excess, as can be seen 
by looking along the rows in a horizontal direction Figure 8 , plate 12 , shows a 
dislocation projecting from a gram boundary, an effect often observed. 

Figure 9 , plate 12 , shows a place where two bubbles take the place of one This 
may be regarded as a limiting case of positive and negative dislocations on neigh- 
bouring rows, with the compressive sides of the dislocations facmg each other. The 
oontrary case would lead to a hole in the structure, one bubble being missing at the 
point where the dislocations met 

5. Other types of fault 

Figure 10 , plate 12 , shows a narrow stop between two crystals of parallel orien- 
tation, the strip being crossed by a number of fault lines where the bubbles are not 
in close packmg It is in such places as these that recrystallization may be expected 
The boundaries approach and the strip is absorbed into a wider area of perfect 
crystal 

Figures 1 la to 1 1 </, plates I3and 14 are examples of arrangements which frequently 
appear m places where there is a local deficiency of bubbles While a dislocation is 
aP CTi as a dark stripe m a general view, these structures show up in the shape of the 
letter V or as triangles A typical V structure is Been m figure 11a When the model is 
being distorted, a V structure is formed by two dislocations meetmg at an inclination 
of 60° , it is destroyed by the dislocations continuing along their paths. Figure 116 
shows a small triangle, which also embodies a dislocation, for it will be noticed that 
the rows below the fault have one more bubble than those below If a mild amount 
of ‘thermal movement’ is imposed by gentle agitation of one side of the crystal, 
inch faulty places disappear and a perfect structure is formed 

Here and there in the crystals there is a blank space where a bubble is missing, 
showing as a black dot in a general view Examples occur m figure 1 1 g. Such a gap 
cannot be closed by a local readjustment, since filling the hole causes another to 
appear Such holes both appear and disappear when the crystal is ‘ cold-worked 

These structures in the model suggest that similar local faults may exist in an 
actual metal They may play a part in processes such as diffusion or the order- 
disorder change by reducing energy barriers in their neighbourhood, and act as 
nuclei for crystallization in an allotropic change 

6 Recrystallization and annealing 

Figures 12 a to I2e, plates 14 to 16, show the same raft of bubbles at successive 
times A raft covering the surface of the solution was given a vigorous stirring with 
a glass rake, and then left to adjust itself Figure 12a shows its aspect about 1 sec. 
after stirring has ceased. The raft is broken into a number of small ‘crystallites’, 
these are m a high state of non-homogeneous strain as is shown by the numerous 
dislocations and other faults The following photograph (figure 126) shows the 
same raft 32 sec later. The small grains have coalesced to form larger grains, and 
much of the strain has disappeared in the process. Recrystallization takes place 
right through the series, the last three photographs of which show the appearance 
of the raft 2 , 14 and 25 min after the initial stirring. It is not possible to follow the 
rearrangement for much longer times, because the bubbles shrink after long standing, 
apparently due to the diffusion of air through their walls, and they also become thin 
and tend to buret No agitation was given to the model during this process An 
ever slower process of rearrangement goes on, the movement of the bubbles in one 
part of the raft setting up strains which activate a rearrangement in a neighbouring 
part, and that in its turn still another 

A number of interesting points are to be seen in this series. Note the three small 
grains at the points indicated by the co-ordinates A A, BB, CC . A persists, though 


changed in form, throughout the whole senes B is still present after 14 min , but 
has disappeared m 26 mm , leaving behind it four dislocations marking internal 
strain m the grain. Gram C shrinks and finally disappears m figure 12 d, leaving a 
hole and a V which has disappeared in figure 12 e At the same time the ill-defined 
boundary in figure 12d at DD has become a definite one in figure 12 e Note also 
the straightening out of the gram boundary in the neighbourhood of EE in figures 
126 to 12e. Dislocations of various lengths c an be seen, marking all stages between 
a slight warping of the structure and a definite boundary Holes where bubbles 
are mis sin g show up as black dots Some of these holes are formed or filled up by 
movements of dislocations, but others represent places where a bubble has buret 
Many examples of V’s and some of triangles can be seen Other interesting points 
will be apparent from a study of this senes of photographs. 

Figures 13a, 136 and 13c, plate 17, show a portion of a raft 1 sec , 4 sec and 4 mm. 
after the stimng process, and is interesting as showing two successive stages in the 
relaxation towards a more perfect arrangement The changes show up well when one 
looks m a glancing direction across the page The arrangement is very broken in 
figure 13a In figure 136 the bubbles have grouped themselves in rows, but the 
curvature of these rows indicates a high degree of internal strain In figure 13c this 
strain has been relieved by the formation of a new boundary at A-A, the rows on 
either side now being straight It would appear that the energy of this strained 
crystal is greater than that of the inter crystal line boundary We are indebted to 
Messrs Kodak for the photographs of figure 13, which were taken when the cine- 
matograph film referred to below was produced 

7 Effect of impurity atom 

Figure 14, plate 18, shows the widespread effect of a bubble which is of the wrong 
size If this figure is compared with the perfect rafts shown in figures 2 and 4, 
plate 8 , it will be seen that three bubbles, one larger and two smaller than 
normal, disturb the regularity of the rows over the whole of the figure As has been 
mentioned above, bubbles of the wrong size are generally found in the grain boun- 
daries, where holes of irregular size occur which can accommodate them 

8 Mechanical properties of the two-dimensional model 

The mechanical properties of a two-dimensional perfect raft have been described 
in the paper referred to above (Bragg 19426 ) The raft lies between two parallel 
springs dipping horizontally in the surface of the soap solution The pitch of the 
springs is adjusted to fit the spacing of the rows of bubbles, which then adhere firmly 
to them One spring can be translated parallel to itself by a micrometer screw, and 
the other is supported by two thin vertical glass fibres The shearing stress can be 
measured by noting the deflexion of the glass fibres When subjected to a shearing 
stram, the raft obeys Hooke’s law of elasticity up to the point where the elastic 
limit is reached It then slips along some intermediate row by an amount equal to 
the width of one bubble The elastic shear and slip can be repeated several times The 
elastic limit is approximately reached when one side of the raft has been sheared 
by an amount equal to a bubble width past the other side This feature supports 
the basic assumption made by one of us m the calculation of the elaatic limit of a 
metal {Bragg 1942 a), in which it is supposed that each crystallite m a cold-worked 
metal only yields when the strain in it has reached such a value that energy is 
released by the slip 

A calculation has been made by M M Nicolson of the forces between the bubbles, 
and will be published shortly It shows two interesting points The curve for 
rhe variation of potential energy with distance between centres is very similar to 
those which have been plotted for atoms It has a minimum for a distance between 
centres slightly less than a free bubble diameter, and rises sharply for smaller dis- 
tances Further, the rise is extremely sharp for bubbles of 0 1 mm diameter but 
much less so for bubbles of 1 mm diameter, thus confirming the impression given 
by the model that the small bubbles behave as if they were much more rigid than 
the large ones. 

9 Three-dimensional assemblages 

If the bubbles are allowed to accumulate in multiple layers on the surface, they 
form a mass of three-dimensional ‘crystals’ with one of the arrangements of closest 
packing Figure 15, plate 18, shows an oblique view of such a mass, its resemblance 
to a polished and etched metal surface is noticeable In figure 16, plate 20 , a similar 
mass is seen viewed normally Parts of the structure are definitely in cubic closest 
packmg, the outer surface being the (111) face or (100) face Figure 17a, plate 19, 
shows a ( 111 ) face The outlines of the three bubbles on which each upper bubble 
rests can be clearly seen, and the next layer of these bubbles is faintly visible in a 
position not beneath the uppermost layer, showing that the packmg of the ( 111 ) 
planes has the well-known cubic succession Figure 17 6 plate 19, Bhows a ( 100 ) face 
with each bubble resting on four others The cubic axes are of course inclined at 
45° to the close-packed rows of the surface layer Figure 17 c, plate 19, shows a 
twin m the cubic structure across the face (111) The uppermost faces are (111) 
and ( 100 ), and they make a small angle with each other, though this is not apparent 
m the figure, it shows up in an oblique view Figure 17 d, plate 19, appears to show 
both the cubic and hexagonal succession of closely packed planes, but it is difficult 
to verify whether the left-hand side follows the true hexagonal close-packed struc- 
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ture because it is not certain that the assemblage had a depth of more than two 
layers at this point Many instances of twins, and of mtercrystallme boundaries, 
can be seen m figure 16, plate 20 

Figure 18, plate 21, shows several dislocations in a three-dimensional structure 
subjected to a bending stram 

10 Demonstration of the model 

With the co-operation of Messrs Kodak, a 16 mm cinematograph film has been 
made of the movements of the dislocations and gram boundaries when smgle crystal 
and poly crystalline rafts are sheared compressed, or extended Moreover, if the 
soap solution is placed in a glass vessel with a flat bottom, the model lends itself to 
projection on a large scale by transmitted light Since a certain depth is required for 
producing the bubbles, and the solution is rather opaque, it is desirable to make the 
projection through a glass block resung on the bottom of the vessel and just sub- 
merged beneath the surface 

In conclusion, we wish to express our thanks to Mr C E Harrold, of King’s College, 
Cambridge, who made for as some of the pipettes which were used to produce the 
bubbles 
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Figube 4. Perfect crystalline raft of bubbles. Diameter 0-30 mm. 
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Grain boundaries 



Figure 5a. Diameter 1 87 mm. 



Figure 5b. Diameter 0*76 mm. 
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Figure 6a. A dislocation. Diameter 1-9 mm. 
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Dislocations 
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Figure 66. Diameter 0*76 mm. 




Figure 7. Parallel dislocations. Diameter 0-76 mm. 
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Figure 8. Dislocation projecting from a grain boundary. Diameter 0*30 mm. 



Figure 9. Dislocations in adjacent rows. 


Diameter 1*9 mm. 



Figure 10. 


Series of fault lines between two areas of parallel orientation. Diameter 0*30 mm. 
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Figure 13. Two stages of recrystallization. Diameter 1*64 mm. 
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Figure 18. 


Dislocations m three-dimensional structure. Diameter 0*70 mm. 
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31 


Tensors 


31-1 The tensor of polarizability 

Physicists always have a haoit of taking the simplest example of any phenome- 
non and calling it “physics,” leaving the more complicated examples to become 
the concern of other fields — say of applied mathematics, electrical engineering, 
chemistry, or crystallography. Even solid-state physics is almost only half physics 
because it worries too much about special substances. So in these lectures we will 
be leaving out many interesting things. For instance, one of the important proper- 
ties of crystals — or of most substances — is that their electric polarizability is 
different in different directions. If you apply a field in any direction, the atomic 
charges shift a little and produce a dipole moment, but the magnitude of the 
moment depends very much on the direction of the field. That is, of course, 
quite a complication. But in physics we usually start out by talking about the 
special case in which the polarizability is the same in all directions, to make life 
easier. We leave the other cases to some other field. Therefore, for our later work, 
we will not need at all what we are going to talk about in this chapter. 

The mathematics of tensors is particularly useful for describing properties 
of substances which vary in direction — although that’s only one example of their 
use. Since most of you are not going to become physicists, but are going to go 
into the real world, where things depend severely upon direction, sooner or later 
you will need to use tensors. In order not to leave anything out, we are going to 
describe tensors, although not in great detail. We want the feeling that our treat- 
ment of physics is complete. For example, our electrodynamics is complete — as 
complete as any electricity and magnetism course, even a graduate course. Our 
mechanics is not complete, because we studied mechanics when you didn’t have a 
high level of mathematical sophistication, and we were not able to discuss subjects 
like the principle of least action, or Lagrangians, or Hamiltonians, and so on, 
which are more elegant ways of describing mechanics. Except for general relativity, 
however, we do have the complete laws of mechanics. Our electricity and magnetism 
is complete, and a lot of other things are quite complete. The quantum mechanics, 
naturally, will not be — we have to leave something for the future. But you should 
at least know what a tensor is. 

We emphasized in Chapter 30 that the properties of crystalline substances are 
different in different directions — we say they are anisotropic . The variation of 
the induced dipole moment with the direction of the applied electric field is only 
one example, the one we will use for our example of a tensor. Let's say that for a 
given direction of the electric field the induced dipole moment per unit volume P 
is proportional to the strength of the applied field E . (This is a good approximation 
for many substances if E is not too large.) We will call the proportionality 
constant a.* We want now to consider substances in which a depends on the 
direction of the applied field, as, for example, in crystals like calcite, which make 
double images when you look through them. 

Suppose, in a particular crystal, we find that an electric field Ei in the ^-direc- 
tion produces the polarization P x in the x-direction. Then we find that an electric 
field E 2 in the ^-direction, with the same strength , as E x produces a different polar- 
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* In Chapter 10 we followed the usual convention and wrote P = eoxE and called 
x (“khi”) the “susceptibility.” Here, it will be more convenient to use a single letter, so 
we write a for eox For isotropic dielectrics, a - (k-1 )eo, where k is the dielectric constant 
(see Section 10-4). 
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Fig. 31-1. The vector addition of 
polarizations in an anisotropic crystal. 


ization P 2 in the ^-direction. What would happen if we put an electric field at 
45° ? Well, that’s a superposition of two fields along x and y, so the polarization 
P will be the vector sum of Pi and P 2 , as shown in Fig. 3 1-1 (a). The polarization 
is no longer in the same direction as the electric field. You can see how that might 
come about. There may be charges which can move easily up and down, but 
which are rather stiff for sidewise motions. When a force is applied at 45°, the 
charges move farther up than they do toward the side. The displacements are 
not in the direction of the external force, because there are asymmetric internal 
elastic forces. 

There is, of course, nothing special about 45°. It is generally true that the 
induced polarization of a crystal is not in the direction of the electric field. In our 
example above, we happened to make a “lucky” choice of our x- and >>-axes, 
for which P was along E for both the x- and ^-directions. If the crystal were 
rotated with respect to the coordinate axes, the electric field E 2 in the ^-direction 
would have produced a polarization P with both an x- and a ^-component. 
Similarly, the polarization due to an electric field in the x-direction would have 
produced a polarization with an x-component and a jk-component. Then the 
polarizations would be as shown in Fig. 31-l(b), instead of as in part (a). Things 
get more complicated — but for any field E, the magnitude of P is still proportional 
to the magnitude of E. 

We want now to treat the general case of an arbitrary orientation of a crystal 
with respect to the coordinate axes. An electric field in the x-direction will produce 
a polarization P with x-, y- 9 and z-components ; we can write 

Px = (X xxP'X'> Py OL yx E X9 P z Ot zx E x . (31.1) 

All we are saying here is that if the electric field is in the x-direction, the 
polarization does not have to be in that same direction, but rather has an x-, a y- 9 
and a z-component — each proportional to E x . We are calling the constants of 
proportionality a xx , a yX9 and a zx , respectively (the first letter to tell us which com- 
ponent of P is involved, the last to refer to the direction of the electric field). 

Similarly, for a field in the ^-direction, we can write 


Px a xy E y , 


Py = 


x yyPy> 


F ■ 

y^y 9 


(31.2) 


and for a field in the z-direction, 


P x — a, Py °LyzP'Z) Pz G i zz E z . (31.3) 

Now we have said that polarization depends linearly on the fields, so if there is an 
electric field E that has both an x- and a ^-component, the resulting x-component 
of P will be the sum of the two P x s of Eqs. (31.1) and (3 1 .2). If E has components 

along x, y 9 and z, the resulting components of P will be the sum of the three 

contributions in Eqs. (31.1), (31.2), and (31.3). In other words, P will be given by 

P x ~ a xx E x H~ oi xy E y T - ol xz E Z9 

Py ~ oc yx E x ~|- oiyyEy T" oiyzE Z9 (31.4) 

Pz ~ 4" WzyPy 4” Ot zz E z . 

The dielectric behavior of the crystal is then completely described by the nine 
quantities ( a xx , a xy9 a xz , a yz , . . .), which we can represent by the symbol a xr 
(The subscripts / and j each stand for any one of the three possible letters x, y , 
and z.) Any arbitrary electric field E can be resolved with the components E X9 E v , 
and E z \ from these we can use the a tJ to find P x , P V9 and P Z9 which together give 
the total polarization P. The set of nine coefficients a tJ is called a tensor — in this 
instance, the tensor of polarizability . Just as we say that the three numbers (E X9 
E y9 E z ) “form the vector Ef we say that the nine numbers (a XX9 a xy9 . . .) “form 
the tensor a 
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31-2 Transforming the tensor components 

You know that when we change to a different coordinate system x', y\ and z', 
the components E x * 9 E y > , and E z * of the vector will be quite different — as will 
also the components of P. So all the coefficients a tJ will be different for a different 
set of coordinates. You can, in fact, see how the as must be changed by changing 
the components of E and P in the proper way, because if we describe the same 
physical electric field in the new coordinate system we should get the same polariza- 
tion. For any new set of coordinates, P x > is a linear combination of P Xi P y , and P z : 

Px' = aP x + bPy + cP z , 

and similarly for the other components. If you substitute for P Xi P y , and P z in 
terms of the E' s, using Eq. (31.4), you get 

P x' ~ Q’i&XxP'X “f~ °^XyEy + a x zP>z) 

+ b{a yx E x + a yy Ey + ' * *) 

+ c(a zx E x + ■ • * +*■*)• 

Then you write E x , E yi and E z in terms of E x *> E y ', and E z >\ for instance, 

E x = ctE x * + b'E y > + c'Eg’, 

where a ! , b\ c f are related to, but not equal to, a, b , c. So you have P x >, expressed 
in terms of the components E x >, E y > , and E z >\ that is, you have the new a %7 . It is 
fairly messy, but quite straightforward. 

When we talk about changing the axes we are assuming that the crystal stays 
put in space . If the crystal were rotated with the axes, the a’s would not change. 
Conversely, if the orientation of the crystal were changed with respect to the axes, 
we would have a new set of <Ys. But if they are known for any one orientation of 
the crystal, they can be found for any other orientation by the transformation we 
have just described. In other words, the dielectric property of a crystal is described 
completely by giving the components of the polarization tensor a t7 with respect 
to any arbitrarily chosen set of axes. Just as we can associate a vector velocity 
v = O’j;, v yy v z ) with a particle, knowing that the three components will change 
in a certain definite way if we change our coordinate axes, so with a crystal we 
associate its polarization tensor a i; , whose nine components will transform in a 
certain definite way if the coordinate system is changed. 

The relation between P and E written in Eq. (31.4) can be put in the more 
compact notation: 

P, = (31.5) 

where it is understood that i represents either x, y, or z and that the sum is taken 
ony = x, y, and z. Many special notations have been invented for dealing with 
tensors, but each of them is convenient only for a limited class of problems. One 
common convention is to omit the sum sign (X!) in Eq. (31.5), leaving it under- 
stood that whenever the same subscript occurs twice (here y), a sum is to be taken 
over that index. Since we will be using tensors so little, we will not bother to 
adopt any such special notations or conventions. 


31-3 The energy ellipsoid 

We want now to get some experience with tensors. Suppose we ask the in- 
teresting question: What energy is required to polarize the crystal (in addition to 
the energy in the electric field which we know is e^E 2 /! per unit volume)? Consider 
for a moment the atomic charges that are being displaced. The work done in dis- 
placing the charge the distance dx is qE x dx , and if there are N charges per unit 
volume, the work done is qE x N dx. But qN dx is the change dP x in the dipole 
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moment per unit volume. So the energy required per unit volume is 

E x dP x . 

Combining the work for the three components of the field, the work per unit 
volume is found to be 

E dP. 

Since the magnitude of P is proportional to E, the work done per unit volume in 
bringing the polarization from 0 to P is the integral of E • dP. Calling this work 
up* we write * 

up = \E P = * ££,/>,. (31.6) 

X 

Now we can express P in terms of E by Eq. (31.5), and we have that 

up = iZE (31.7) 

l 3 

The energy density u P is a number independent of the choice of axes, so it is a 
scalar. A tensor has then the property that when it is summed over one index 
(with a vector), it gives a new vector; and when it is summed over both indexes 
(with two vectors), it gives a scalar. 

The tensor a l3 should really be called a “tensor of second rank,” because it 
has two indexes. A vector — with one index — is a tensor of the first rank, and a 
scalar — with no index — is a tensor of zero rank. So we say that the electric field 
E is a tensor of the first rank and that the energy density u P is a tensor of zero 
rank. It is possible to extend the ideas of a tensor to three or more indexes, and 
so to make tensors of ranks higher than two. 

The subscripts of the polarization tensor range over three possible values— 
they are tensors in three dimensions. The mathematicians consider tensors in 
four, five, or more dimensions. We have already used a four-dimensional tensor 
F yv in our relativistic description of the electromagnetic field (Chapter 26). 

The polarization tensor a l} has the interesting property that it is symmetric , 
that is, that a xy = a yx , and so on for any pair of indexes. (This is a physical 
property of a real crystal and not necessary for all tensors.) You can prove for 
yourself that this must be true by computing the change in energy of a crystal 
through the following cycle: (1) Turn on a field in the x-direction; (2) turn on a 
field in the ^-direction; (3) turn off the x-field; (4) turn off the y- field. The crystal 
is now back where it started, and the net work done on the polarization must be 
back to zero. You can show, however, that for this to be true, a xy must be equal 
to a yx . The same kind of argument can, of course, be given for a xz , etc. So the 
polarization tensor is symmetric. 

This also means that the polarization tensor can be measured by just measuring 
the energy required to polarize the crystal in various directions. Suppose we apply 
an is-field with only an x- and a ^-component ; then according to Eq. (31.7), 

Up — 2" [°LxxEx + (pixy “f~ °lyx)E x Ey <X yy Ey\. (31.8) 

With an E x alone, we can determine a xx ; with an E y alone, we can determine a yy ; 
with both E x and E y , we get an extra energy due to the term with (a xy + a yx ). 
Since the a xy and a yx are equal, this term is 2 a xy and can be related to the energy. 

The energy expression, Eq. (31.8), has a nice geometric interpretation. 
Suppose we ask what fields E x and E y correspond to some given energy density — say 
w 0 * That is just the mathematical problem of solving the equation 

°LxxE x + 2 ot xy E x E y a yy E u ~ 2i/q. 

This is a quadratic equation, so if we plot E x and E y , the solutions of this equation 

* This work done in producing the polarization by an electric field is not to be confused 
with the potential energy —po>E of a permanent dipole moment po. 
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are all the points on an ellipse (Fig. 31-2). (It must be an ellipse, rather than a 
parabola or a hyperbola, because the energy for any field is always positive and 
finite.) The vector E with components E x and E y can be drawn from the origin 
to the ellipse. So such an “energy ellipse” is a nice way of “visualizing” the polar- 
ization tensor. 

If we now generalize to include all three components, the electric vector E in 
any direction required to give a unit energy density gives a point which will be on 
the surface of an ellipsoid, as shown in Fig. 31-3. The shape of this ellipsoid of 
constant energy uniquely characterizes the tensor polarizability. 

Now an ellipsoid has the nice property that it can always be described simply 
by giving the directions of three “principal axes” and the diameters of the ellipse 
along these axes. The “principal axes” are the directions of the longest and 
shortest diameters and the direction at right angles to both. They are indicated 
by the axes a , h , and c in Fig. 31-3. With respect to these axes, the ellipsoid has 
the particularly simple equation 

+ &bbEb Ot cc E c 2 Uq. 

So with respect to these axes, the dielectric tensor has only three components 
that are not zero: a aa , a bb , and a (( That is to say, no matter how complicated a 
crystal is, it is always possible to choose a set of axes (not necessarily the crystal 
axes) for which the polarization tensor has only three components. With such a 
set of axes, Eq, (31.4) becomes simply 

= &aa,Eai P b ~ ^bb^bj P c ~ ^cc^r- (31.9) 

An electric field along any one of the principal axes produces a polarization along 
the same axis, but the coefficients for the three axes may, of course, be different. 

Often, a tensor is described by listing the nine coefficients in a table inside of 
a pair of brackets: 

(Xxx &xy &xz 

atyx &yy ®-yz * (31.10) 

GLzx OL Z y &zz_ 



Fig. 31-2. Locus of the vector E = 
(E a , Ey) that gives a constant energy of 
polarization. 
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Fig. 31-3. The energy ellipsoid of 
the polarization tensor. 


For the principal axes a, b , and c, only the diagonal terms are not zero; we say 
then that “the tensor is diagonal.” The complete tensor is 


0 a b b 

0 0 


0 

0 

&cc_ 


(31.11) 


The important point is that any polarization tensor (in fact, any symmetric tensor 
of rank two in any number of dimensions) can be put in this form by choosing a 
suitable set of coordinate axes. 

If the three elements of the polarization tensor in diagonal form are all equal, 
that is, if 

&aa &bb &cc (31.12) 

the energy ellipsoid becomes a sphere, and the polarizability is the same in all 
directions. The material is isotropic. In the tensor notation, 

Ot iy a$ijj 

where 8 tJ is the unit tensor 

("1 0 0 " 

= 0 1 0 • 

|0 0 1 _ 

That means, of course, 

5ij 1, if i j, 

b l3 = 0, if i ^ j. 


(31.13) 

(31.14) 

(31.15) 
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The tensor 5^ is often called the “Kronecker delta. ” You may amuse yourself 
by proving that the tensor (31.14) has exactly the same form if you change the 
coordinate system to any other rectangular one. The polarization tensor of Eq. 
(31.13) gives 

Pi ~ a zC = 

3 

which means the same as our old result for isotropic dielectrics: 

P = aE. 

The shape and orientation of the polarization ellipsoid can sometimes be 
related to the symmetry properties of the crystal. We have said in Chapter 30 
that there are 230 different possible internal symmetries of a three-dimensional 
lattice and that they can, for many purposes, be conveniently grouped into seven 
classes, according to the shape of the unit cell. Now the ellipsoid of polarizability 
must share the internal geometric symmetries of the crystal. For example, a 
tnclimc crystal has low symmetry — the ellipsoid of polarizability will have unequal 
axes, and its orientation will not, in general, be aligned with the crystal axes. On 
the other hand, a monoclinic crystal has the property that its properties are un- 
changed if the crystal is rotated 180° about one axis. So the polarization tensor 
must be the same after such a rotation. It follows that the ellipsoid of the polariz- 
ability must return to itself after a 180° rotation. That can happen only if one of 
the axes of the ellipsoid is in the same direction as the symmetry axis of the crystal. 
Otherwise, the orientation and dimensions of the ellipsoid are unrestricted 

For an orthorhombic crystal, however, the axes of the ellipsoid must corre- 
spond to the crystal axes, because a 180° rotation about any one of the three axes 
repeats the same lattice. If we go to a tetragonal crystal, the ellipse must have the 
same symmetry, so it must have two equal diameters. Finally, for a cubic crystal, 
all three diameters of the ellipsoid must be equal, it becomes a sphere, and the 
polarizability of the crystal is the same in all directions. 

There is a big game of figuring out the possible kinds of tensors for all the 
possible symmetries of a crystal. It is called a “group-theoretical” analysis. But 
for the simple case of the polarizability tensor, it is relatively easy to see what the 
relations must be. 


31-4 Other tensors; the tensor of inertia 

There are many other examples of tensors appearing in physics. For example, 
in a metal, or in any conductor, one often finds that the current density / is ap- 
proximately proportional to the electric field E\ the proportionality constant is 
called the conductivity cr : 

j = cfE. 

For crystals, however, the relation between j and E is more complicated; the 
conductivity is not the same in all directions. The conductivity is a tensor, and 
we write 

Jl — ^ y GijEj- 

Another example of a physical tensor is the moment of inertia. In Chapter 18 
of Volume I we saw that a solid object rotating about a fixed axis has an angular 
momentum L proportional to the angular velocity co, and we called the proportion- 
ality factor /, the moment of inertia: 


L = I(o. 

For an arbitrarily shaped object, the moment of inertia depends on its orientation 
with respect to the axis of rotation. For instance, a rectangular block will have 
different moments about each of its three orthogonal axes. Now angular velocity 
co and angular momentum L are both vectors. For rotations about one of the axes 
of symmetry, they are parallel. But if the moment of inertia is different for the 
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three principal axes, then w and L are, in general, not in the same direction 
(see Fig. 31-4). They are related in a way analogous to the relation between 
E and P. In general, we must write 

^ x Ixx^x Ixytdy T" Jxz^zi 

Ly Ixy^x T lyy^y 4“ Iy Z G) Z , (3M6) 

E Z I ZX 0) x + I Z yO)y ~f“ IzZ^Z‘ 

The nine coefficients I %3 are called the tensor of inertia. Following the analogy 
with the polarization, the kinetic energy for any angular momentum must be 
some quadratic form in the components a) x , o) y , and co 2 : 

KE = i C31-17) 

13 

We can use the energy to define the ellipsoid of inertia. Also, energy arguments 
can be used to show that the tensor is symmetric — that I 13 = I n . 

The tensor of inertia for a rigid body can be worked out if the shape of the 
object is known. We need only to write down the total kinetic energy of all the 
particles in the body. A particle of mass m and velocity v has the kinetic energy 
%mv 2 , and the total kinetic energy is just the sum 

^2 imv 2 



Fig. 31-4. The angular momentum 
t of a solid object is not, in general, 
parallel to its angular velocity co. 


over all of the particles of the body. The velocity v of each particle is related to 
the angular velocity co of the solid body. Let’s assume that the body is rotating 
about its center of mass, which we take to be at rest. Then if r is the displacement 
of a particle from the center of mass, its velocity v is given by co X r. So the total 
kinetic energy is 

KE = X %m(w X rf. (31.18) 


Now all we have to do is write w X r out in terms of the components c o x , co v , c o 2 , 
and x, y, z, and compare the result with Eq. (31.17); we find I %3 by identifying 
terms. Carrying out the algebra, we write 

(co X r) 2 — (co X r)x + (« X r) 2 + (co X r) 2 

= (c o y z — co z y) 2 + (u z x — o o x z) 2 + (to x y — co ty *) 2 
= + co 2 z 2 — 2cx) y co z zy 4- w 2 y 2 
"T ^z* 2 — 2co z co x xz C 0 2 Z 2 

4- to£y 2 — 2 o) x co y yx + co 2 * 2 . 

Multiplying this equation by m/2, summing over all particles, and comparing 
with Eq. (31.17), we see that l xx , for instance, is given by 

i xx = X) m (y 2 + z2 )- 

This is the formula we have had before (Chapter 19, Vol. I) for the moment of 
inertia of a body about the *-axis. Since r 2 = x 2 + y 2 + z 2 , we can also write 
this term as 

hx = X ~ x2 )- 


Working out all of the other terms, the tensor of inertia can be written as 


£ m{r 2 - x 2 ) 

— £ myx 

— £ mzx 


— £ mxy 
£ m(r 2 ~ y 2 ) 

— £ mzy 


— £ mxz 

— £ myz 
£ m(r 2 — z 2 ) 


If you wish, this may be written in “tensor notation” as 

I t3 = ^ ^ w(r $i 3 r z fj). 


(31.19) 


(31.20) 
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where the r x are the components (x, y, z) of the position vector of a particle and 
the X! means to sum over all the particles. The moment of inertia, then, is a tensor 
of the second rank whose terms are a property of the body and relate L to w by 

(3121) 

3 

For a body of any shape whatever, we can find the ellipsoid of inertia and, 
therefore, the three principal axes. Referred to these axes, the tensor will be 
diagonal, so for any object there are always three orthogonal axes for which the 
angular velocity and angular momentum are parallel They are called the principal 
axes of inertia. 


31-5 The cross product 

We should point out that we have been using tensors of the second rank 
since Chapter 20 of Volume I. There, we defined a “torque in a plane,” such as 
Tcys hy 

T xy = xF u ~ yF x . 

Generalized to three dimensions, we could write 

= r z F 3 - r 3 F t . (31.22) 

The quantity r %3 is a tensor of the second rank. One way to see that this is so is by 
combining r l} with some vector, say the unit vector e, according to 

Z 

j 

If this quantity is a vector , then t %j must transform as a tensor — this is our definition 
of a tensor. Substituting for r l3 , we have 

Z = Z r > F ’ e > - Z r > e > F ‘ 

3 3 3 

= r % (F * e) - (r * e)F l . 

Since the dot products are scalars, the two terms on the right-hand side are vectors, 
and likewise their difference. So r l3 is a tensor. 

But r t) is a special kind of tensor; it is antisymmetric, that is, 

T l] ~ T jU 

so it has only three nonzero terms — r xy , r yz , and r zx . We were able to show in 
Chapter 20 of Volume 1 that these three terms, almost “by accident,” transform 
like the three components of a vector, so that we could clejme 

T ~~ hn ^ T z ) — (Tyzi Tzx* Try) 

We say “by accident,” because it happens only in three dimensions. In four 
dimensions, for instance, an antisymmetric tensor of the second rank has six 
nonezero terms and certainly cannot be replaced by a vector with four components. 

Just as the axial vector r = r X F is a tensor, so also is every cross product 
of two polar vectors — all the same arguments apply. By luck, however, they are 
also representable by vectors (really pseudovectors), so our mathematics has been 
made easier for us. 

Mathematically, if a and b are any two vectors, the nine quantities a t b 3 form 
a tensor (although it may have no useful physical purpose). Thus, for the position 
vector r,, r,r 3 is a tensor, and since 8 tJ is also, we see that the right side of Eq. 
(31 20) is indeed a tensor. Likewise Eq (31.22) is a tensor, since the two terms on 
the right-hand side are tensors. 
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31-6 The tensor of stress 


The symmetric tensors we have described so far arose as coefficients in re- 
lating one vector to another. We would like to look now at a tensor which has a 
different physical significance — the tensor of stress . Suppose we have a solid 
object with various forces on it. We say that there are various “stresses” inside, 
by which we mean that there are internal forces between neighboring parts of the 
material. We have talked a little about such stresses in a two-dimensional case 
when we considered the surface tension in a stretched diaphragm in Section 
12-3. We will now see that the internal forces in the material of a three-dimensional 
body can be described in terms of a tensor. 

Consider a body of some elastic material — say a block of jello. If we make 
a cut through the block, the material on each side of the cut will, in general, get 
displaced by the internal forces. Before the cut was made, there must have been 
forces between the two parts of the block that kept the material in place; we can 
define the stresses in terms of these forces. Suppose we look at an imaginary plane 
perpendicular to the x-axis — like the planed in Fig 31-5 — and ask about the force 
across a small area Ay A z in this plane The material on the left of the area exerts 
the force AF\ on the material to the right, as shown in part (b) of the figure 
There is, of course, the opposite reaction force — AFi exerted on the material to 
the left of the surface. If the area is small enough, we expect that AF\ is propor- 
tional to the area Ay A z. 

You are already familiar with one kind of stress — the pressure in a static 
liquid. There the force is equal to the pressure times the area and is at right angles 
to the surface element. For solids — also for viscous liquids in motion — the force 
need not be normal to the surface; there are shear forces in addition to pressures 
(positive or negative) (By a “shear” force we mean the tangential components 
of the force across a surface.) All three components of the force must be taken 
into account. Notice also that if we make our cut on a plane with some other 
orientation, the forces will be different. A complete description of the internal 
stress requires a tensor. 



Fig 31-5. The material to the left of 
the plane cr exerts across the area 
Ay Az the force AF] on the material to 
the right of the plane. 



Fig. 31-6. The force AF] across an 
element of area Ay Az perpendicular to 
the x-axis is resolved into the three 
components AF*i, AF y i, and AF z j. 


We define the stress tensor in the following way: First, we imagine a cut 
perpendicular to the x-axis and resolve the force AF ^ across the cut into its com- 
ponents AF x i, AF, fU AF z i, as in Fig. 31-6. The ratio of these forces to the area 
Ay Az, we call S xx , S yXi and S zx . For example, 

e _ AFyl 

* yx Ay Az * 

The first index y refers to the direction force component; the second index x is 
normal to the area. If you wish, you can write the area Aj; Az as Aa r , meaning an 
element of area perpendicular to x. Then 

e = Win 
yx A a x 

Next, we think of an imaginary cut perpendicular to the j-axis. Across a small 
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Fig. 31-7. The force across an ele- 
ment of area perpendicular to y is re- 
solved into three rectangular components 



Ax 


Fig. 31-8. The force F n across the 
face N (whose unit normal is n) is resolved 
I^to components. 


area Aa Az there will be a force A /V Again we resolve this force into three com- 
ponents, as shown in Fig 31-7, and define the three components of the stress, 
S x// , S t/U , S Z>J , as the force per unit area in the three directions. Finally, we make an 
imaginary cut perpendicular to z and define the three components S xz , S l/Z , and S zz , 
So we have the nine numbers 


5 





S x 

A, 

s z 


(31.23) 


We want to show now that these nine numbers are sufficient to describe com- 
pletely the internal state of stress, and that S tJ is indeed a tensor Suppose we want 
to know the force across a surface oriented at some arbitrary angle Can we find 
it from S l fl Yes, m the following way: We imagine a little solid figure which has 
one face N in the new surface, and the other faces parallel to the coordinate axes. 
If the face N happened to be parallel to the z-axis, we would have the triangular 
piece shown in Fig. 31-8. (This is a somewhat special case, but will illustrate well 
enough the general method.) Now the stress forces on the little solid triangle in 
Fig 31-8 are in equilibrium (at least in the limit of infinitesimal dimensions), 
so the total force on it must be zero. We know the forces on the faces parallel to 
the coordinate axes directly from S tl Their vector sum must equal the foice on 
the lace yV, so we can express this force in terms of S tJ . 

Our assumption that the surface forces on the small triangular volume are in 
equilibrium neglects any other body forces that might be present, such as giavity 
oi pseudo forces if our coordinate system is not an inertial frame Notice, however, 
that such body forces will be proportional to the volume of the little triangle and, 
therefore, to Aa, Ay, Az, whereas all the surface forces are proportional to the 
areas such as Aa Ay, Ay Az, etc. So if we take the scale of the little wedge small 
enough, the body forces can always be neglected in comparison with the surface 
forces. 

Let’s now add up the forces on the little wedge. We take first the A-component, 
which is the sum of five parts — one from each face However, if Az is small enough, 
the forces on the triangular faces (perpendicular to the z-axis) will be equal and 
opposite, so we can forget them. The A-component of the force on the bottom 
rectangle is 

AF z2 — S xy Ax Az. 


The A-component of the force on the vertical rectangle is 

AF X \ — S X x Ay Az. 

These two must be equal to the A-component of the force outward across the face 
N. Let’s call n the unit vector normal to the face /V, and the force on it F„ , then 
we have 

AF xn S xx Ay Az d - St X y Ax Az, 


The A-component S xn of the stress across this plane is equal to AF ni divided by 
the area, which is A\/zAx 2 + Ay 2 , or 


Ay 


\/Aa 2 + Ay 2 


+ S x 


Aa 

a/Aa 2 + Ay 2 


Now Aa/\/Aa 2 4- Ay? is the cosine of the angle Q between n and the y-axis, as 
shown in Fig. 3 1-8, so it can also be written as n y , the y-component of n. Similarly, 
Ay/\/~A a 2 + Ay 2 is sin 8 = n x . We can write 


Ar n S XX ft X H“ S T yfty. 


If we now generalize to an arbitrary surface element, we would get that 

Acfl S xx n x d - S X yTly Sxzftz 
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or, in general, 


s,„ = Y s ^- ( 31 - 24 ) 

3 

We can find the force across any surface element in terms of the S lJy so it does 
describe completely the state of internal stress of the material. 

Equation (31 24) says that the tensor S tJ relates the force S n to the unit vector 
n, just as a tJ relates P to E. Since n and S n are vectors, the components of S l3 must 
transform as a tensor with changes in coordinate axes. So S tJ is indeed a tensor. 

We can also show that S l3 is a symmetric tensor by looking at the forces on a 
little cube of material. Suppose we take a little cube, oriented with its faces parallel 
to our coordinate axes, and look at it in cross section, as shown in Fig 31-9. If 
we let the edge of the cube be one unit, the x- and ^-components of the forces on 
the faces normal to the x- and y-axes might be as shown in the figure. If the cube 
is small, the stresses do not change appreciably from one side of the cube to the 
opposite side, so the force components are equal and opposite as shown Now 
there must be no torque on the cube, or it would start spinning. The total torque 
about the center is (S l/X — S xy ) (times the unit edge of the cube), and since the 
total is zero, S l/X is equal to S xy , and the stress tensor is symmetric. 

Since S tJ is a symmetric tensor, it can be described by an ellipsoid which will 
have three principal axes. For surfaces normal to these axes, the stresses are 
particularly simple — they correspond to pushes or pulls perpendicular to the sur- 
faces There are no shear forces along these faces. For any stress, we can always 
choose our axes so that the shear components are zero. If the ellipsoid is a sphere, 
there are only normal forces in any direction. This corresponds to a hydrostatic 
pressure (positive or negative). So for a hydrostatic pressure, the tensor is diagonal 
and all three components are equal; they are, in fact, just equal to the pressure p. 
We can write 

S tJ = pd tJ . (31.25) 

The stress tensor — and also its ellipsoid — will, in general, vary from point to 
point in a block of material; to describe the whole block we need to give the value 
of each component of S tJ as a function of position. So the stress tensor is a field . 
We have had scalar fields , like the temperature T(x,y, z ), which give one number 
for each point in space, and vector fields like E(x , y, z), which give three numbers 
for each point. Now we have a tensor field which gives nine numbers for each 
point in space — or really six for the symmetric tensor S 2J . A complete description 
of the internal forces in an arbitrarily distorted solid requires six functions of 
x, y, and z. 

31-7 Tensors of higher rank 

The stress tensor S tJ describes the internal forces of matter. If the material is 
elastic, it is convenient to describe the internal distortion in terms of another tensor 
T tJ — called the strain tensor. For a simple object like a bar of metal, you know 
that the change in length, A L, is approximately proportional to the force, so we 
say it obeys Hooke’s law: 

A L = 7 F. 

For a solid elastic body with arbitrary distortions, the strain T tJ is related to the 
stress S l3 by a set of linear equations: 

T tJ = (31.26) 

k,l 

Also, you know that the potential energy of a spring (or bar) is 

±FAL = 

The generalization for the elastic energy density in a solid body is 

^elastic ^ ^ 2^ (31.27) 

ijkl 



Fig. 31-9. The x- and y-forces on 
four faces of a small unit cube. 
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The complete description of the elastic properties of a crystal must be given in 
terms of the coefficients y l3k i- This introduces us to a new beast. It is a tensor of the 
fourth rank. Since each index can take on any one of three values, x, y, or z, there 
are 3 4 = 81 coefficients. But there are really only 21 different numbers. First, 
since S l3 is symmetric, it has only six different values, and only 36 different co- 
efficients are needed in Eq. (31.27). But also, S l3 can be interchanged with S k i 
without changing the energy, so 7 ljki must be symmetric if we interchange ij 
and kL This reduces the number of different coefficients to 21. So to describe the 
elastic properties of a crystal of the lowest possible symmetry requires 21 elastic 
constants! This number is, of course, reduced for crystals of higher symmetry . 
For example, a cubic crystal has only three elastic constants, and an isotropic 
substance has only two. 

That the latter is true can be seen as follows How can the components of 
7 ljk i be independent of the direction of the axes, as they must be if the material 
is isotropic? Answer: They can be independent only if they are expressible in terms 
of the tensor 5 l3 . There are two possible expressions, d lJ d k i and 5 lk 8 } i + 8 t} 8 jky 
which have the required symmetry, so 7 llk i must be a linear combination of them. 
Therefore, for isotropic materials, 

y%3ki — + b($tk$ji + 8 t i8jk), 

and the material requires two constants, a and b. to describe its elastic properties. 
We will leave it for you to show that a cubic crystal needs only three 

As a final example, this time of a third-rank tensor, we have the piezoelectric 
effect. Under stress, a crystal generates an electric field proportional to the stress; 
hence, in general, the law is 

Ei = ^2 P,jkS 3 b. 

3,k 

where E h is the electric field, and the P l)k are the piezoelectric coefficients — or the 
piezoelectric tensor Can you show that if the crystal has a center of inversion 
(invariant under x, y, z —> — x, —y, —z) the piezoelectric coefficients are all zero? 

31-8 The four-tensor of electromagnetic momentum 

All the tensors we have looked at so far in this chapter relate to the three 
dimensions of space; they are defined to have a certain transformation property 
under spatial rotations. In Chapter 26 we had occasion to use a tensor in the four 
dimensions of relativistic space-time — the electromagnetic field tensor /> The 
components of such a four-tensor transform under a Lorentz transformation of 
the coordinates in a special way that we worked out. (Although we did not do it 
that way, we could have considered the Lorentz transformation as a “rotation'’ 
in a four-dimensional “space” called Minkowski space; then the analogy with what 
we are doing here would have been clearer ) 

As our last example, we want to consider another tensor in the four dimensions 
( t , x, y, z) of relativity theory. When we wrote the stress tensor, we defined S l3 
as a component of a force across a unit area. But a force is equal to the time 
rate of change of a momentum. Therefore, instead of saying “S x „ is the x-compon- 
ent of the force across a unit area perpendicular to yf we could equally well say, 
“S xy is the rate of flow of the x-component of momentum through a unit area 
perpendicular to yf In other words, each term of S tJ also represents the flow of 
the /-component of momentum through a unit area perpendicular to the y-direction 
These are pure space components, but they are parts of a “larger” tensor S^ v in 
four dimensions (ti and v = /, x, y, z) containing additional components like 
S tx , Sy U S tti etc. We will now try to find the physical meaning of these extra 
components. 

We know that the space components represent flow of momentum. We can 
get a clue on how to extend this to the time dimension by studying another kind of 
“flow” — the flow of electric charge. For the scalar quantity, charge, the rate of 
flow (per unit area perpendicular to the flow) is a space vector — the current density 
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vector j. We have seen that the time component of this flow vector is the density 
of the stuff that is flowing For instance, j can be combined with a time component, 
jt = p, the charge density, to make the four-vector = (p,/); that is, the p in 
takes on the values /, x , y, z to mean “density, rate of flow m the ^-direction, 
rate of flow in y, rate of flow in z” of the scalar charge. 

Now by analogy with our statement about the time component of the flow of 
a scalar quantity, we might expect that with S xx , S xv , and S r z , describing the flow 
of the x-component of momentum, there should be a time component S xi which 
would be the density of whatever is flowing; that is, S xt should be the density of 
x-momentum. So we can extend our tensor horizontally to include a /-component 
We have 

S xt = density of x-momentum, 

S xx = Jt-flow of x-momentum, 

S xy = >’-flow of x-momentum, 

S xz = z-flow of x-momentum. 

Similarly, for the ^-component of momentum we have the three components of 
flow — S yx , S yy , S l/Z — to which we should add a fourth term: 

S y t = density of jr-momentum. 

And, of course, to S zx , S zy , S zz we would add 

S zi = density of z-momentum. 

In four dimensions there is also a /-component of momentum, which is, we 
know, energy So the tensor S t3 should be extended vertically with S tx , S lv , and 
S iZi where 

S tx — x-flow of energy, 

S tu = inflow of energy, (31,28) 

S tz = z-flow of energy; 

that is, S tx is the flow of energy per unit area and per unit time across a surface 
perpendicular to the x-axis, and so on. Finally, to complete our tensor we need 
S t t, which would be the density of energy . We have extended our stress tensor 
S l3 of three dimensions to the four-dimensional stress-energy tensor S^ v . The 
index p can take on the four values /, x, y , and z, meaning, respectively, “density,” 
“flow per unit area in the x-direction,” “flow per unit area in the ^-direction,'' 

and “flow per unit area in the z-direction ” In the same way, v takes on the four 

values /, x, y f z to tell us what flows, namely, “energy,” “momentum in the x-direc- 
tion,” “momentum in the ^-direction, ,s and “momentum m the z-direction.” 

As an example, we will discuss this tensor not in matter, but m a region of free 
space in which there is an electromagnetic field. We know that the flow of energy is 
the Poynting vector S = e t )C 2 E X B. So the x-, y-, and z-components of S are, 
from the relativistic point of view, the components S tx , S ty , and S iz of our four- 
dimensional stress-energy tensor. The symmetry of the tensor S l3 carries over into 
the time components as well, so the four-dimensional tensor S ^ is symmetric: 

Sw - (31.29) 

In other words, the components S tf , S IJt , S zt , which are the densities of x, y, and 
z momentum , are also equal to the x-, y-, and z-components of the Poynting vector 
5, the energy flow — as we have already shown in an earlier chapter by a different 
kind of argument. 

The remaining components of the electromagnetic stress tensor can also 
be expressed in terms of the electric and magnetic fields E and B That is to say, 
we must admit stress or, to put it less mysteriously, flow of momentum in the 
electromagnetic field We discussed this in Chapter 27 in connection with Eq 
(27.21), but did not work out the details 
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Those who want to exercise their prowess in tensors in four dimensions might 
like to see the formula for in terms of the fields: 

= e j’ /£ - i £ FfaFfa} ’ 

\ a ) 

where sums on a, @ are on /, x, y, z but (as usual in relativity) we adopt a special 
meaning for the sum sign Y and for the symbol 5. In the sums the x, y, z terms 
are to be subtracted and 5 tt = +1, while 5 XX = b yy = 8 ZZ = — 1 and <5^ =■ 0 
for p v (c = 1). Can you verify that it gives the energy density S tt = 
(e 0 /2) ( E 2 + 5 2 ) and the Poynting vector e 0 5 X 5? Can you show that in an 
electrostatic field with 5 = 0 the principal axes of stress are in the direction of the 
electric field, that there is a tension (e 0 /2)E 2 along the direction of the field, and that 
there is an equal pressure in directions perpendicular to the field direction? 
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32 


Refractive Index of Dense Materials 


32-1 Polarization of matter 

We want now to discuss the phenomenon of the refraction of light — and also, 
therefore, the absorption of light — by dense materials. In Chapter 31 of Volume [ 
we discussed the theory of the index of refraction, but because of our limited 
mathematical abilities at that time, we had to restrict ourselves to finding the index 
only for materials of low density, like gases. The physical principles that produced 
the index were, however, made clear The electric field of the light wave polarizes 
the molecules of the gas, producing oscillating dipole moments. The acceleration 
of the oscillating charges radiates new waves of the field. This new field, interfering 
with the old field, produces a changed field which is equivalent to a phase shift of 
the original wave. Because this phase shift is proportional to the thickness of the 
material, the effect is equivalent to having a different phase velocity in the material. 
When we looked at the subject before, we neglected the complications that arise 
from such effects as the new wave changing the fields at the oscillating dipoles. 
We assumed that the forces on the charges in the atoms came just from the incoming 
wave, whereas, in fact, their oscillations are driven not only by the incoming wave 
but also by the radiated waves of all the other atoms It would have been difficult 
for us at that time to include this effect, so we studied only the rarefied gas, 
where such effects are not important. 

Now, however, we will find that it is very easy to treat the problem by the use 
of differential equations. This method obscures the physical origin of the index 
(as coming from the re-radiated waves interfering with die original waves), but 
it makes the theory for dense materials much simpler. This chapter will bring 
together a large number of pieces from our earlier work. We’ve taken up practically 
everything we will need, so there are relatively few really new ideas to be introduced. 
Since you may need to refresh your memory about what we are going to need, 
we give in Table 32-1 a list of the equations we are going to use, together with a 
reference to the place where each can be found. In most instances, we will not take 
the time to give the physical arguments again, but will just use the equations. 

Table 32-1 

Our work in this chapter will be based on the following material, 
already covered in earlier chapters 


32-1 Polarization of matter 

32-2 Maxwell’s equations in a 
dielectric 

32-3 Waves in a dielectric 

32-4 The complex index of refraction 

32-5 The index of a mixture 

32-6 Waves in metals 

32-7 Low-frequency and 

high-frequency approximations ; 
the skin depth and the plasma 
frequency 


Review: See Table 32-1. 


Subject 

Reference 

Equation 

Damped oscillations 

Vol I, Chap. 23 

mix + 7x + o)qv) = F 

Index of gases 

Vol I, Chap. 31 

i i 1 Nq ' 

+ 2 €o(c0q - w 2 ) 



n = n' — in" 

Mobility 

Vol. I, Chap. 41 

mx + fix = F 

Electrical conductivity 

Vol. I, Chap. 43 

r Nq 2 e r 

h- = ; <r = - 

m m 

Polarizability 

Vol II, Chap. 10 

Ppol — \ ' P 

Inside dielectrics 

Vol. II, Chap 1 1 

£kHMl = E + - — P 

3«o 
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We begin by recalling the machinery of the index of refraction for a gas. 
We suppose that there are TV particles per unit volume and that each particle be- 
haves as a harmonic oscillator. We use a model of an atom or molecule in which 
the electron is bound with a force proportional to its displacement (as though the 
electron were held in place by a spring). We emphasized that this was not a legiti- 
mate classical model of an atom, but we will show later that the correct quantum 
mechanical theory gives results equivalent to this model (in simple cases). In our 
earlier treatment, we did not include the possibility of a damping force in the atomic 
oscillators, but we will do so now. Such a force corresponds to a resistance to the 
motion, that is, to a force proportional to the velocity of the electron. Then the 
equation of motion is 

F = q e E ~ m(x + 7x + coqA'), (32.1) 

where a is the displacement parallel to the direction of E. (We are assuming an 
isotropic oscillator whose restoring force is the same in all directions. Also, we 
are taking, for the moment, a linearly polarized wave, so that E doesn’t change 
direction.) If the electric field acting on the atom varies sinusoidally with time, 
we write 

E = E 0 e tut . (32.2) 

The displacement will then oscillate with the same frequency, and we can let 

a - x t) e l<at 

Substituting x = /coa and x — — co 2 a, we can solve for a in terms of E: 


X = 2 E , (32.3) 

— CO "T ilf CO COq 

Knowing the displacement, we can calculate the acceleration x and find the 
radiated wave responsible for the index. This was the way we computed the index 
in Chapter 31 of Volume I. 

Now, however, we want to take a different approach. The induced dipole 
moment p of an atom is q e x or, using Eq. (32.3), 


P = 


q e /m 


CO 2 T /7co "T co 2 


Since p is proportional to E , we write 


(32.4) 


p = e 0 a(w)E, 


(32.5) 


where a is called the atomic polarizability* With this definition, we have 


a 


ql/tnto 

— CO 2 ~(- lit CO -}- COq 


(32 6 


The quantum mechanical solution for the motions of electrons in atoms 
gives a similar answer except with the following modifications. The atoms have 
several natural frequencies, each frequency with its own dissipation constant 
7. Also the effective “strength” of each mode is different, which we can represent 
by multiplying the polarizability for each frequency by a strength factor /, which 
is a number we expect to be of the order of 1 Representing the three parameters 
co, 7, and / by co^, 7*, and /* for each mode of oscillation, and summing over the 


* Throughout this chapter we follow the notation of Chapter 31 of Volume I, and let 
a represent the atomic polarizability as defined here. In the last chapter, we used a to 
represent the volume polarizability— the ratio of P to E In the notation of this chapter 
P = Nae 0 E (see Eq 32.8) 
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various modes, we modify Eq. (32.6) to read 


«(«) 


Jf 

m 


Z 


f k 

OJ 2 + /Y&CO + COofc 


(32 7) 


If N is the number of atoms per unit volume in the material, the polarization 
P is just Np = e 0 NaE, and is proportional to E : 


P = e u A r a(oj)£’. 


(32.8) 


In other words, when there is a sinusoidal electric field acting in a material, there 
is an induced dipole moment per unit volume which is proportional to the electric 
field — with a proportionality constant a that, we emphasize, depends upon the 
frequency. At very high frequencies, a is small; there is not much response. How- 
ever, at low frequencies there can be a strong response. Also, the proportionality 
constant is a complex number, which means that the polarization does not exactly 
follow the electric field, but may be shifted in phase to some extent At any rate, 
there is a polarization per unit volume whose magnitude is proportional to the 
strength of the electric field. 


32-2 Maxwell’s equations in a dielectric 

The existence of polarization in matter means that there are polarization 
charges and currents inside of the material, and these must be put into the complete 
Maxwell equations in order to find the fields We are going to solve Maxwell’s 
equations this nme m a situation in which the charges and currents are not zero, 
as in a vacuum, but are given implicitly by the polarization vector Our first 
step is to find explicitly the charge density p and current density /, averaged over 
a small volume of the same size we had in mind when we defined P . Then the 
p and j we need can be obtained from the polarization. 

We have seen in Chapter 10 that when the polarization P varies from place 
to place, there is a charge density given by 

Pp.,1 (32.9) 

At that time, we were dealing with static fields, but the same formula is valid also 
for time-varying fields However, when P varies with time, there are charges in 
motion, so there is also a polarization current. Each of the oscillating charges 
contributes a current equal to its charge q c , times its velocity v With N such 
charges per unit volume, the current density j is 

/ - Nq e v. 

Since we know that — dx/clt , then / = Nq c (dx/dt), which is just dP/dt. There- 
fore the current density from the varying polarization is 

- f 

Our problem is now direct and simple. We write Maxwell’s equations with 
the charge density and current density expressed in terms of P , using Eqs. (32.9) 
and (32 10). (We assume that there are no other currents and charges in the 
material.) We then relate P to E with Eq. (32.5), and we solve the equation for 
E and B — looking for the wave solutions 

Before we do this, we would like to make an historical note. Maxwell origi- 
nally wrote his equations in a form which was different from the one we have been 
using. Because the equations were written in this different form for many years — 
and are still written that way by many people — we will explain the difference In 
the early days, the mechanism of the dielectric constant was not fully and clearly 
appreciated. The nature of atoms was not understood, nor that there was a polar- 
ization of the material. So people did not appreciate that there was a contribution 
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to the charge density p from V • P. They thought only in terms of charges that 
were not bound to atoms (such as the charges that flow in wires or are rubbed 
off surfaces). 

Today, we prefer to let p represent the total charge density, including the part 
from the bound atomic charges. If we call that part p po i, we can write 

P Ppo 1 ~b Po t h or? 

where p ot hw is the charge density considered by Maxwell and refers to the charges 
not bound to individual atoms. We would then write 


V E = 


Ppol "b Po+h 
€() 


Substituting Pp 0 i from Eq. (32.9), 


V E = P '-± U - - ] VP 
e<> eo 

V (CqE" “b P) Pother* 


(32.11) 


The current density in the Maxwell equations for V X B also has, in general, 
contributions from bound atomic currents. We can therefore write 


/ ypol ~b 7oth(T) 

and the Maxwell equation becomes 


c 2 v X B = + ~ E ■ 

6o dt 


Using Eq. (32.10), we get 


CqC V X B — /other + + P )* 


(32 12) 


(32 13) 


Now you can see that if we were to define a new vector D by 

D ~ €q E + P 9 

the two field equations would become 


(32.14) 


V D = Pother (32.15) 

and 

e 0 c 2 V X B = y'other + * (32.16) 

These are actually the forms that Maxwell used for dielectrics. His two remaining 
equations were 


and 


V X E 


dB 

dt 9 


V B = 0, 


which are the same as we have been using. 

Maxwell and the other early workers also had a problem with magnetic 
materials (which we will take up soon) Because they did not know about the 
circulating currents responsible for atomic magnetism, they used a current density 
that was missing still another part Instead of Eq. (32.16), they actually wrote 

v X // = / + -- (32.17) 

where H differs from e 0 c 2 B because it includes the effects of atomic currents. 
(Then f represents what is left of the currents.) So Maxwell had jour field vectors— 
£, Z), B , and H — the D and H were hidden ways of not paying attention to what 
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was going on inside the material You will find the equations written this way in 
many places. 

To solve the equations, it is necessary to relate D and H to the other fields, 
and people used to write 


D = eE and B = fxH. (32.18) 

However, these relations are only approximately true for some materials and 
even then only if the fields are not changing rapidly with time. (For sinusoidally 
varying fields one often can write the equations this way by making e and p complex 
functions of the frequency, but not for an arbitrary time variation of the fields.) 
So there used to be all kinds of cheating in solving the equations. We think the 
right way is to keep the equations in terms of the fundamental quantities as we 
now understand them — and that’s how we have done it. 


32-3 Waves in a dielectric 


We want now to find out what kind of electromagnetic waves can exist in a 
dielectric material in which there are no extra charges other than those bound in 
atoms. So we take p = — V *7* and 7 = dP/dt. Maxwell’s equations then become 


V * P 

(a) V E - - — 
to 


(c) V X E = 


dB 

~dt 


(b) (£ + s) 

(d) V * B = 0 


(32.19) 


We can solve these equations as we have done before. We start by taking 
the curl of Eq. (32.19c): 

V X (V X E) = - ~ V X B. 

dt 

Next, we make use of the vector identity 

V X (V X E) = V(V • E) - V 2 E, 
and also substitute for V X B, using Eq. (32 19b); we get 

V(V ■ E) 


V 2 E = 


1 d 2 E 


j_ ar 

€ 0 C 2 dt 2 C 2 dt 2 


Using Eq. (32 19a) for v • E, we get 


V 2 E - 


i_ a^E 
c 2 dt 2 


1 / 

— V(V • P) + — -r^r 

€0 e 0 c 2 d t 2 


(32.20) 


So instead of the wave equation, we now get that the D’Alembertian of E is equal 
to two terms involving the polarization P. 

Since P depends on E, however, Eq. (32.20) can still have wave solutions. 
We will now limit ourselves to isotropic dielectrics, so that P is always in the same 
direction as E. Let’s try to find a solution for a wave going in the z-direction 
Then, the electric field might vary as . We will also suppose that the wave 

is polarized in the jc-direction — that the electric field has only an ^-component. 
We write 

E x = E 0 e‘ {at ~ kz \ (32.21) 

You know that any function of (z — vt) represents a wave that travels with 
the speed v. The exponent of Eq. (32.21) can be written as 



so, Eq. (32 21) represents a wave with the phase velocity 


i> P h = w/fc • 
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The index of refraction n is defined (see Chapter 31, Vol. I) by letting 

Vph = n' 

Thus Eq. (32 21) becomes 

E = nz ^ c ' ) 

So we can find n by finding what value of k is required if Eq. (32.21) is to satisfy 
the proper field equations, and then using 

n = — • (32.22) 

CO 

In an isotropic material, there will be only an x-component of the polarization; 
then P has no variation with the ;t-co ordinate, so V * P = 0, and we get rid of 
the first term on the right-hand side of Eq. (32.20) Also, since we are assuming a 
linear dielectric, P x will vary as e ltat , and d 2 P x /dt 2 = —oj 2 P x . The Laplacian in 
Eq. (32.20) becomes simply d 2 E x /dz 2 = ~k 2 E x , so we get 

-k 2 E x + ^E x = -~P*- (32.23) 

Now let us assume for the moment that since E is varying sinusoidally, we 
can set P proportional to E , as in Eq. (32.5). (We’ll come back to discuss this 
assumption later.) We write 

Px = e 0 NaE x . 

Then E x drops out of Eq. (32.23), and we find 

k 2 = ^ (1 + Na). (32.24) 

We have found that a wave like Eq. (32.21), with the wave number k given by 
Eq. (32 24), will satisfy the field equations. Using Eq. (32.22), the index n is given by 

n 2 = 1 + Na. (32.25) 

Let’s compare this formula with what we obtained in our theory of the index 
of a gas (Chapter 31, Vol. I). 

n = 

Taking a from Eq. (32.6), Eq 
« 2 = 1 

First, we have the new term in iYoj, because we are including the dissipation of 
the oscillators. Second, the left-hand side is n instead of n 2 , and there is an extra 
factor of 1/2. But notice that if A is small enough so that n is close to one (as it 
is for a gas), then Eq. (32.27) says that n 2 is one plus a small number: ^i 2 =1+6. 
We can then write n = \/l + e — 1 + e/2, and the two expressions are equiva- 
lent. Thus our new method gives for a gas the same result we found earlier. 

Now you might think that Eq. (32.27) should give the index of refraction 
for dense materials also. It needs to be modified, however, for several reasons. 
First, the derivation of this equation assumes that the polarizing field on each 
atom is the field E x . That assumption is not right, however, because in dense 
materials there is also the field produced by other atoms in the vicinity, which may 
be comparable to E x . We considered a similar problem when we studied the static 
fields in dielectrics. (See Chapter 11.) You will remember that we estimated the 
field at a single atom by imagining that it sat in a spherical hole in the surrounding 
dielectric. The field in such a hole — which we called the local field — is increased 


There, we got Eq (31.29), which is 

1 + iM L_ 

2 me 0 -c 2 + <o 2 
(32.25) would give us 


+ 


I 


Nqj 

we o _co 2 + /7o> + co§ 


(32.26) 


(32.27) 
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over the average field E by the amount P/ 3e 0 - (Remember, however, that this 
result is only strictly true in isotropic materials — including the special case of a 
cubic crystal.) 

The same arguments will hold for the electric field in a wave, so long as the 
wavelength of the wave is much longer than the spacing between atoms. Limiting 
ourselves to such cases, we write 

£iocai = E+ (32.28) 

oe 0 

This local field is the one that should be used for E in Eq. (32.3); that is, Eq.(32.8) 
should be rewritten : 

P = e 0 iVa£i ocal . (32.29) 

Using .Fi oca i from Eq. (32.28), we find 

P ~“N°( £ +w) 

or 

p <32 ' 30) 


In other words, for dense materials P is still proportional to E (for sinusoidal 
fields). However, the constant of proportionality is not e 0 Na, as we wrote below 
Eq. (32.23), but should be e 0 Na/[\ — (Na/2>)]. So we should correct Eq (32.25) 
to read 


n 


2 


1 + 


Na 

1 - (Na/ 3 ) 


(32.31) 


It will be more convenient if we rewrite this equation as 


3 


n 2 - 1 
n 2 + 2 


= Na, 


(32.32) 


which is algebraically equivalent. This is known as the Clausius-Mosotti equation. 

There is another complication in dense materials. Because neighboring atoms 
are so close, there are strong interactions between them. The internal modes of 
oscillation are, therefore, modified. The natural frequencies of the atomic oscilla- 
tions are spread out by the interactions, and they are usually quite heavily damped 
— the resistance coefficient becomes quite large. So the co 0 ’s and 7’s of the solid 
will be quite different from those of the free atoms. With these reservations, we 
can still represent a , at least approximately, by Eq. (32.7). We have then that 


3 ~ * _ Nqe y* fk 

«2 + 2 me 0 Y _ w 2 + nk0) + w 2 k 


(32.33) 


One final complication. If the dense material is a mixture of several compo- 
nents, each will contribute to the polarization. The total a will be the sum of the 
contributions from each component of the mixture [except for the inaccuracy of 
the local field approximation, Eq. (32.28), in ordered crystals — effects we discussed 
when analyzing ferroelectrics]. Writing N 3 as the number of atoms of each com- 
ponent per unit volume, we should replace Eq. (32.32) by 

3 ( = Z ( 32 - 34 > 

where each a 3 will be given by an expression like Eq. (32.7). Equation (32.34) 
completes our theory of the index of refraction. The quantity 3 (n 2 — 1 )/(n 2 + 2) 
is given by some complex function of frequency, which is the mean atomic polariz- 
ability a(co). The precise evaluation of a(co) (that is, finding/^, 7* and co 0 &) in dense 
substances is a difficult problem of quantum mechanics. It has been done from 
first principles only for a few especially simple substances. 
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32-4 The complex index of refraction 


We want to look now at the consequences of our result, Eq (32.33). First, 
we notice that a is complex, so the index n is going to be a complex number. What 
does that mean 9 Let's say that we write n as the sum of a real and an imaginary 
part: 

n = n R — ini, (32 35) 



/ 

/ 

/ 


Fig. 32-1. A graph of E x for some 
instant t , if n/ ~ nf?/27T. 


where nn and nr are real functions of to We write ini with a minus sign, so that nj 
will be a positive quantity in all ordinary optical materials. (In ordinary inactive 
materials — that are not, like lasers, light sources themselves — 7 is a positive number, 
and that makes the imaginary part of n negative.) Our plane wave of Eq. (32.21) 
is written m terms of n as 

E £ = E 0 e~ lOJ{t - nzl() 

Writing n as in Eq. (32.35), we would have 

E x = (32 36) 

The term e^i-n^tc) represents a wave travelling with the speed c/nn, so n fi 
represents what we normally think of as the index of refraction. But the amplitude 
of this wave is 

E 0 e - an * zlc 9 

which decreases exponentially with z A graph of the strength of the electric field 
at some instant as a function of z is shown in Fig. 32-1, for nr ~ nijlir. The 
imaginary part of the index represents the attenuation of the wave due to the 
energy losses in the atomic oscillators. The intensity of the wave is proportional 
to the square of the amplitude, so 


This is often written as 


Intensity 
Intensity * e^ z , 


where (5 = 2c oni/c is called the absorption coefficient. Thus we have in Eq (32.33) 
not only the theory of the index of refraction of materials, but the theory of their 
absorption of light as well. 

In what we usually consider to be transparent material, the quantity c/coni — 
which has the dimensions of a length — is quite large in comparison with the 
thickness of the material. 


32-5 The index of a mixture 

There is another prediction of our theory of the index of refraction that we 
can check against experiment. Suppose we consider a mixture of two materials. 
The index of the mixture is not the average of the two indexes, but should be 
given in terms of the sum of the two polarizabilities, as in Eq. (32.34). If we ask 
about the index of, say, a sugar solution, the total polarizability is the sum of the 
polarizability of the water and that of the sugar. Each must, of course, be cal- 
culated using for N the number per unit volume of the molecules of the particular 
kind. In other words, if a given solution has Ni molecules of water, whose polariz- 
ability is «!, and N 2 molecules of sucrose (C 12 H 22 Oix), whose polarizability is 
a 2 , we should have that 

3 ( 5 -^ 4 ) = Niai + Nza 2 - (32 37) 

We can use this formula to test our theory against experiment by measuring 
the index for various concentrations of sucrose in water. We are making several 
assumptions here, however. Our formula assumes that there is no chemical action 
when the sucrose is dissolved and that the disturbances to the individual atomic 
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Table 32-2 

Refractive index of sucrose solutions, and comparison with predictions of Eq. (32.37). 


Data from Handbook 


A 

B 

C 

Fraction of sucrose 

density 

n 

by weight 

(gm/cm 3 ) 

at 20°C 

0 a 

0.9982 

1.333 

0.30 

1.1270 

1.3811 

0.50 

1.2296 

1.4200 

0 85 

1.4454 

! 1.5033 

1.00 b 

1.588 

1.5577° 


D 

E 

F 

f G 

Moles of 
sucrose* 1 
per liter, 

Moles of 

water e 
per liter, 

\« 2 + 2/ 

N ia i 

N 2 /N q 

N]/No 



0 

55.5 

0617 

0 617 

0.970 

43.8 ! 

0 698 j 

0.487 

1.798 

; 34 15 

0.759 

0.379 

3.59 

12 02 

0.886 

■ 0.1335 

4 64 

0 

0.960 

: o 


H 


N 2 a 2 


0 

0.211 

0.380 

0.752 

0.960 


a pure water 
c average (see text) 

6 molecular weight of water = 18 


b sugar crystals 

d molecular weight of sucrose = 342 


oscillators are not too different for various concentrations. So our result is certainly 
only approximate. Anyway, let’s see how good it is. 

We have picked the example of a sugar solution because there is a good table 
of measurements of the index of refraction in the Handbook of Chemistry and 
Physics and also because sugar is a molecular crystal that goes into solution with- 
out ionizing or otherwise changing its chemical state. 

We give in the first three columns of Table 32-2 the data from the handbook. 
Column A is the percent of sucrose by weight, column B is the measured density 
(gm/cm 3 ), and column C is the measured index of refraction for light whose 
wavelength is 589.3 millimicrons. For pure sugar we have taken the measured 
index of sugar crystals. The crystals are not isotropic, so the measured index is 
different along different directions. The handbook gives three values: 

«i = 1.5376, n 2 = 1.5651, n 3 = 1.5705. 

We have taken the average. 

Now we could try to compute n for each concentration, but we don’t know 
what value to take for ai or a 2 * Let’s test the theory this way: We will assume 
that the polarizability of water (a^) is the same at all concentrations and compute 
the polarizability of sucrose by using the experiment of values for n and solving 
Eq. (38.27) for a 2 . If the theory is correct, we should get the same a 2 for all 
concentrations. 

First, we need to know N r and N 2 : let’s express them m terms of Avogadro’s 
number, N 0 . Let’s take one liter (1000 cm 3 ) for our unit of volume. Then NJNq is 
the weight per liter divided by the gram-molecular weight. And the weight per 
liter is the density (multiplied by 1000 to get grams per liter) times the fractional 
weight of either the sucrose or the water. In this way, we get N 2 /N {) and A^/TVq 
as in columns D and E of the table. 

In column F we have computed 3 (n 2 — 1 )/(n 2 + 2) from the experimental 
values of n in column C. For pure water, 3 (n 2 — 1 )/(n 2 + 2) is 0.617, which is 
equal to just N I#!. We can then fill in the rest of Column G, since for each row 
row G/E may be in the same ratio — namely, 0.617:55.5. Subtracting column G 
from column F, we get the contribution N 2 a 2 of the sucrose, shown in column H 
Dividing these entries by the values of N 2 /N 0 in column D, we get the value of 
N 0 a 2 shown in column J 

From our theory we would expect all the values of N 0 <x 2 to be the same They 
are not exactly equal, but pretty close. We can conclude that our ideas are fairly 
correct. Even more, we find that the polarizability of the sugar molecule doesn’t 
seem to depend much on its surroundings — its polarizability is nearly the same in a 
dilute solution as it is in the crystal. 


J 

N( } a 2 

(gm/hter) 


0 213 
0 211 

0 210 
0.207 
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32-6 Waves in metals 


The theory we have worked out in this chapter for solid materials can also 
be applied to good conductors, like metals, with very little modification. In metals 
some of the electrons have no binding force holding them to any particular atom; 
it is these “free” electrons which are responsible for the conductivity. There are 
other electrons which are bound, and the theory above is directly applicable to 
them. Their influence, however, is usually swamped by the effects of the con- 
duction electrons. We will consider now only the effects of the free electrons 

If there is no restoring force on an electron — but still some resistance to its 
motion — its equation of motion differs from Eq. (32.1) only because the term in 
loIx is lacking. So all we have to do is set = 0 in the rest of our derivations — 
except that there is one more difference. The reason that we had to distinguish 
between the average field and the local field in a dielectric is that in an insulator 
each of the dipoles is fixed in position, so that it has a definite relationship to the 
position of the others. But because the conduction electrons in a metal move 
around all over the place, the field on them on the average is just the average field 
E. So the correction we made to Eq. (32 5) by using Eq. (32.28) should not be 
made for conduction electrons Therefore the formula for the index of refraction 
for metals should look like Eq. (32.27), except with co 0 set equal to zero, namely, 


2 , , Nqt 1 

n = 1 H ~y j — 

me 0 -to- + /Tto 


(32 38) 


''drift 



This is only the contribution from the conduction electrons, which we will assume 
is the major term for metals 

Now we even know how to find what value to use for 7 , because it is related 
to the conductivity of the metal. In Chapter 43 of Volume I we discussed how the 
conductivity of a metal comes from the diffusion of the free electrons through the 
crystal. The electrons go on a jagged path from one scattering to the next, and 
between scatterings they move freely except for an acceleration due to any average 
electric field (as shown in Fig 32-2). We found m Chapter 43 of Volume I that 
the average drift velocity is just the acceleration times the average time r between 
collisions. The acceleration is q c E/m , so 


Fig. 32-2. 
electron. 


The motion of a 


free 


f'drift 


q e E 

m 


(32.39) 


This formula assumed that E was constant, so that r <(nft was a steady velocity. 
Since there is no average acceleration, the drag force is equal to the applied force. 
We have defined 7 by saying that ymv is the drag force [see Eq. (32.1)], which is 
q e E; therefore we have that 

7 = - • (32.40) 

T 


Although we cannot easily measure r directly, we can determine it by measur- 
ing the conductivity of the metal. It is found experimentally that an electric field E 
in a metal produces a current with the density j proportional to E (for isotropic 
materials): 

j = <rE . 


The proportionality constant a is called the conductivity. This is just what we expect 
from Eq. (32.39) if we set 

j = 

Then 


a 


Nqt 

m 


(32.41) 


So r — and therefore 7 — can be related to the observed electrical conductivity. 
Using Eqs (32.40) and (32 41) ? we can rewrite our formula for the index, Eq. 
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(32.42) 


(32.38), in the following form: 

n 2 

where 


1 + 


O'/co 


ico(l + /cor) 


T 7 


mo* 

A ^ 2 


This is a convenient formula for the index of refraction of metals. 


(32.43) 


32-7 Low-frequency and high-frequency approximations ; the skin depth and the 
plasma frequency 


Our result, Eq. (32.42), for the index of refraction for metals predicts quite 
different characteristics for wave propagation at different frequencies. Let’s first 
see what happens at very low frequencies. If co is small enough, we can approximate 
Eq. (32.42) by 

n 2 = -i — • (32.44) 

Now, as you can check by taking the square,* 



so for low frequencies, 

n = V<x72e^(l - /). (32.45) 


The real and imaginary parts of n have the same magnitude. With such a large 
imaginary part to n , the wave is rapidly attenuated in the metal. Referring to 
Eq. (32.36), the amplitude of a wave going in the z-direction decreases as 


Let’s write this as 


exp[— vcrco 2e 0 c 2 z], 
e~ z! \ 


(32.46) 

(32.47) 


where 8 is then the distance in which the wave amplitude decreases by the factor 
e~ } = 1/2.72 — or roughly one-third. The amplitude of such a wave as a function 
of z is shown in Fig. 32-3. Since electromagnetic waves will penetrate into a 
metal only this distance, 8 is called the skin depth . It is given by 

8 = \/2eoC 2 /<TGj. (32.48) 



Fig. 32-3. The amplitude of a trans- 
verse electromagnetic wave as a function 
of distance into a metal. 


Now what do we mean by “low” frequencies? Looking at Eq. (32.42), we 
see that it can be approximated by Eq. (32.44) only if c or is much less than one 
and if (joe 0 /a is also much less than one — that is, our low-frequency approximation 
applies when 

1 

co « - 

T 

and 

0>«-- (32.49) 

Co 


Let’s see what frequencies these correspond to for a typical metal like copper. 
We compute r by using Eq. (32.43), and c r/e 0 , by using the measured conductivity. 
We take the following data from a handbook: 


<t = 5.76 X 10 7 (ohm-meter) -1 , 
atomic weight = 63.5 grams, 
density = 8.9 grams — cm - / 

Avogadro’s number = 6.02 X 10 2,3 (gram atomic weight) -1 


* Or writing — i = e _17r/2 ; V — i ~ e~ lirl 4 = cos7t/4 — /sin7r/4, which gives the 
same result. 
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If we assume that there is one free electron per atom, then the number of electrons 
per cubic meter is 

N = 8.5 X 10 28 meter”\ 


Using 


we get 


<le 

= 1.6 

X 

10“ 

19 coulomb, 

^0 

= 8.85 

X 

10‘ 

“ 12 farad-meter” 1 

m 

= 9.11 

X 

10 

" 31 kgm, 

r 

= 2.4 

X 

10“ 

14 sec, 

1 

= 4.1 

X 

10 1: 

* sec” 1 . 


T 


— = 6.5 X 10 18 sec 
e o 


So for frequencies less than about 10 12 cycles per second, copper will have the 
“low-frequency” behavior we describe (that means for waves whose free-space 
wavelength is longer than 0.3 millimeters— very short radio waves!). 

For these waves, the skin depth in copper is 


8 


to.028 m 2 -sec~ 1 
co 


For microwaves of 10,000 megacycles per second (3-cm waves) 

8 = 6.7 X 10 -4 cm. 


The wave penetrates a very small distance. 

We can see from this why in studying cavities (or waveguides) we needed to 
worry only about the fields inside the cavity, and not in the metal or outside the 
cavity. Also, we see why the losses in a cavity are reduced by a thin plating of 
silver or gold. The losses come from the current, which are appreciable only in a 
thin layer equal to the skin depth. 

Suppose we look now at the index of a metal like copper at high frequencies. 
For very high frequencies cor is much greater than one, and Eq. (32.42) is well 
approximated by 

n 2 = 1 ■ (32.50) 

e 0 w 2 r 


For waves of high frequencies the index of a metal becomes real — and less than 
one! This is also evident from Eq. (32.38) if the dissipation term with 7 is neglected, 
as can be done for very large co. Equation (32.38) gives 


n 


2 


Nq 2 e 

me 0 co- 


(32.51) 


which is, of course, the same as Eq. (32.50). We have seen before the quantity 
Nqe/me { ) , which we called the square of the plasma frequency (Section 7-3): 


co 


2 

V 


Nql : 
e {) m 


so we can write Eq. (32.50) or Eq. (32.51) as 


n 


2 


1 - 



2 


The plasma frequency is a kind of “critical” frequency. 

For co < (Op the index of a metal has an imaginary part, and waves are 
attenuated; but for co » co p the index is real, and the metal becomes transparent. 
You know, of course, that metals are reasonably transparent to x-rays. But 
some metals are even transparent in the ultraviolet. In Table 32-3 we give for 
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several metals the experimental observed wavelength at which they begin to become 
transparent. In the second column we give the calculated critical wavelength 
\ p = 2irc/a) p . Considering that the experimental wavelength is not too well 
defined, the fit of the theory is fairly good. 

You may wonder why the plasma frequency oo p should have anything to do 
with the propagation of electromagnetic waves in metals. The plasma frequency 
came up in Chapter 7 as the natural frequency of density oscillations of the free 
electrons. (A clump of electrons is repelled by electric forces, and the inertia of the 
electrons leads to an oscillation of density.) So longitudinal plasma waves are 
resonant at c o p . But we are now talking about transverse electromagnetic waves, 
and we have found that transverse waves are absorbed for frequencies below co p . 
(It’s an interesting and not accidental coincidence.) 

Although we have been talking about wave propagation in metals, you ap- 
preciate by this time the universality of the phenomena of physics — that it doesn’t 
make any difference whether the free electrons are in a metal or whether they are 
in the plasma of the ionosphere of the earth, or in the atmosphere of a star. To 
understand radio propagation in the ionosphere, we can use the same expressions — 
using, of course, the proper values for N and r. We can see now why long radio 
waves are absorbed or reflected by the ionosphere, whereas short waves go right 
through. (Short waves must be used for communication with satellites.) 

We have talked about the high- and low-frequency extremes for wave propaga- 
tion in metals. For the in-between frequencies the full-blown formula of Eq. 
(32.42) must be used. In general, the index will have real and imaginary parts; 
the wave is attenuated as it propagates into the metal. For very thin layers, metals 
are somewhat transparent even at optical frequencies. As an example, special 
goggles for people who work around high-temperature furnaces are made by 
evaporating a thin layer of gold on glass. The visible light is transmitted fairly 
well — with a strong green tinge — but the infrared is strongly absorbed. 

Finally, it cannot have escaped the reader that many of these formulas re- 
semble in some ways those for the dielectric constant k discussed in Chapter 10. 
The dielectric constant k measures the response of the material to a constant field, 
that is, for w = 0. If you look carefully at the definition of n and k you see that 
k is simply the limit of n 2 as o> — > 0. Indeed, placing co — 0 and n 2 = k in equa- 
tions of this chapter will reproduce the equations of the theory of the dielectric 
constant of Chapter 11. 


Table 32-3* 

Wavelengths below which the metal 
becomes transparent 


Metal 

X (experimental) 

X p =■ 2irc/i 

Ll 1 

1550 A 

1550 A 

Na 

2100 

2090 

K 

3150 i 

2870 

Rb 

3400 ! 

3220 


* From: C. Kittel, Introduction to Solid 
State Physics , John Wiley and Sons, Inc., 
New York, 2nd ed., 1956, p. 266. 
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33 


ileflcction from Surfaces 


33-1 Reflection and refraction of light 

The subject of this chapter is the reflection and refraction of light — or electro- 
magnetic waves in general — at surfaces. We have already discussed the laws of 
reflection and refraction in Chapter 35 of Volume I. Here’s what we found out 
there : 

1. The angle of reflection is equal to the angle of incidence. With the angles 
defined as shown in Fig. 33-1, 

0 r = 6 % . (33.1) 


2. The product n sin 9 is the same for the incident and transmitted beams 
(Snell’s law). 


n i sin 9 % = n 2 sin 9 t . 


(33.2) 


3. The intensity of the reflected light depends on the angle of incidence and 
also on the direction of polarization. For E perpendicular to the plane of 
incidence, the reflection coefficient R x is 


I r = sin 2 (6 t — d t ) 
I, sin 2 (6 t + d t ) 


(33.3) 


For E parallel to the plane of incidence, the reflection coefficient R\\ is 


R = [r = tan 2 {d t - dt) _ 

11 1 1 tan 2 (d t + d t ) 

4. For normal incidence (any polarization, of course!), 

Ir = / n 2 - «A 2 

7 , \n 2 + nj 


(33.4) 


(33.5) 


(Earlier, we used i for the incident angle and r for the refracted angle Since we 
can’t use r for both “refracted” and “reflected” angles, we are now using 6 % = 
incident angle, 0, = reflected angle, and 9 t = transmitted angle.) 

Our earlier discussion is really about as far as anyone would normally need 
to go with the subject, but we are going to do it all over again a different way 
Why' 1 One reason is that we assumed before that the indexes were real (no ab- 
sorption in the materials) But another reason is that you should know how to 
deal with what happens to waves at surfaces from the point of view of Maxwell’s 
equations. We’ll get the same answers as before, but now from a straightforward 
solution of the wave problem, rather than by some clever arguments. 

We want to emphasize that the amplitude of a surface reflection is not a 
property of the material , as is the index of refraction It is a “surface property,” 
one that depends precisely on how the surface is made. A thin layer of extraneous 
junk on the surface between two materials of indices n \ and n 2 will usually change 
the reflection. (There are all kinds of possibilities of interference here — like the 
colors of oil films Suitable thickness can even reduce the reflected amplitude to 
zero for a given frequency; that’s how coated lenses are made.) The formulas 
we will derive are correct only if the change of index is sudden — within a distance 
very small compared with one wavelength. For light, the wavelength is about 
5000 A, so by a “smooth” surface we mean one in which the conditions change in 
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Fig. 33-1. Reflection and refraction 
of light waves at a surface. (The wave 
directions are normal to the wave crests.) 
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going a distance of only a few atoms (or a few angstroms). Our equations will 
work for light for highly polished surfaces. In general, if the index changes grad- 
ually over a distance of several wavelengths, there is very little reflection at all. 


33-2 Waves in dense materials 



Fig. 33-2. For a wave moving in the 
direction k, the phase at any point P is 
U of — k • r). 


First, we remind you about the convenient way of describing a sinusoidal 
plane wave we used in Chapter 36 of Volume I. Any field component in the wave 
(we use E as an example) can be written in the form 

E = E Q e n “ l ~ k ‘ r) , (33.6) 

where E represents the amplitude at the point r (from the origin) at the time /. 
The vector k points in the direction the wave is travelling, and its magnitude 
\k\ = k = 2ir/\ is the wave number. The phase velocity of the wave is = co//c, 
for a light wave in a material of index n, y p i, = c/n, so 


Suppose k is in the z-direction, then k - r is just kz, as we have often used it For 
k in any other direction, we should replace z by />, the distance from the origin 
in the ^-direction; that is, we should replace kz by kr k , which is just k r . (See 
Fig. 33-2.) So Eq. (33.6) is a convenient representation of a wave m any direction. 
We must remember, of course, that 


k ' r = k x x + k y y + k z z, 

where k x , k u , and k z are the components of k along the three axes. In fact, we 
pointed out once that (go, k x , k y , k z ) is a four-vector, and that its scalar product 
with (/, x, y, z) is an invariant. So the phase of a wave is an invariant, and Eq. 
(33.6) could be written 

E = E 0 e lkuXu . 


But we don’t need to be that fancy now. 

For a sinusoidal E, as in Eq. (33.6), dE/dt is the same as iwE, and dE/dx is 

— ik x E , and so on for the other components. You can see why it is very convenient 
to use the form in Eq. (33 6) when working with differential equations — dilferentia- 
tions are replaced by multiplications. One further useful point: The operation 
V = (d/dx, d/dy, d/dz) gets replaced by the three multiplications ( — ik x , — ik Vi 

— ik z ). But these three factors transform as the components of the vector k , so 
the operator v gets replaced by multiplication with —ik: 

d 

et lU; 

V -» — ik. (33.8) 

This remains true for any V operation — whether it is the gradient, or the diver- 
gence, or the curl. For instance, the z-component of V X E is 

dE y __ d_E x . 
dx dy 

If both E y and E x vary as e^ k r , then we get 

ik x Ey ikyE x , 

which is, you see, the z-component of — ik X E. 

So we have the very useful general fact that whenever you have to take the 
gradient of a vector that varies as a wave in three dimensions (they are an important 
part of physics), you can always take the derivations quickly and almost without 
thinking by remembering that the operation V is equivalent to multiplication by 
-ik. 
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For instance, the Faraday equation 


becomes for a wave 
This tells us that 


V X E 


dB 

~dl 


— ik X E ~ —iooB. 


k X E 
oo 


(33.9) 


which corresponds to the result we found earlier for waves in free space — that B , 
in a wave, is at right angles to E and to the wave direction. (In free space, oo/k — 
c.) You can remember the sign in Eq. (33 9) from the fact that k is in the direction 
of PoyntingY vector S = c 2 E X B. 

If you use the same rule with the other Maxwell equations, you get again the 
results of the last chapter and, in particular, that 

2 2 

k k = k 2 = • (33 10) 


But since we know that, we won’t do it again. 

If you want to entertain yourself, you can try the following terrifying problem 
that was the ultimate test for graduate students back in 1890: solve Maxwell’s 
equations for plane waves in an anisotropic crystal, that is, when the polarization 
P is related to the electric field E by a tensor of polarizability. You should, of 
course, choose your axes along the principal axes of the tensor, so that the relations 
are simplest (then P x = a a E x , P y = a ft E u , and P z — a c E z ), but let the waves 
have an arbitrary direction and polarization. You should be able to find the rela- 
tions between E and B, and how k varies with direction and wave polarization. 
Then you will understand the optics of an anisotropic crystal. It would be best 
to start with the simpler case of a birefringent crystal — like calcite — for which 
two of the polarizabilities are equal (say, at, = a c ), and see if you can understand 
why you see double when you look through such a crystal If you can do that, 
then try the hardest case, in which all three a’s are different. Then you will know 
whether you are up to the level of a graduate student of 1890. In this chapter, 
however, we will consider only isotropic substances 



We know from experience that when a plane wave arrives at the boundary 
between two different materials — say, air and glass, or water and oil — there is a 
wave reflected and a wave transmitted Suppose we assume no more than that and 
see what we can work out. We choose our axes with the yz-p \ ane in the surface 
and the x^-plane perpendicular to the incident wave surfaces, as shown in Fig. 33-3. 
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The electric vector of the incident wave can then be written as 

E l = E 0 e l(ut ~ kr \ (33.11) 

Since k is perpendicular to the z-axis, 

k- r = k x x + k y y, (33 12) 

We write the reflected wave as 

E r = E f {) e lia} ' t - t ' , - r \ (33.13) 

so that its frequency is co', its wave number is k\ and its amplitude is (We 
know, of course, that the frequency is the same and the magnitude of k is the same 
as for the incident wave, but we are not going to assume even that. We will let it 
come out of the mathematical machinery.) Finally, we write for the transmitted 
wave, 

E t = (33.14) 


We know that one of Maxwell’s equations gives Eq (33.9), so for each of the 
waves we have 

B t = h * , B r = k X , E ‘ , B, = k -^ 7 ^ t - (33.15) 



Fig. 33-4. A boundary condition 
Ey 2 — Ey\ is obtained from Eds = 0. 


Also, if we call the indexes of the two media ni and n 2 , we have from Eq. (33.10) 

k 2 = k 2 + kt = -~ x - ■ (33 16) 

Since the reflected wave is in the same material, then 

/2 2 


k' 2 = w ni 


whereas for the transmitted wave, 


2 W" 2 «| 


k"* = 


(33.17) 


(33.18) 


33-3 The boundary conditions 

All we have done so far is to describe the three waves; our problem now is 
to work out the parameters of the reflected and transmitted waves in terms of 
those of the incident wave. How can we do that? The three waves we have de- 
scribed satisfy Maxwell’s equations in the uniform material, but Maxwell’s equa- 
tions must also be satisfied at the boundary between the two different materials. 
So we must now look at what happens right at the boundary. We will find that 
Maxwell’s equations demand that the three waves fit together in a certain way. 

As an example of what we mean, the y-component of the electric field E must 
be the same on both sides of the boundary. This is required by Faraday’s law, 


^ X E = — — 


BB 

dt : 


(33.19) 


as we can see in the following way. Consider a little rectangular loop F which 
straddles the boundary, as shown in Fig 33-4. Equation (33.19) says that the line 
integral of E around T is equal to the rate of change of the flux of B through the 
loop: 

r 

d_ 
d~t 


l 


E‘ds = - 


f B " 


da . 


Now imagine that the rectangle is very narrow, so that the loop encloses an in- 
finitesimal area. If B remains finite (and there's no reason it should be infinite 
at the boundary!) the flux through the area is zero So the line integral of E must 
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be zero. If E u \ and E u 2 are the components of the field on the two sides of the 
boundary and if the length of the rectangle is /, we have 

E, n I - E„ 2 l = 0 
or 

E « i - E u 2 , (33.20) 

as we have said. This gives us one relation among the fields of the three waves. 

The procedure of working out the consequences of Maxwell’s equations at 
the boundary is called “determining the boundary conditions.” Ordinarily, it is 
done by finding as many equations like Eq. (33 20) as one can, by making argu- 
ments about little rectangles like V in Fig. 33-4, or by using little gaussian surfaces 
that straddle the boundary Although that is a perfectly good way of proceeding, 
it gives the impression that the problem of dealing with a boundary is different 
for every different physical problem 

For example, in a problem of heat flow across a boundary, how are the tem- 
peratures on the two sides related? Well, you could argue, for one thing, that the 
heat flow to the boundary from one side would have to equal the flow away from 
the other side. It is usually possible, and generally quite useful, to work out the 
boundary conditions by making such physical arguments. There may be times, 
however, when in working on some problem you have only some equations, and 
you may not see right away what physical arguments to use. So although we are 
at the moment interested only in an electromagnetic problem, where we can make 
the physical arguments, we want to show you a method that can be used for any 
problem — a general way of finding what happens at a boundary directly from the 
differential equations 

We begin by writing all the Maxwell equations for a dielectric — and this time 
we are very specific and write out explicitly all the components: 


X E = 


X P 


Co 


t)E r 

v dx dy 


dE z 
7 )T 


dP, <w\, sp z 

dx dy dz 


(33.21) 
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dt 
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dBy 
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(33.22a) 


(33.22b) 
(33 22c) 


(33 23) 


(33.24a) 


(33.24b) 


(33.24c) 
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Fig. 33-5. The fields in the transition 
region (3) between two different ma- 
terials in regions (1) and (2). 


Now these equations must all hold in region 1 (to the left of the boundary) 
and in region 2 (to the right of the boundary). We have already written the solu- 
tions in regions 1 and 2. Finally, they must also be satisfied in the boundary, which 
we can call region 3. Although we usually think of the boundary as being sharply 
discontinuous, in reality it is not. The physical properties change very rapidly 
but not infinitely fast. In any case, we can imagine that there is a very rapid, but 
continuous , transition of the index between region 1 and 2, in a short distance we 
can call region 3. Also, any field quantity like P T , or E lf , etc., will make a similar 
kind of transition in region 3. In this region, the differential equations must still 
be satisfied, and it is by following the differential equations in this region that we 
can arrive at the needed “boundary conditions.” 

For instance, suppose that we have a boundary between vacuum (region 1) 
and glass (region 2). There is nothing to polarize in the vacuum, so P i = 0. 
Let's say there is some polarization P 2 m the glass. Between the vacuum and the 
glass there is a smooth, but rapid, transition If we look at any component of 
P , say P Z9 it might vary as drawn in Fig. 33-5(a). Suppose now we take the first 
of our equations, Eq (33.21). It involves derivatives of the components of P with 
respect to x, y, and z. The y- and z-denvatives are not interesting; nothing spec- 
tacular is happening in those directions. But the ^-derivative of P x will have some 
very large values in region 3, because of the tremendous slope of P x . The derivative 
<)P x /dx will have a sharp spike at the boundary, as shown in Fig. 33-5(b). If we 
imagine squashing the boundary to an even thinner layer, the spike would get 
much higher If the boundary is really sharp for the waves we are interested in, 
the magnitude of dP x /dx in region 3 will be much, much greater than any contribu- 
tions we might have from the variation of P m the wave away from the boundary— 
so we ignore any variations other than those due to the boundary. 

Now how can Eq. (33 21) be satisfied if there is a whopping big spike on the 
right-hand side? Only if there is an equally whopping big spike on the other side. 
Something on the left-hand side must also be big. The only candidate is 3E x /dx, 
because the variations with y and z are only those small effects in the wave we just 
mentioned. So — e 0 (dE/dx) must be as drawn in Fig. 33-5(c)— just a copy of 
3P x /dx. We have that 

c dE* _ _ dP* J 

0 dx dx * 

If we integrate this equation with respect to x across region 3, we conclude that 

£q(Ex 2 ~~ E xl ) = ~{P X 2 F xl ). (33.25) 

In other words, the jump in e 0 E x in going from region 1 to region 2 must be equal 
to the jump in ~P X . 

We can rewrite Eq. (33.25) as 

£ o E x 2 -I- P x 2 = € 0 E xl + P x i, (33.26) 

which says that the quantity (e {) E x + P x ) has equal values in region 2 and region 1. 
People say: the quantity (e 0 E x + P x ) is continuous across the boundary. We have, 
in this way, one of our boundary conditions. 

Although we took as an illustration the case in which P x was zero because 
region 1 was a vacuum, it is clear that the same argument applies for any two 
materials in the two regions, so Eq. (33.26) is true in general. 

Let’s now go through the rest of Maxwell’s equations and see what each of 
them tells us. We take next Eq. (33.22a). There are no .x-derivattves, so it doesn’t 
tell us anything. (Remember that the fields themselves do not get especially large 
at the boundary; only the derivatives with respect to x can become so huge that 
they dominate the equation.) Next, we look at Eq. (33 22b). Ah T There is an 
x-derivative! We have dE z /dx on the left-hand side. Suppose it has a huge de- 
rivative But wait a moment! There is nothing on the right-hand side to match it 
with; therefore E z cannot have any jump in going from region 1 to region 2. 
[If it did, there would be a spike on the left of Eq. (33.22a) but none on the right, 
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and the equation would be false ] So we have a new condition: 

F 2 = E zl . (33.27) 

By the same argument, Eq (33.22c) gives 

E v 2 = E h i. (33.28) 

This last result is just what we got in Eq. (33 20) by a line integral argument. 

We go on to Eq. (33 23) The only term that could have a spike is dB x /dx. 
But there’s nothing on the right to match it, so we conclude that 

B r 2 = B rl . (33.29) 


On to the last of Maxwell’s equations! Equation (33 24a) gives nothing, 
because there are no a - derivatives Equation (33 23b) has one, —c 2 3B z /dx, but 
again, there is nothing to match it with. We get 


B z 2 - B zl . (33.30) 

The last equation is quite similar, and gives 

B v 2 - By !. (33.31) 


The last three equations gives us that B 2 = B\. We want to emphasize, 
however, that we get this result only when the materials on both sides of the 
boundary are nonmagnetic — or rather, when we can neglect any magnetic effects 
of the materials. This can usually be done for most materials, except ferromagnetic 
o^es (We will treat the magnetic properties of materials in some later chapters.) 

| Our program has netted us the six relations between the fields in region 1 and 
those in region 2. We have put them all together in Table 33-1. We can now use 
them to match the waves in the two regions. We want to emphasize, however, that 
the idea we have just used will work in any physical situation in which you have 
differential equations and you want a solution that crosses a sharp boundary 
between two regions where some property changes. For our present purposes, 
we could have easily derived the same equations by using arguments about the 
fluxes and circulations at the boundary. (You might see whether you can get the 
same result that way.) But now you have seen a method that will work in case you 
ever get stuck and don't see any easy argument about the physics of what is happen- 
ing at the boundary — you can just work with the equations. 


Table 33-1 

Boundary conditions at the surface of 
dielectric 

(cqE I + P l)x = (tvE'2 + P'j)s 

(E i) v = (E 2 ) v 
(E\) z = (E 2 ) z 
B i - B> 

(The surface is in the vc-plane) 


33-4 The reflected and transmitted waves 

Now we are ready to apply our boundary conditions to the waves we wrote 
down in Section 33-2. We had: 


E t = 


(33 32) 

E r = 

i(w' t — hzX—kyij) 

(33.33) 

E t = 

pff >(co" t—k'x'x— h'y’y) 

/z. o c , 

(33.34) 

B, = 

k X E, 

(33.35) 

CO 

Br = 

X E r 

(33.36) 

0)' 

Bt = 

k" X E t 
u" 

(33 37) 


We have one further bit of knowledge: E is perpendicular to its propagation 
vector k for each wave. 
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Fig. 33-6. Polarization of the re- 
flected and transmitted waves when the 
E-field of the incident wave is perpendicu- 
lar to the plane of incidence. 


The results will depend on the direction of the E-vector (the “polarization”) 
of the incoming wave. The analysis is much simplified if we treat separately the case 
of an incident wave with its E-vector parallel to the “plane of incidence” (that is, 
the .xy-plane) and the case of an incident wave with the E-vector perpendicular to 
the plane of incidence. A wave of any other polarization is just a linear combina- 
tion of two such waves. In other words, the reflected and transmitted intensities 
are different for different polarizations, and it is easiest to pick the two simplest 
cases and treat them separately. 

We will carry through the analysis for an incoming wave polarized per- 
pendicular to the plane of incidence and then just give you the result for the other. 
We are cheating a little by taking the simplest case, but the principle is the same 
for both. So we take that E % has only a z-component, and since all the E-vectors 
are in the same direction we can leave off the vector signs. 

So long as both materials are isotropic, the induced oscillations of charges in 
the material will also be in the z-direction, and the E-field of the transmitted and 
radiated waves will have only z-components. So for all the waves, E x and E y 
and P x and P y are zero. The waves will have their E- and E-vectors as drawn in 
Fig. 33-6 (We are cutting a corner here on our original plan of getting everything 
from the equations. This result would also come out of the boundary conditions, 
but we can save a lot of algebra by using the physical argument When you have 
some spare time, see if you can get the same result from the equations. It is clear 
that what we have said agrees with the equations; it is just that we have not shown 
that there are no other possibilities.) 

Now our boundary conditions, Eqs. (33 26) through (33.31), give relations 
between the components of E and E in regions 1 and 2. For region 2 we have only 
the transmitted wave, but in region 1 we have two waves. Which one do we use? 
The fields in region 1 are, of course, the superposition of the fields of the incident 
and reflected waves. (Since each satisfies Maxwell’s equations, so does the sum.) 
So when we use the boundary conditions, we must use that 


' E x = E l + E r , E 2 = £*, 
and similarly for the E’s. 

For the polarization we are considering, Eqs. (33.26) and (33.28) give us no 
new information; only Eq (33.27) is useful. It says that 

E t + Er = E t 

at the boundary , that is, for x = 0. So we have that 

= E[je ,(aJ " (33.38) 

which must be true for all t and for all y. Suppose we look first at y = 0. Then we 
have 

E 0 e Mt + = El{e lu "* 

This equation says that two oscillating terms are equal to a third oscillation. 
That can happen only if all the oscillations have the same frequency. (It is im- 
possible for three — or any number — of such terms with different frequencies to 
add to zero for all times.) So 

co" ~ cu' = co. (33.39) 


As we knew all along, the frequencies of the reflected and transmitted waves are 
the same as that of the incident wave. 

We should really have saved ourselves some trouble by putting that in at the 
beginning, but we wanted to show you that it can also be got out of the equations. 
When you are doing a real problem, it is usually the best thing to put everything you 
know into the works right at the start and save yourself a lot of trouble. 

By definition, the magnitude of k is given by k 2 = n 2 co 2 /c 2 , so we have also 

that 


k" 2 = k’ 2 = k 2 
n\ n\ n\ 


(33.40) 
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Now look at Eq. (33.38) for t = 0. Using again the same kind of argument 
we have just made, but this time based on the fact that the equation must hold 
for all values of y , we get that 

*" = k' v = k y . (33.41) 

From Eq. (33.40), k f 2 - k 2 , so 

k f f 4- k ( y = k* 4- k^. 

Combining this with Eq. (33.41), we have that 

k tl — k 2 

or that k' = ±/c x . The positive sign makes no sense; that would not give a 
reflected wave, but another incident wave, and we said at the start that we were 
solving the problem of only one incident wave. So we have 

k' r = -k x . (33 42) 

The two equations (33.41) and (33.42) give us that the angle of reflection is equal 
to the angle of incidence, as we expected. (See Fig. 33—3 ) The reflected wave is 

E , - (33.43) 


For the transmitted wave we already have that 


and 


k 


yy 


k " 2 _ k 2 . 
nf ’ 


so we can solve these to find k". We get 


L //2 


= k " 2 - k " 2 = 


n\ 


k 2 

Ky. 


(33.44) 


(33.45) 


Suppose for a moment that n x and n> are real numbers (that the imaginary 
parts of the indexes are very small). Then all the k’s are also real numbers, and 
from Fig. 33-3 we find that 

^7 = sin 0 ,, jp = sin 6 t . (33.46) 

From (33.44) we get that 

n 2 sin 6 t = tti sin 6 t> (33.47) 

which is Snell’s law of refraction — again, something we already knew. If the 
indexes are not real, the wave numbers are complex, and we have to use Eq. (33.45). 
[We could still define the angles 6 t and 6 t by Eq. (33.46), and Snell’s law, Eq. (33.47), 
would be true in general. But then the “angles” also are complex numbers, thereby 
losing their simple geometrical interpretation as angles. It is best then to describe 
the behavior of the waves by their complex k x or k'f values ] 

So far, we haven’t found anything new. We have just had the simple-minded 
delight of getting some obvious answers from a complicated mathematical ma- 
chinery. Now we are ready to find the amplitudes of the waves which we have 
not yet known. Using our results for the co’s and k’s, the exponential factors in 
Eq. (33.38) can be cancelled, and we get 

£ 0 + E' {) = Ef (33 48) 

Since both E{ } and E[{ are unknown, we need one more relationship. We must 
use another of the boundary conditions. The equations for E x and E y are no help, 
because all the £’s have only a z-component So we must use the conditions on 
B. Let’s try Eq. (33 29): 

Ex2 “ Bxi. 
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From Eqs. (33.35) through (33.37), 


B rl = 


CO 


R - K > E ' 

L>xr f ’ 

CO 


a;/£, 


CO 


Recalling that to'' = co' = co and k” = k! n = k f/ , we get that 

E {) + E{, = £{f. 


But this is just Eq. (33 48) all over again 1 We’ve just wasted time getting something 
we already knew. 

We could try Eq. (33.30), B z2 = B z but there are no z-components of B ] 
So there’s only one equation left: Eq. (33.31), B ij2 = B, n . For the three waves. 


B vi = - 


k x E, 


Byr 


k'xkk 

Co' 


B yt ~ 


k"E, 


(33.49) 


Putting for E 1 , and E t the wave expression for x = 0 (to be at the boundary), 
the boundary condition is 


CO 


Lr f , t 

^ x rv t—kyy ) 

/ 

CO 


hft ,, 

ev/ u 

co' 7 



Fig, 33-7. Polarization of the waves 
when the E-field of the incident wave is 
parallel to the plane of incidence. 


Again all co’s and ky s are equal, so this reduces to 

k x E y) + k x E' 0 = k'JE’i (33.50) 

This gives us an equation for the E’s that is different from Eq. (33 48). With the 
two, we can solve for Eq and Eq. Remembering that k\ = — k x , we get 

E' a = Yf+% E <>’ (33 51 ) 

Ik 

EH = E °- C33.52) 

These, together with Eq. (33.45) or Eq. (33 46) for k”, give us what we wanted to 
know. We will discuss the consequences of this result in the next section. 

If we begin with a wave polarized with its ZT-vector parallel to the plane of 
incidence, E will have both x- and j-components, as shown in Fig. 33-7. The 
algebra is straightforward but more complicated (The work can be somewhat 
reduced by expressing things in this case in terms of the magnetic fields, which are 
all in the z-direction.) One finds that 


[£.'>! 

_ njk x : - ■ njk" 

\E 0 \ 

(33 53) 


n\k r + nik'J 



EH 1 

lnin 2 k x 

\Eo\- 

(33 54) 


n 2 k x + n\k x 




Let’s see whether our results agree with those we got earlier Equation (33 3) 
is the result we worked out in Chapter 35 of Volume 1 for the ratio of the intensity 
of the reflected wave to the intensity of the incident wave Then, however, we were 
considering only real indexes For real indexes (and A’s), we can write 


k x = k cos d % = cos 
c 

kx = k ft cos 6 t — cos 0 t 
Substituting in Eq. (33.51), we have 

Eq __ Hi cos 6, — n 2 cos 6 t 
Eq n i cos d t + n 2 cos 6 ( ’ 


(33.55) 
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which does not look the same as Eq. (33.3). It will, however, if we use Snell’s law 
to get rid of the rC s. Setting n 2 = n x sin 0,,/sin 0,, and multiplying the numerator 
and denominator by sin 6 t) we get 

E' n _ cos 0, sin Q t — sin 0, cos 6 t 
E 0 cos 0, sin 0/ + sin Q % cos 0 t 


The numerator and denominator are just the sines of (0 t — 6 t ) and (0* + d t ); 
we get 


Eq = sin (0, - 6 t ) 
Eq sin (0, + dt) 


(33.56) 


Since Eq and E 0 are in the same material, the intensities are proportional to the 
squares of the electric fields, and we get the same result as before. Similarly, Eq. 
(33.53) is the same as Eq. (33.4). 

For waves which arrive at normal incidence, Q t = 0 and B t = 0. Equation 
(33.56) gives 0/0, which is not very useful. We can, however, go back to Eq. 
(33.55), which gives 


[r 



(33.57) 


This result, naturally, applies for “either” polarization, since for normal incidence 
there is no special “plane of incidence.” 


33-5 Reflection from metals 


We can now use our results to understand the interesting phenomenon of 
reflection from metals. Why is it that metals are shiny? We saw in the last chapter 
that metals have an index of refraction which, for some frequencies, has a large 
imaginary part. Let’s see what we would get for the reflected intensity when light 
shines from air (with n = 1) onto a material with n = — inj. Then Eq. (33.55) 
gives (for normal incidence) 

Eq = 1 + inj * 

Eq 1 - in t * 


For the intensity of the reflected wave, we want the square of the absolute values 
of Eq and E {) : 

Ir _ \E'o\ 2 _ |1 + «/| 2 
/, |£ 0 |-’ |1 -/«/!*’ 
or 


[r _ 1 H~ nj 

II 1 + nj 


(33 58) 


For a material with an index which is a pure imaginary number, there is 100 per- 
cent reflection 1 

Metals do not reflect 100 percent, but many do reflect visible light very well. 
In other words, the imaginary part of their indexes is very large But we have seen 
that a large imaginary part of the index means a strong absorption. So there is a 
general rule that if any material gets to be a very good absorber at any frequency, 
the waves are strongly reflected at the surface and very little gets inside to be ab- 
sorbed You can see this effect with strong dyes Pure crystals of the strongest 
dyes have a “metallic” shine. Probably you have noticed that at the edge of a bottle 
of purple ink the dried dye will give a golden metallic reflection, or that dried red 
ink will sometimes give a greenish metallic reflection. Red ink absorbs out the 
greens of transmitted light, so if the ink is very concentrated, it will exhibit a strong 
surface reflection for the frequencies of green light. 

You can easily show this effect by coating a glass plate with red ink and 
letting it dry. If you direct a beam of white light at the back of the plate, as shown 
in Fig. 33-8, there will be a transmitted beam of red light and a reflected beam of 
green light. 



Fig. 33-8. A material which absorbs 
light strongly at the frequency a> also 
reflects light of that frequency. 
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Fig. 33-9. Total internal reflection. 


33-6 Total internal reflection 

If light goes from a material like glass, with a real index n greater than 1, 
toward, say, air, with an index n 2 equal to 1, Snell's law says that 

sin 6 t = n sin 0*. 

The angle 6 t of the transmitted wave becomes 90° when the incident angle 0 t is 
equal to the “critical angle” 6 ( given by 

n sin 6 C = 1. (33.59) 


What happens for 0, greater than the critical angle 7 You know that there is total 
internal reflection. But how does that come about 7 

Let's go back to Eq (33.45) which gives the wave numbei k" for the trans- 
mitted wave. We would have 




Fig. 33-10. If there is a small gap, 
internal reflection is not “total”; a trans- 
mitted wave appears beyond the gap. 


Now k, f = k sin 0 Z and k = om/c , so 

2 

k x 2 = ~j (1 - n 2 sin 2 0,) . 

If n sin 0, is greater than one, k " 2 is negative and k" is a pure imaginary, say 
±ik[. You know by now what that means' The “transmitted” wave (Eq. 33.34) 
will have the form 

E t = E f Je ±klx e l{G}t - l> i J \ 

The wave amplitude either grows or drops off exponentially with increasing a*. 
Clearly, what we want here is the negative sign. Then the amplitude of the wave 
to the right of the boundary will go as shown in Fig. 33-9. Notice that k f is of 
the order c o/c — which is An, the free-space wavelength of the light. When light is 
totally reflected from the inside of a glass-air surface, there are fields in the air, 
but they extend beyond the surface only a distance of the order of the wavelength 
of the light 

We can now see how to answer the following question: If a light wave in glass 
arrives at the surface at a large enough angle, it is reflected, if another piece of 
glass is brought up to the surface (so that the “surface” in effect disappears) the 
light is transmitted. Exactly when does this happen? Surely there must be con- 
tinuous change from total reflection to no reflection 1 The answer, of course, is 
that if the air gap is so small that the exponential tail of the wave in the air has an 
appreciable strength at the second piece of glass, it will shake the electrons there 
and generate a new wave, as shown in Fig. 33-10. Some light will be transmitted. 
(Clearly, our solution is incomplete, we should solve all the equations again for a 
thin layer of air between two regions of glass.) 
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Fig. 33-1 1. A demonstration of the penetration of internally reflected waves. 


This transmission effect can be observed with ordinary light only if the air 
gap is very small (of the order of the wavelength of light, like 10~ 5 cm), but it is 
easily demonstrated with three-centimeter waves. Then the exponentially de- 
creasing field extends several centimeters. A microwave apparatus that shows the 
effect is drawn in Fig. 33-11 Waves from a small three-centimeter transmitter are 
directed at a 45° prism of paraffin. The index of refraction of paraffin for these 
frequencies is 1.50, and therefore the critical angle is 41.5°. So the wave is totally 
reflected from the 45° face and is picked up by detector A , as indicated in 
Fig. 33-1 1(a). If a second paraffin prism is placed in contact with the first, as 
shown in part (b) of the figure, the wave passes straight through and is picked up 
at detector B . If a gap of a few centimeters is left between the two prisms, as in 
part (c), there are both transmitted and reflected waves. The electric field outside 
the 45° face of the prism in Fig. 33-1 1(a) can also be shown by bringing detector 
B to within a few centimeters of the surface. 
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34 


The Magnetism of Matter 


34-1 Diamagnetism and paramagnetism 

In this chapter we are going to talk about the magnetic properties of materials. 
The material which has the most striking magnetic properties is, of course, iron. 
Similar magnetic properties are shared also by the elements nickel, cobalt, and — at 
sufficiently low temperatures (below 16°C) — by gadolinium, as well as by a number 
of peculiar alloys. That kind of magnetism, called ferromagnetism , is sufficiently 
striking and complicated that we will discuss it in a special chapter. However, 
all ordinary substances do show some magnetic effects, although very small 
ones — a thousand to a million times less than the effects in ferromagnetic materials. 
Here we are going to describe ordinary magnetism, that is to say, the magnetism 
of substances other than the ferromagnetic ones. 

This small magnetism is of two kinds. Some materials are attracted toward 
magnetic fields; others are repelled. Unlike the electrical effect in matter, which 
always causes dielectrics to be attracted, there are two signs to the magnetic 
effect. These two signs can be easily shown with the help of a strong electromagnet 
which has one sharply pointed pole piece and one flat pole piece, as drawn in 
Fig. 34-1. The magnetic field is much stronger near the pointed pole than near the 
flat pole. If a small piece of material is fastened to a long string and suspended 
between the poles, there will, in general, be a small force on it. This small force 
can be seen by the slight displacement of the hanging material when the magnet 
is turned on. The few ferromagnetic materials are attracted very strongly toward 
the pointed pole; all other materials feel only a very weak force. Some are weakly 
attracted to the pointed pole; and some are weakly repelled. 


34-1 Diamagnetism and 
paramagnetism 

34-2 Magnetic moments and angular 
momentum 

34-3 The precession of atomic 
magnets 

34-4 Diamagnetism 

34-5 Larmor’s theorem 

34-6 Classical physics gives neither 
diamagnetism nor 
paramagnetism 

34-7 Angular momentum in quantum 
mechanics 

34-8 The magnetic energy of atoms 

Review: Section 15-1, “The forces on 
a current loop; energy of a 
dipole.” 



The effect is most easily seen with a small cylinder of bismuth, which is 
repelled from the high-field region. Substances which are repelled in this way are 
called diamagnetic. Bismuth is one of the strongest diamagnetic materials, but 
even with it, the effect is still quite weak. Diamagnetism is always very weak. 
If a small piece of aluminum is suspended between the poles, there is also a weak 
force, but toward the pointed pole. Substances like aluminum are called para- 
magnetic . (In such an experiment, eddy-current forces arise when the magnet is 
turned on and off, and these can give off strong impulses. You must be careful 
to look for the net displacement after the hanging object settles down.) 
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We want now to describe briefly the mechanisms of these two effects. 
First, in many substances the atoms have no permanent magnetic moments, 
or rather, all the magnets within each atom balance out so that the net moment 
of the atom is zero. The electron spins and orbital motions all exactly balance 
out, so that any particular atom has no average magnetic moment, in these cir- 
cumstances, when you turn on a magnetic field little extra currents are generated 
inside the atom by induction. According to Lenz’s law, these currents are in 
such a direction as to oppose the increasing field. So the induced magnetic mo- 
ments of the atoms are directed opposite to the magnetic field. This is the mech- 
anism of diamagnetism. 

Then there are some substances for which the atoms do have a permanent 
magnetic moment — in which the electron spins and orbits have a net circulating 
current that is not zero So besides the diamagnetic effect (which is always present), 
there is also the possibility of lining up the individual atomic magnetic moments 
In this case, the moments try to line up with the magnetic field (in the way the 
permanent dipoles of a dielectric are lined up by the electric field), and the induced 
magnetism tends to enhance the magnetic field. These are the paramagnetic sub- 
stances. Paramagnetism is generally fairly weak because the lining-up forces are 
relatively small compared with the forces from the thermal motions which try to 
derange the order. It also follows that paramagnetism is usually sensitive to the 
temperature (The paramagnetism arising from the spins of the electrons re- 
sponsible for conduction in a metal constitutes an exception. We will not be 
discussing this phenomenon here.) For ordinary paramagnetism, the lower the 
temperature, the stronger the effect. There is more limng-up at low temperatures 
when the deranging effects of the collisions are less. Diamagnetism, on the other 
hand, is more or less independent of the temperature. In any substance with built-in 
magnetic moments there is a diamagnetic as well as a paramagnetic effect, but the 
paramagnetic effect usually dominates. 

In Chapter 11 we described a ferroelectric material, in which all the electric 
dipoles get lined up by their own mutual electric fields It is also possible to imagine 
the magnetic analog of ferroelectricity, in which all the atomic moments would 
line up and lock together. If you make calculations of how this should happen, 
you will find that because the magnetic forces are so much smaller than the electric 
forces, thermal motions should knock out this alignment even at temperatures as 
low as a few tenths of a degree Kelvin. So it would be impossible at room tempera- 
ture to have any permanent lining up of the magnets. 

On the other hand, this is exactly what does happen in iron — it does get lined 
up. There is an effective force between the magnetic moments of the different atoms 
of iron which is much, much greater than the direct magnetic interaction It is an 
indirect effect which can be explained only by quantum mechanics. It is about 
ten thousand times stronger than the direct magnetic interaction, and is what lines 
up the moments in ferromagnetic materials. We discuss this special interaction 
in a later chapter. 

Now that we have tried to give you a qualitative explanation of diamagnetism 
and paramagnetism, we must correct ourselves and say that ii is not possible to 
understand the magnetic effects of materials in any honest way from the point 
of view of classical physics. Such magnetic effects are a completely quantum- 
mechanical phenomenon. It is, however, possible to make some phoney classical 
arguments and to get some idea of what is going on. We might put it this way. 
You can make some classical arguments and get guesses as to the behavior of the 
material, but these arguments are not “legal” in any sense because it is absolutely 
essential that quantum mechanics be involved in every one of these magnetic 
phenomena. On the other hand, there are situations, such as in a plasma or a 
region of space with many free electrons, where the electrons do obey ihe laws 
of classical mechanics And in those circumstances, some of the theorems from clas- 
sical magnetism are worth while Also, the classical arguments are of some value 
for historical reasons. The first few times that people were able to guess at the mean- 
ing and behavior of magnetic materials, they used classical arguments Finally, 
as we have already illustrated, classical mechanics can give us some useful guesses 
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as to what might happen — even though the really honest way to study this subject 
would be to learn quantum mechanics first and then to understand the magnetism 
in terms of quantum mechanics. 

On the other hand, we don’t want to wait until we learn quantum mechanics 
inside out to understand a simple thing like diamagnetism We will have to 
lean on the classical mechanics as kind of half showing what happens, realizing, 
however, that the arguments are really not correct. We therefore make a series of 
theorems about classical magnetism that will confuse you because they will prove 
different things. Except for the last theorem, every one of them will be wrong 
Furthermore, they will all be wrong as a description of the physical world, because 
quantum mechanics is left out. 


34-2 Magnetic moments and angular momentum 


The first theorem we want to prove from classical mechanics is the following: 
If an electron is moving in a circular orbit (for example, revolving around a nucleus 
under the influence of a central force), there is a definite ratio between the magnetic 
moment and the angular momentum. Let’s call J the angular momentum and 
/x the magnetic moment of the electron in the orbit. The magnitude of the angular 
momentum is the mass of the electron times the velocity times the radius (See 
Fig. 34-2.) It is directed perpendicular to the plane of the orbit. 

J — mvr . (34.1) 



(This is, of course, a nonrelativistic formula, but it is a good approximation for F: g. 34-2. For any circular orbit the 
atoms, because for the electrons involved v/c is generally of the order of e 2 /he = magnetic moment fx is q/2 m times the 
1/137, or about 1 percent ) angular momentum J. 

The magnetic moment of the same orbit is the current times the area. (See 
Section 14-5 ) The current is the charge per unit time which passes any point on 
the orbit, namely, the charge q times the frequency of rotation. The frequency is the 
velocity divided by the circumference of the orbit; so 


/ - 


q 


V 

2i rr 


The area is 7 rr 2 , so the magnetic moment is 




qvr 

~T 


(34 2) 


It is also directed perpendicular to the plane of the orbit, 
same direction: 


* = 2m 3 (0rblt) ' 


So J and fx are in the 


(34.3) 


Their ratio depends neither on the velocity nor on the radius. For any particle 
moving in a circular orbit the magnetic moment is equal to q/2m times the angular 
momentum. For an electron, the charge is negative — we can call it ~q t \ so for 
an electron 

/x = — -^ r - J (electron orbit). (34.4) 

2m 


That’s what we would expect classically and, miraculously enough, it is also 
true quantum-mechanically It’s one of those things. However, if you keep going 
with the classical physics, you find other places where it gives the wrong answers, 
and it is a great game to try to remember which things are right and which things 
are wrong. We might as well give you immediately what is true in general in 
quantum mechanics. First, Eq. (34 4) is true for orbital motion , but that’s not the 
only magnetism that exists. The electron also has a spin rotation about its own 
axis (something like the earth rotating on its axis), and as a result of that spin it 
has both an angular momentum and a magnetic moment But for reasons that are 
purely quantum-mechanical — there is no classical explanation — the ratio of /x 
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to / for the electron spin is twice as large as it is for orbital motion of the spinning 
electron : 

fi = — q ~ e - J (electron spin). (34 5) 

m 

In any atom there are, generally speaking, several electrons and some combina- 
tion of spin and orbit rotations which builds up a total angular momentum and a 
total magnetic moment. Although there is no classical reason why it should be so, 
it is always true in quantum mechanics that (for an isolated atom) the direction of 
the magnetic moment is exactly opposite to the direction of the angular momentum. 
The ratio of the two is not necessarily either — qjm or ~q e /2m , but somewhere in 
between, because there is a mixture of the contributions from the orbits and the 
spins. We can write 



where g is a factor which is characteristic of the state of the atom. It would be 1 
for a pure orbital moment, or 2 for a pure spin moment, or some other number 
in between for a complicated system like an atom. This formula does not, of course, 
tell us very much. It says that the magnetic moment is parallel to the angular mo- 
mentum, but can have any magnitude. The form of Eq. (34.6) is convenient, how- 
ever, because g — called the “Lande g-factor” — is a dimensionless constant whose 
magnitude is of the order of one. It is one of the jobs of quantum mechanics to 
predict the g-factor for any particular atomic state. 

You might also be interested in what happens in nuclei. In nuclei there are 
protons and neutrons which may move around in some kind of orbit and at the 
same time, like an electron, have an intrinsic spin. Again the magnetic moment 
is parallel to the angular momentum. Only now the order of magnitude of the 
ratio of the two is what you would expect for a proton going around in a circle, 
with m in Eq. (34.3) equal to the proton mass Therefore it is usual to write for 
nuclei 

- - * (a-) j ■ 

where m p is the mass of the proton, and g — called the nuclear g-factor — is a number 
near one, to be determined for each nucleus. 

Another important difference for a nucleus is that the spin magnetic moment 
of the proton does not have a g-factor of 2, as the electron does For a proton, 
g = 2(2 79). Surprisingly enough, the neutron also has a spin magnetic moment, 
and its magnetic moment relative to its angular momentum is 2(— 1.93). The 
neutron, in other words, is not exactly “neutral” in the magnetic sense. It is like 
a little magnet, and it has the kind of magnetic moment that a rotating negative 
charge would have. 


34-3 The precession of atomic magnets 

One of the consequences of having the magnetic moment proportional to the 
angular momentum is that an atomic magnet placed in a magnetic field will precess. 
First we will argue classically. Suppose that we have the magnetic moment p 
suspended freely in a uniform magnetic field. It will feel a torque r, equal to 
fx X B, which tries to bring it in line with the field direction But the atomic 
magnet is a gyroscope — it has the angular momentum /. Therefore the torque 
due to the magnetic field will not cause the magnet to line up. Instead, the magnet 
will precess , as we saw when we analyzed a gyroscope in Chapter 20 of Volume I. 
The angular momentum — and with it the magnetic moment — precesses about an 
axis parallel to the magnetic field. We can find the rate of precession by the same 
method we used in Chapter 20 of the first volume. 

Suppose that in a small time At the angular momentum changes from J to /', 
as drawn in Fig. 34-3, staying always at the same angle 6 with respect to the direc- 
tion of the magnetic field B. Let's call ai p the angular velocity of the precession, 
so that in the time At the angle of precession is co p At. From the geometry of the 
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figure, we see that the change of angular momentum in the time At is 


AJ = (J sin 6)(cjo p At). 

So the rate of change of the angular momentum is 

~ = ov/sin 6, (34.8) 

which must be equal to the torque: 

r = j uB sin 0. 

The angular velocity of precession is then 

a > p = j B. 

Substituting jjl/J from Eq. (34.6), we see that for an atomic system 


the precession frequency is proportional to B. It is handy to remember that 
for an atom (or electron) 

A = — = (1.4 megacycles/gauss)gi?, (34.12) 

Z7T 

and that for a nucleus 

fp = — (0.76 kilocycles /gauss)#/?. (34.13) 

lit 

(The formulas for atoms and nuclei are different only because of the different 
conventions for g for the two cases.) 

According to the classical theory, then, the electron orbits — and spins — in 
an atom should precess in a magnetic field. Is it also true quantum-mechanically? 
It is essentially true, but the meaning of the “precession” is different. In quantum 
mechanics one cannot talk about the direction of the angular momentum in the 
same sense as one does classically, nevertheless, there is a very close analogy — so 
close that we continue to call it “precession.” We will discuss it later when we talk 
about the quantum-mechanical point of view. 


(34.9) 

(34.10) 

(34.11) 


34-4 Diamagnetism 

Next we want to look at afomagnetism from the classical point of view. It 
can be worked out in several ways, but one of the nice ways is the following. 
Suppose that we slowly turn on a magnetic field in the vicinity of an atom. As 
the magnetic field changes an electric field is generated by magnetic induction. 
From Faraday’s law, the line integral of E around any closed path is the rate of 
change of the magnetic flux through the path. Suppose we pick a path V which is 
a circle of radius r concentric with the center of the atom, as shown in Fig. 34-4. 
The average tangential electric field E around this path is given by 

Elrrr = — ~ (Birr 2 ), 

and there is a circulating electric field whose strength is 

r dB 

E= -iiT 


The induced electric field acting on an electron in the atom produces a torque 
equal to ~q e Er, which must equal the rate of change of the angular momentum 
dJ/dt : 

dJ _ q e r 2 dB 
dt 2 dt 


(34.14) 



Fig. 34-3, An object with angular 
momentum J and a parallel magnetic 
moment fx placed in a magnetic field B 
precesses with the angular velocity to p . 


B 



Fig. 34-4. The induced electric 
forces on the electrons in an atom. 
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Integrating with respect to time from zero field, we find that the change in angular 
momentum due to turning on the field is 

AJ = B. (34.15) 

This is the extra angular momentum from the twist given to the electrons as the 
field is turned on. 

This added angular momentum makes an extra magnetic moment which, 
because it is an orbital motion, is just —q e /2m times the angular momentum. The 
induced diamagnetic moment is 

2 2 

Am = — 3^- AJ — -ff- B. (34.16) 

2m 4 m 

The minus sign (as you can see is right by using Lenz’s law) means that the added 
moment is opposite to the magnetic field. 

We would like to write Eq (34.16) a little differently. The r 2 which appears 
is the radius from an axis through the atom parallel to B , so if B is along the z-direc- 
tion, it is x 2 -f- y 2 . If we consider spherically symmetric atoms (or average over 
atoms with their natural axes in all directions) the average of x 2 + y 2 is 2/3 of 
the average of the square of the true radial distance from the center point of the 
atom. It is therefore usually more convenient to write Eq. (34.16) as 

An = -^(r 2 )^. (34.17) 

In any case, we have found an induced atomic moment proportional to the 
magnetic field B and opposing it. This is diamagnetism of matter. It is this magnetic 
effect that is responsible for the small force on a piece of bismuth in a nonuniform 
magnetic field. (You could compute the force by working out the energy of the 
induced moments in the field and seeing how the energy changes as the material 
is moved into or out of the high-field region.) 

We are still left with the problem: What is the mean square radius, {/* 2 ) tU ? 
Classical mechanics cannot supply an answer. We must go back and start over 
with quantum mechanics. In an atom we cannot really say where an electron is, 
but only know the probability that it will be at some place If we interpret (r 2 ) iU 
to mean the average of the square of the distance from the center for the probability 
distribution, the diamagnetic moment given by quantum mechanics is just the 
same as formula (34.17). This equation, of course, is the moment for one electron. 
The total moment is given by the sum over all the electrons in the atom. The 
surprising thing is that the classical argument and quantum mechanics give the 
same answer, although, as we shall see, the classical argument that gives Eq. (34.17) 
is not really valid in classical mechanics. 

The same diamagnetic effect occurs even when an atom already has a perma- 
nent moment. Then the system will precess in the magnetic field. As the whole 
atom precesses, it takes up an additional small angular velocity, and that slow 
turning gives a small current which represents a correction to the magnetic moment. 
This is just the diamagnetic effect represented in another way. But we don't 
really have to worry about that when we talk about paramagnetism. If the dia- 
magnetic effect is first computed, as we have done here, we don't have to worry 
about the fact that there is an extra little current from the precession. That has 
already been included in the diamagnetic term. 

34-5 Larmor’s theorem 

We can already conclude something from our results so far First of all, in 
the classical theory the moment fi was always proportional to 7, with a given con- 
stant of proportionality for a particular atom There wasn’t any spin of the 
electrons, and the constant of proportionality was always —q e /2m; that is to say, 
in Eq. (34.6) we should set g — I . The ratio of ju to / was independent of the in- 
ternal motion of the electrons. Thus, according to the classical theory, all systems 
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of electrons would precess with the same angular velocity. (This is not true in 
quantum mechanics.) This result is related to a theorem in classical mechanics 
that we would now like to prove Suppose we have a group of electrons which are 
all held together by attraction toward a central point — as the electrons are attracted 
by a nucleus. The electrons will also be interacting with each other, and can, in 
general, have complicated motions. Suppose you have solved for the motions 
with no magnetic field and then want to know what the motions would be with a 
weak magnetic field. The theorem says that the motion with a weak magnetic 
field is always one of the no-field solutions with an added rotation, about the axis 
of the field, with the angular velocity co/, = q e B/2m. (This is the same as 
if g = 1.) There are, of course, many possible motions. The point is that for 
every motion without the magnetic field there is a corresponding motion in the 
field, which is the original motion plus a uniform rotation This is called Larmor’s 
theorem, and co/ v is called the Larmor frequency. 

We would like to show how the theorem can be proved, but we will let you 
work out the details. Take, first, one electron in a central force field. The force on 
it is just F(r\ directed toward the center. If we now turn on a uniform magnetic 
field, there is an additional force, qv X B\ so the total force is 

F{r) + qv X B. (34.18) 

Now let’s look at the same system from a coordinate system rotating with angulai 
velocity co about an axis through the center of force and parallel to B. This is no 
longer an inertial system, so we have to put in the proper pseudoforces — the cen- 
trifugal and Coriolis forces we talked about in Chapter 19 of Volume I. We found 
there that in a frame rotating with angular velocity w, there is an apparent tangential 
force proportional to v r , the radial component of velocity: 

F t = -2 mwvr (34.19) 

And there is an apparent radial force which is given by 

F r = moj 2 r + 2moi)v h (34 20) 

where v t is the tangential component of the velocity, measured in the rotating 
frame. (The radial component v r for rotating and inertial frames is the same ) 
Now for small enough angular velocities (that is, if c or « v t ) 9 we can neglect 
the first term (centrifugal) in Eq. (34.20) in comparison with the second (Coriolis) 
Then Eqs. (34.19) and (34.20) can be written together as 

F = — 2 moo X v . (34.21) 

If we now combine a rotation and a magnetic field, we must add the force in 
Eq. (34.21) to that in Eq (34.18). The total force is 

F(r) + qv X B + 2 mv X to (34.22) 

[we reverse the cross product and the sign of Eq. (34 21) to get the last term]. 
Looking at our result, we see that if 

2 m to = ~qB 

the two terms on the right cancel, and in the moving frame the only force is F(r). 
The motion of the electron is just the same as with no magnetic field — and, of 
course, no rotation. We have proved Larmor’s theorem for one electron. Since 
the proof assumes a small to, it also means that the theorem is true only for weak 
magnetic fields. The only thing we could ask you to improve on is to take the case 
of many electrons mutually interacting with each other, but all in the same central 
field, and prove the same theorem. So no matter how complex an atom is, if it has 
a central field the theorem is true. But that’s the end of the classical mechanics, 
because it isn’t true in fact that the motions precess in that way. The precession 
frequency co p of Eq. (34.11) is only equal to wj, if g happens to be equal to 1. 
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34-6 Classical physics gives neither diamagnetism nor paramagnetism 

Now we would like to demonstrate that according to classical mechanics 
there can be no diamagnetism and no paramagnetism at all. It sounds crazy — first, 
we have proved that there are paramagnetism, diamagnetism, precessing orbits, 
and so on, and now we are going to prove that it is all wrong. Yes! — We are going 
to prove that if you follow the classical mechanics far enough, there are no such 
magnetic effects — they all cancel out. If you start a classical argument in a certain 
place and don’t go far enough, you can get any answer you want. But the only 
legitimate and correct proof shows that there is no magnetic effect whatever. 

It is a consequence of classical mechanics that if you have any kind of system — 
a gas with electrons, protons, and whatever — kept in a box so that the whole thing 
can’t turn, there will be no magnetic effect. It is possible to have a magnetic effect 
if you have an isolated system, like a star held together by itself, which can start 
rotating when you put on the magnetic field. But if you have a piece of material 
that is held in place so that it can’t start spinning, then there will be no magnetic 
effects. What we mean by holding down the spin is summarized this way: At a 
given temperature we suppose that there is only one state of thermal equilibrium 
The theorem then says that if you turn on a magnetic field and wait for the system 
to get into thermal equilibrium, there will be no paramagnetism or diamagnetism — 
there will be no induced magnetic moment. Proof: According to statistical me- 
chanics, the probability that a system will have any given state of motion is pro- 
portional to e ~ Ujk \ where U is the energy of that motion. Now what is the energy 
of motion? For a particle moving in a constant magnetic field, the energy is the 
ordinary potential energy plus mv 2 /l , with nothing additional for the magnetic 
field. [You know that the forces from electromagnetic fields are q(E + v X B), 
and that the rate of work F • v is just qE • v, which is not affected by the magnetic 
field ] So the energy of a system, whether it is in a magnetic field or not, is always 
given by the kinetic energy plus the potential energy. Since the probability of any 
motion depends only on the energy — that is, on the velocity and position — it is 
the same whether or not there is a magnetic field. For thermal equilibrium, there- 
fore, the magnetic field has no effect. If we have one system in a box, and then have 
another system in a second box, this time with a magnetic field, the probability 
of any particular velocity at any point in the first box is the same as in the second. 
If the first box has no average circulating current (which it will not have if it is in 
equilibrium with the stationary walls), there is no average magnetic moment. 
Since in the second box all the motions are the same, there is no average magnetic 
moment there either. Hence, if the temperature is kept constant and thermal 
equilibrium is re-established after the field is turned on, there can be no magnetic 
moment induced by the field — according to classical mechanics. We can only get a 
satisfactory understanding of magnetic phenomena from quantum mechanics. 

Unfortunately, we cannot assume that you have a thorough understanding of 
quantum mechanics, so this is hardly the place to discuss the matter. On the other 
hand, we don’t always have to learn something first by learning the exact rules and 
then by learning how they are applied in different cases. Almost every subject 
that we have taken up in this course has been treated in a different way. In the 
case of electricity, we wrote the Maxwell equations on “Page One” and then de- 
duced all the consequences. That’s one way. But we will not now try to begin a new 
“Page One,” writing the equations of quantum mechanics and deducing everything 
from them We will just have to tell you some of the consequences of quantum 
mechanics, before you learn where they come from. So here we go. 


34-7 Angular momentum in quantum mechanics 

We have already given you a relation between the magnetic moment and the 
angular momentum. That’s pleasant. But what do the magnetic moment and the 
angular momentum mean in quantum mechanics? In quantum mechanics it turns 
out to be best to define things like magnetic moments in terms of the other con- 
cepts such as energy, in order to make sure that one knows what it means. Now, 
34-8 



it is easy to define a magnetic moment in terms of energy, because the energy of 
a moment in a magnetic field is, in the classical theory, ft ■ B. Therefore, the follow- 
ing definition has been taken in quantum mechanics: If we calculate the energy of a 
system in a magnetic field and we find that it is proportional to the field strength 
(for small field), the coefficient is called the component of magnetic moment in 
the direction of the field. (We don’t have to get so elegant for our work now, we 
can still think of the magnetic moment in the ordinary, to some extent classical, 
sense.) 

Now we would like to discuss the idea of angular momentum in quantum 
mechanics — or rather, the characteristics of what, in quantum mechanics, is called 
angular momentum. You see, when you go to new kinds of laws, you can’t just 
assume that each word is going to mean exactly the same thing. You may think, 
say, “Oh, I know what angular momentum is. It’s that thing that is changed by a 
torque.*’ But what’s a torque? In quantum mechanics we have to have new 
definitions of old quantities. It would, therefore, be legally best to call it by some 
other name such as “quantangular momentum,” or something like that, because 
it is the angular momentum as defined in quantum mechanics But if we can find a 
quantity in quantum mechanics which is identical to our old idea of angular 
momentum when the system becomes large enough, there is no use in inventing 
an extra word. We might as well just call it angular momentum. With that under- 
standing, this odd thing that we are about to describe is angular momentum It 
is the thing which in a large system we recognize as angular momentum in classical 
mechanics. 

First, we take a system in which angular momentum is conserved, such as an 
atom all by itself in empty space. Now such a thing (like the earth spinning on its 
axis) could, in the ordinary sense, be spinning around any axis one wished to choose. 
And for a given spin, there could be many different “states,” all of the same 
energy, each “state” corresponding to a particular direction of the axis of the 
angular momentum. So in the classical theory, with a given angular momentum, 
there is an infinite number of possible states, all of the same energy. 

It turns out in quantum mechanics, however, that several strange things 
happen. First, the number of states in which such a system can exist is limited — 
there is only a finite number. If the system is small, the finite number is very small, 
and if the system is large, the finite number gets very, very large. Second, we 
cannot describe a “state” by giving the direction of its angular momentum, but 
only by giving the component of the angular momentum along some direction — say 
in the z-direction Classically, an object with a given total angular momentum 
J could have, for its z-component, any value from -\-J to —J. But quantum- 
mechanically, the z-component of angular momentum can have only certain discrete 
values. Any given system — a particular atom, or a nucleus, or anything — with a 
given energy, has a characteristic number y, and its z-component of angular mo- 
mentum can only be one of the following set of values: 


A 

0 — 1 )* 
o— 2)ft 

~(j ~ 2)h 

-u - m 

-A 


(34.23) 


The largest z-component is j times h\ the next smaller is one unit of h less, and so 
on down to ~jh. The number j is called “the spin of the system.” (Some people 
call it the “total angular momentum quantum number” ; but we’ll call it the “spin.”) 

You may be worried that what we are saying can only be true for some “spe- 
cial” z-axis But that is not so. For a system whose spin is j, the component of 
angular momentum along any axis can have only one of the values in (34.23) 
Although it is quite mysterious, we ask you just to accept it for the moment We 
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will come back and discuss the point later. You may at least be pleased to hear that 
the z-component goes from some number to minus the same number, so that we 
at least don’t have to decide which is the plus direction of the z-axis. (Certainly, if 
we said that it went from +/ to minus a different amount, that would be infinitely 
mysterious, because we wouldn’t have been able to define the z-axis, pointing the 
other way.) 

Now if the z-component of angular momentum must go down by integers 
from +/ to —j, then j must be an integer. No! Not quite; twice j must be 
an integer. It is only the difference between +/ and —j that must be an integer. So, 
in general, the spin j is either an integer or a half-integer, depending on whether 
2 j is even or odd. Take, for instance, a nucleus like lithium, which has a spin of 
three-halves, j = 3/2. Then the angular momentum around the z-axis, in units 
of h, is one of the following: 

+3/2 
+ 1/2 
- 1/2 
-3/2. 


There are four possible states, each of the same energy, if the nucleus is in empty 
space with no external fields. If we have a system whose spin is two, then the 
z-component of angular momentum has only the values, in units of h , 

2 

1 

0 

-1 

- 2 . 

If you count how many states there are for a given j\ there are (2 j + 1) possibilities. 
In other words, if you tell me the energy and also the spin y, it turns out that 
there are exactly (2j + 1) states with that energy, each state corresponding to one 
of the different possible values of the z-component of the angular momentum. 

We would like to add one other fact. If you pick out any atom of known j 
at random and measure the z-component of the angular momentum, then you may 
get any one of the possible values, and each of the values is equally likely. All of 
the states are in fact single states, and each is just as good as any other. Each one 
has the same ‘‘weight” in the world. (We are assuming that nothing has been done 
to sort out a special sample.) This fact has, incidentally, a simple classical analog. 
If you ask the same question classically: What is the likelihood of a particular 
z-component of angular momentum if you take a random sample of systems, all 
with the same total angular momentum? — the answer is that all values from the 
maximum to the minimum are equally likely. (You can easily work that out.) 
The classical result corresponds to the equal probability of the (2y + 1) possi- 
bilities in quantum mechanics. 

From what we have so far, we can get another interesting and somewhat 
surprising conclusion. In certain classical calculations the quantity that appears 
in the final result is the square of the magnitude of the angular momentum /—in 
other words, J * /. It turns out that it is often possible to guess at the correct 
quantum-mechanical formula by using the classical calculation and the following 
simple rule: Replace 7 2 = / ■ / by j(j + \)h 2 . This rule is commonly used, and 
usually gives the correct result, but not always. We can give the following argument 
to show why you might expect this rule to work. 

The scalar product J * J can be written as 

j-j = jI + j% + Jl 

Since it is a scalar, it should be the same for any orientation of the spin. Suppose 
we pick samples of any given atomic system at random and make measurements of 
Jl, or 7 2 , or 7 2 , the average value should be the same for each. (There is no special 
distinction for any one of the directions.) Therefore, the average of / ■ J is just 
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equal to three times the average of any component squared, say of J 2 ; 

(/•/) av = 3<yf>. 

But since J • / is the same for all orientations, its average is, of course, just its 
constant value; we have 

J ■ J = 3(J*) av . (34.24) 

If we now say that we will use the same equation for quantum mechanics, we 
can easily find (Ji) llv . We just have to take the sum of the (2 j + 1) possible values 
of J \ , and divide by the total number; 

(Jl) av = 1)2 ! + 1)2 + ( ~ j)2 h 2 . (34.25) 

For a system with a spin of 3/2, it goes like this: 

/f2x _ (3/2) 2 + (1/2) 2 + (— 1/2) 2 + ( — 3/2) 2 , 2 _ 5 i2 

rz/av ~ ^ h 4 ^ * 

We conclude that 

/■ / = 3<yf) dV = z%h 2 - + i )h 2 . 

We will leave it for you to show that Eq. (34.25), together with Eq. (34.24), gives 
the general result 

J-J = JU+ m 2 . (34.26) 

Although we would think classically that the largest possible value of the z-com- 
ponent of J is just the magnitude of J — namely, \/J- J — quantum mechanically 
the maximum of J z is always a little less than that, because jh is always less than 
V ]{j + 1) h. The angular momentum is never “completely along the z-direction.” 


34-8 The magnetic energy of atoms 

Now we want to talk again about the magnetic moment. We have said that in 
quantum mechanics the magnetic moment of a particular atomic system can be 
written in terms of the angular momentum by Eq. (34.6); 





(34.27) 


where — q e and m are the charge and mass of the electron. 

An atomic magnet placed in an external magnetic field will have an extra 
magnetic energy which depends on the component of its magnetic moment along 
the field direction. We know that 


U m ag = B. (34.28) 

Choosing our z-axis along the direction of B , 

U s n . lK = (34.29) 

Using Eq. (34.27), we have that 

Um * e = 8 (/«) JzB - 

Quantum mechanics says that J z can have only certain values: jh , (j — 1 )h, , 

— jh . Therefore, the magnetic energy of an atomic system is not arbitrary; it can 
have only certain values. Its maximum value, for instance, is 

* (£) w- 
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Fig. 34-5. The possible magnetic en- 
ergies of an atomic system with a spin of 
3 ! 2 in a magnetic field B. 



Fig. 34-6. The two possible energy 
states of an electron in a magnetic field B. 


The quantity q e h/2m is usually given the name “the Bohr magneton” and written 


q e h 

= 2 ^’ 

The possible values of the magnetic energy are 




where J z /h takes on the possible values j, (j — 1), (j — 2), . . . , (— j + 1), —j. 

In other words, the energy of an atomic system is changed when it is put in a 
magnetic field by an amount that is proportional to the field, and proportional to 
J z . We say that the energy of an atomic system is “split into 2 \j + 1 levels” by 
a magnetic field. For instance, an atom whose energy is U 0 outside a magnetic 
field and whose j is 3/2, will have four possible energies when placed in a field. 
We can show these energies by an energy-level diagram like that drawn in Fig 
34-5. Any particular atom can have only one of the four possible energies in any 
given field B. That is what quantum mechanics says about the behavior of an 
atomic system in a magnetic field. 

The simplest “atomic” system is a single electron. The spin of an electron is 
1/2, so there are two possible states. J z = h/2 and J z = —h/ 2. For an electron 
at rest (no orbital motion), the spin magnetic moment has a g-value of 2, so the 
magnetic energy can be either *= imB . The possible energies in a magnetic field are 
shown in Fig. 34-6. Speaking loosely we say that the electron either has its spin 
“up” (along the field) or “down” (opposite the field). 

For systems with higher spins, there are more states. We can think that the 
spin is “up” or “down” or cocked at some “angle” in between, depending on the 
value of J z . 

We will use these quantum mechanical results to discuss the magnetic prop- 
erties of materials in the next chapter. 
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Paramagnetism and Magnetic Resonance 


35-1 Quantized magnetic states 

In the last chapter we described how in quantum mechanics the angular 
momentum of a thing does not have an arbitrary direction, but its component 
along a given axis can take on only certain equally spaced, discrete values. It is 
a shocking and peculiar thing. You may think that perhaps we should not go 
into such things until your minds are more advanced and ready to accept this 
kind of an idea. Actually, your minds will never become more advanced — in 
the sense of being able to accept such a thing easily. There isn’t any descriptive 
way of making it intelligible that isn’t so subtle and advanced in its own form 
that it is more complicated than the thing you were trying to explain. The behavior 
of matter on a small scale — as we have remarked many times — is different from 
anything that you are used to and is very strange indeed. As we proceed with 
classical physics, it is a good idea to try to get a growing acquaintance with the 
behavior of things on a small scale, at first as a kind of experience without any 
deep understanding. Understanding of these matters comes very slowly, if at all. 
Of course, one does get better able to know what is going to happen in a quantum- 
mechanical situation — if that is what understanding means — but one never gets a 
comfortable feeling that these quantum-mechanical rules are “natural.” Of course 
they are , but they are not natural to our own experience at an ordinary level. We 
should explain that the attitude that we are going to take with regard to this rule 
about angular momentum is quite different from many of the other things we have 
talked about. We are not going to try to “explain” it, but we must at least tell you 
what happens; it would be dishonest to describe the magnetic properties of materials 
without mentioning the fact that the classical description of magnetism — of 
angular momentum and magnetic moments — is incorrect. 

One of the most shocking and disturbing features about quantum mechanics 
is that if you take the angular momentum along any particular axis you find that 
it is always an integer or half-integer times h. This is so no matter which axis you 
take. The subtleties involved in that curious fact — that you can take any other axis 
and find that the component for it is also locked to the same set of values — we will 
leave to a later chapter, when you will experience the delight of seeing how this 
apparent paradox is ultimately resolved. 

We will now just accept the fact that for every atomic system there is a number 
y, called the spin of the system — which must be an integer or a half-integer — and 
that the component of the angular momentum along any particular axis will 
always have one of the following values between -\-jh and — jh: 
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We have also mentioned that every simple atomic system has a magnetic 
moment which has the same direction as the angular momentum. This is true not 
only for atoms and nuclei but also for the fundamental particles. Each funda- 
mental particle has its own characteristic value of j and its magnetic moment. 
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Fig. 35-1 . An atomic system with spin 
I has (2/ + 1 ) possible energy values in a 
magnetic field B. The energy splitting is 
proportional to B for small fields. 



B 




(For some particles, both are zero.) What we mean by “the magnetic moment” 
in this statement is that the energy of the system in a magnetic field, say in 
the z-direction, can be written as — ju Z B for small magnetic fields. We must have the 
condition that the field should not be too great, otherwise it could disturb 
the internal motions of the system and the energy would not be a measure 
of the magnetic moment that was there before the field was turned on. But if the 
field is sufficiently weak, the field changes the energy by the amount 

A U - -j u z B, (35.2) 

with the understanding that in this equation we are to replace \x z by 



where J z has one of the values in Eq. (35.1). 

Suppose we take a system with a spin j = 3/2. Without a magnetic field, the 
system has four different possible states corresponding to the different values of 
J z , all of which have exactly the same energy. But the moment we turn on the mag- 
netic field, there is an additional energy of interaction which separates these states 
into four slightly different energy levels. The energies of these levels are given by a 
certain energy proportional to B , multiplied by h times 3/2, 1 /2, — 1/2, and —3/2 — 
the values of J z . The splitting of the energy levels for atomic systems with spins of 
1/2, 1, and 3/2 are shown in the diagrams of Fig. 35-1. (Remember that for any 
arrangement of electrons the magnetic moment is always directed opposite to the 
angular momentum.) 

You will notice from the diagrams that the “center of gravity” of the energy 
levels is the same with and without a magnetic field. Also notice that the spacings 
from one level to the next are always equal for a given particle in a given magnetic 
field. We are going to write the energy spacing, for a given magnetic field B , as 
ho ) v — which is just a definition of cc p . Using Eqs. (35.2) and (35.3), we have 


hUp = 8 i hB 

"i> = g 2^ B ■ (35 ‘ 4) 
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The quantity g(q/2m) is just the ratio of the magnetic moment to the angular 
momentum — it is a property of the particle. Equation (35.4) is the same formula 
that we got in Chapter 34 for the angular velocity of precession in a magnetic 
field, for a gyroscope whose angular momentum is J and whose magnetic moment 
is ft. 




Fig. 35-2. The experiment of Stern and Gerlach. 


35-2 The Stern-Gerlach experiment 

The fact that the angular momentum is quantized is such a surprising thing 
that we will talk a little bit about it historically. It was a shock from the moment 
it was discovered (although it was expected theoretically). It was first observed in 
an experiment done in 1922 by Stern and Gerlach. If you wish, you can consider 
the experiment of Stern-Gerlach as a direct justification for a belief in the quantiza- 
tion of angular momentum. Stern and Gerlach devised an experiment for measur- 
ing the magnetic moment of individual silver atoms. They produced a beam of 
silver atoms by evaporating silver in a hot oven and letting some of them come out 
through a series of small holes. This beam was directed between the pole tips 
of a special magnet, as shown in Fig. 35-2. Their idea was the following. If 
the silver atom has a magnetic moment / x, then in a magnetic field B it has an energy 
— fji z B , where z is the direction of the magnetic field. In the classical theory, fi z 
would be equal to the magnetic moment times the cosine of the angle between the 
moment and the magnetic field, so the extra energy in the field would be 

A U — — fiBcosd. (35.5) 

Of course, as the atoms come out of the oven, their magnetic moments would 
point in every possible direction, so there would be all values of 6. Now if the 
magnetic field varies very rapidly with z — if there is a strong field gradient — then 
the magnetic energy will also vary with position, and there will be a force on the 
magnetic moments whose direction will depend on whether cosine 9 is positive or 
negative. The atoms will be pulled up or down by a force proportional to the 
derivative of the magnetic energy; from the principle of virtual work, 

F z = —— = ju cos 0 — • (35.6) 

dz dz 

Stern and Gerlach made their magnet with a very sharp edge on one of the 
pole tips in order to produce a very rapid variation of the magnetic field. The beam 
of silver atoms was directed right along this sharp edge, so that the atoms would 
feel a vertical force in the inhomogeneous field. A silver atom with its magnetic 
moment directed horizontally would have no force on it and would go straight 
past the magnet. An atom whose magnetic moment was exactly vertical would 
have a force pulling it up toward the sharp edge of the magnet. An atom whose 
magnetic moment was pointed downward would feel a downward push. Thus, 
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as they left the magnet, the atoms would be spread out according to their vertical 
components of magnetic moment. In the classical theory all angles are possible, 
so that when the silver atoms are collected by deposition on a glass plate, one should 
expect a smear of silver along a vertical line. The height of the line would be pro- 
portional to the magnitude of the magnetic moment. The abject failure of classical 
ideas was completely revealed when Stern and Gerlach saw what actually happened. 
They found on the glass plate two distinct spots. The silver atoms had formed 
two beams. 

That a beam of atoms whose spins would apparently be randomly oriented 
gets split up into two separate beams is most miraculous. How does the magnetic 
moment know that it is only allowed to take on certain components in the direction 
of the magnetic field 9 Well, that was really the beginning of the discovery of the 
quantization of angular momentum, and instead of trying to give you a theoretical 
explanation, we will just say that you are stuck with the result of this experiment 
just as the physicists of that day had to accept the result when the experiment was 
done. It is an experimental fact that the energy of an atom in a magnetic field 
takes on a series of individual values. For each of these values the energy is pro- 
portional to the field strength. So in a region where the field varies, the principle 
of virtual work tells us that the possible magnetic force on the atoms will have a 
set of separate values, the force is different for each state, so the beam of atoms is 
split into a small number of separate beams. From a measurement of the deflection 
of the beams, one can find the strength of the magnetic moment. 


35-3 The Rabi molecular-beam method 

We would now like to describe an improved apparatus for the measurement 
of magnetic moments which was developed by I. 1 Rabi and his collaborators. 
In the Stern-Gerlach experiment the deflection of atoms is very small, and the 
measurement of the magnetic moment is not very precise. Rabi’s technique per- 
mits a fantastic precision in the measurement of the magnetic moments. The 
method is based on the fact that the original energy of the atoms in a magnetic 
field is split up into a finite number of energy levels. That the energy of an atom 
in the magnetic field can have only certain discrete energies is really not more 
surprising than the fact that atoms in general have only certain discrete energy 
levels — something we mentioned often in Volume I. Why should the same thing 
not hold for atoms in a magnetic field? It does. But it is the attempt to correlate 
this with the idea of an oriented magnetic moment that brings out some of the 
strange implications of quantum mechanics. 

When an atom has two levels which differ in energy by the amount Ail, it 
can make a transition from the upper level to the lower level by emitting a light 
quantum of frequency to, where 

hco = A U (35.7) 

The same thing can happen with atoms in a magnetic field. Only then, the energy 
differences are so small that the frequency does not correspond to light, but to 
microwaves or to radiofrequencies. The transitions from the lower energy level 
to an upper energy level of an atom can also take place with the absorption of light 
or, in the case of atoms m a magnetic field, by the absorption of microwave energy. 
Thus if we have an atom in a magnetic field, we can cause transitions from one state 
to another by applying an additional electromagnetic field of the proper frequency. 
In other words, if we have an atom in a strong magnetic field and we “tickle'’ 
the atom with a weak varying electromagnetic field, there will be a certain prob- 
ability of knocking it to another level if the frequency is near to the co in Eq. (35.7). 
For an atom in a magnetic field, this frequency is just what we have earlier called 
oo p and it is given in terms of the magnetic field by Eq (35.4). If the atom is tickled 
with the wrong frequency, the chance of causing a transition is very small. Thus 
there is a sharp resonance at co p in the probability of causing a transition. By 
measuring the frequency of this resonance in a known magnetic field B , we can 
measure the quantity g(q/2m ) — and hence the ^-factor — with great precision. 
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It is interesting that one comes to the same conclusion from a classical point 
of view. According to the classical picture, when we place a small gyroscope with 
a magnetic moment \i and an angular momentum J in an external magnetic field, 
the gyroscope will precess about an axis parallel to the magnetic field. (See Fig 
35-3.) Suppose we ask: How can we change the angle of the classical gyroscope 
with respect to the field — namely, with respect to the z-axis? The magnetic field 
produces a torque around a horizontal axis. Such a torque you would think is 
trying to line up the magnet with the field, but it only causes the precession. If we 
want to change the angle of the gyroscope with respect to the z-axis, we must 
exert a torque on it about the z-axis. If we apply a torque which goes in the same 
direction as the precession, the angle of the gyroscope will change to give a smaller 
component of J in the z-direction In Fig. 35-3, the angle between J and the 
z-axis would increase. If we try to hinder the precession, J moves toward the 
vertical. 

For our precessing atom in a uniform magnetic field, how can we apply the 
kind of torque we want? The answer is: with a weak magnetic field from the side 
You might at first think that the direction of this magnetic field would have to 
rotate with the precession of the magnetic moment, so that it was always at right 
angles to the moment, as indicated by the field B f in Fig. 35-4(a). Such a field 
works very well, but an alternating horizontal field is almost as good. If we have 
a small horizontal field which is always in the .v-direction (plus or minus) and 
which oscillates with the frequency co p , then on each one-half cycle the torque on 
the magnetic moment reverses, so that it has a cumulative effect which is almost 
as effective as a rotating magnetic field. Classically, then, we would expect the 
component of the magnetic moment along the z-direction to change if we have a 
very weak oscillating magnetic field at a frequency which is exactly Classically, 
of course, ji z would change continuously, but in quantum mechanics the z-com- 
ponent of the magnetic moment cannot adjust continuously. It must jump suddenly 
from one value to another. We have made the comparison between the con- 
sequences of classical mechanics and quantum mechanics to give you some clue 
as to what might happen classically and how it is related to what actually happens 
in quantum mechanics. You will notice, incidentally, that the expected resonant 
frequency is the same in both cases. 

One additional remark: From what we have said about quantum mechanics, 
there is no apparent reason why there couldn’t also be transitions at the frequency 
2o) v . It happens that there isn’t any analog of this in the classical case, and also 
it doesn’t happen in the quantum theory either — at least not for the particular 
method of inducing the transitions that we have described. With an oscillating 
horizontal magnetic field, the probability that a frequency 2co p would cause a jump 
of two steps at once is zero. It is only at the frequency co p that transitions, either 
upward or downward, are likely to occur. 

Now we are ready to describe Rabi’s method for measuring magnetic mo- 
ments. We will consider here only the operation for atoms with a spin of 1/2 A 
diagram of the apparatus is shown m Fig. 35-5. There is an oven which gives out 
a stream of neutral atoms which passes down a line of three magnets. Magnet 1 



Fig. 35-3. The classical precession of 
an atom with the magnetic moment ft and 
the angular momentum J. 



Fig. 35—4. The angle of precession of 
an atomic magnet can be changed by a 
horizontal magnetic field always at right 
angles to ft, as in (a), or by an oscillating 
field, as in (b). 



Fig. 35—5. The Rabi molecular-beam apparatus. 
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Fig. 35-6. The current of atoms in 
the beam decreases when co = cj p . 


is just like the one in Fig. 35-2, and has a field with a strong field gradient — say, 
with &B z /dz positive. If the atoms have a magnetic moment, they will be deflected 
downward if J z = + h/2, or upward if J z — — h/2 (since for electrons fx is directed 
opposite to /). If we consider only those atoms which can get through the slit 
Si, there are two possible trajectories, as shown. Atoms with J z — +/?/ 2 must 
go along curve a to get through the slit, and those with J z — —h/2 must go along 
curve b. Atoms which start out from the oven along other paths will not get 
through the slit. 

Magnet 2 has a uniform field. There are no forces on the atoms in this 
region, so they go straight through and enter magnet 3. Magnet 3 is just like 
magnet 1 but with the field inverted , so that dB z /dz has the opposite sign. The 
atoms with J z — -\-h/2 (we say “with spin up”), that felt a downward push in 
magnet 1, get an upward push in magnet 3; they continue on the path a and go 
through slit S 2 to a detector. The atoms with J z = — h/2 (“with spin down”) 
also have opposite forces m magnets 1 and 3 and go along the path b , which also 
takes them through slit S 2 to the detector. 

The detector may be made in various ways, depending on the atom being 
measured. For example, for atoms of an alkali metal like sodium, the detector can 
be a thin, hot tungsten wire connected to a sensitive current meter. When sodium 
atoms land on the wire, they are evaporated off as Na + ions, leaving an electron 
behind. There is a current from the wire proportional to the number of sodium 
atoms arriving per second. 

In the gap of magnet 2 there is a set of coils that produces a small horizontal 
magnetic field B f . The coils are driven with a current which oscillates at a variable 
frequency co. So between the poles of magnet 2 there is a strong, constant, 
vertical field B 0 and a weak, oscillating, horizontal field B ' . 

Suppose now that the frequency co of the oscillating field is set at aj }> — the 
“precession” frequency of the atoms in the field B. The alternating field will cause 
some of the atoms passing by to make transitions from one J z to the other An 
atom whose spin was initially “up” ( J z = +h/2) may be flipped “down” 
(J z = —ft/ 2). Now this atom has the direction of its magnetic moment reversed, 
so it will feel a downward force in magnet 3 and will move along the path a f , 
shown m Fig. 35-5. It will no longer get through the slit S 2 to the detector. 
Similarly, some of the atoms whose spins were initially down ( J z = —h/2) will 
have their spins flipped up (J z = -\-h/2) as they pass through magnet 2. They 
will then go along the path b f and will not get to the detector. 

If the oscillating field B ' has a frequency appreciably different from w p , it will 
not cause any spin flips, and the atoms will follow their undisturbed paths to 
the detector. So you can see that the “precession” frequency oo p of the atoms 
in the field B () can be found by varying the frequency co of the field B f until a de- 
crease is observed in the current of atoms arriving at the detector. A decrease in 
the current will occur when co is “in resonance” with co„. A plot of the detector 
current as a function of co might look like the one shown in Fig. 35-6. Knowing 
co p , we can obtain the g-value of the atom. 

Such atomic-beam or, as they are usually called, “molecular” beam resonance 
experiments are a beautiful and delicate way of measuring the magnetic properties 
of atomic objects. The resonance frequency co p can be determined with great 
precision — in fact, with a greater precision than we can measure the magnetic 
field B o, which we must know to find g. 


35-4 The paramagnetism of bulk materials 

We would like now to describe the phenomenon of the paramagnetism of 
bulk materials Suppose we have a substance whose atoms have permanent mag- 
netic moments, for example a crystal like copper sulfate. In the crystal there are 
copper ions whose inner electron shells have a net angular momentum and a net 
magnetic moment. So the copper ion is an object which has a permanent magnetic 
moment. Let’s say just a word about which atoms have magnetic moments and 
which ones don’t. Any atom, like sodium for instance, which has an odd number 
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of electrons, will have a magnetic moment. Sodium has one electron in its un- 
filled shell. This electron gives the atom a spin and a magnetic moment. Ordinarily, 
however, when compounds are formed the extra electrons in the outside shell are 
coupled together with other electrons whose spin directions are exactly opposite, 
so that all the angular momenta and magnetic moments of the valence electrons 
usually cancel out. That’s why, in general, molecules do not have a magnetic 
moment. Of course if you have a gas of sodium atoms, there is no such cancella- 
tion.* Also, if you have what is called in chemistry a “free radical” — an object 
with an odd number of valence electrons — then the bonds are not completely 
satisfied, and there is a net angular momentum. 

In most bulk materials there is a net magnetic moment only if there are atoms 
present whose inner electron shell is not filled. Then there can be a net angular 
momentum and a magnetic moment. Such atoms are found in the “transition 
element” part of the periodic table — for instance, chromium, manganese, iron, 
nickel, cobalt, palladium, and platinum are elements of this kind. Also, all of the 
rare earth elements have unfilled inner shells and permanent magnetic moments. 
There are a couple of other strange things that also happen to have magnetic 
moments, such as liquid oxygen, but we will leave it to the chemistry department 
to explain the reason. 

Now suppose that we have a box full of atoms or molecules with permanent 
moments — say a gas, or a liquid, or a crystal. We would like to know what happens 
if we apply an external magnetic field. With no magnetic field, the atoms are kicked 
around by the thermal motions, and the moments wind up pointing in all directions. 
But when there is a magnetic field, it acts to line up the little magnets; then there 
are more moments lying toward the field than away from it. The material is 
“magnetized.” 

We define the magnetization M of a material as the net magnetic moment per 
unit volume, by which we mean the vector sum of all the atomic magnetic moments 
in a unit volume. If there are N atoms per unit volume and their average moment 
is (mXiv then M can be written as N times the average atomic moment: 

M = (35.8) 


The definition of M corresponds to the definition of the electric polarization P 
of Chapter 10. 

The classical theory of paramagnetism is just like the theory of the dielectric 
constant we showed you in Chapter 11. One assumes that each of the atoms has a 
magnetic moment ju, which always has the same magnitude but which can point 
in any direction. In a field B , the magnetic energy is —p • B = ~jxB cos 6, where 
6 is the angle between the moment and the field. From statistical mechanics, the 
relative probability of having any angle is e ~ cncr *y lkT , so angles near zero are 
more likely than angles near tt. Proceeding exactly as we did in Section 1 1-3, we 
find that for small magnetic fields Mis directed parallel to B and has the magnitude 


M = 


NjA 2 B 
3 kT 


(35 9) 


[See Eq. (1 1.20).] This approximate formula is correct only for pB/kT much less 
than one. 

We find that the induced magnetization — the magnetic moment per unit 
volume — is proportional to the magnetic field. This is the phenomenon of para- 
magnetism. Y ou will see that the effect is stronger at lower temperatures and weaker 
at higher temperatures. When we put a field on a substance, it develops, for small 
fields, a magnetic moment proportional to the field. The ratio of M to B (for small 
fields) is called the magnetic susceptibility . 

Now we want to look at paramagnetism from the point of view of quantum 
mechanics. We take first the case of an atom with a spin of 1/2. In the absence of 


* Ordinary Na vapor is mostly monatomic, although there are also some molecules of 
Na2- 
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a magnetic field the atoms have a certain energy, but in a magnetic field there are 
two possible energies, one for each value of J z . For J z = +&/ 2, the energy is 
changed by the magnetic field by the amount 



(The energy shift A U is positive for an atom because the electron charge is negative.) 
For J z = —h/2, the energy is changed by the amount 



(35.11) 

To save writing, let’s set 

( 1 

Mo - 8 \2 m) ' 2 ’ 

(35.12) 

then 

A U = ±jU 0 Z?. 

(35.13) 


The meaning of {x () is clear: — fjL 0 is the z-component of the magnetic moment in 
the up-spin case, and H-/z 0 is the z-component of the magnetic moment in the 
down-spin case. 

Now statistical mechanics tells us that the probability that an atom is in one 
state or another is proportional to 

(Energy of stat c)!kT 


With no magnetic field the two states have the same energy ; so when there is equilib- 
rium in a magnetic field, the probabilities are proportional to 

e~ w / kT . (35.14) 

The number of atoms per unit volume with spin up is 

N up = ae~^ B l kt , (35.15) 

and the number with spin down is 

N down = ae+** Blkt . (35.16) 

The constant a is to be determined so that 

^up d" ^down = (35.17) 


the total number of atoms per unit volume. So we get that 

- N 
^ BjkT -j- q — Mo BjkT 


(35.18) 


What we are interested in is the average magnetic moment along the z-axis. 
The atoms with spin up will contribute a moment of — ju 0 , and those with spin 
down will have a moment of 0 , so the average moment is 




^up( Mo) d~ ^down( d - Mo) 

_ 


(35.19) 


The magnetic moment per unit volume M is then N(fx) tiy . Using Eqs. (35.15), 
(35.16), and (35.17), we get that 


e ~\-t*oBlkT g—tiQlifkT 

M = Nfl0 e +M 0 BjkT + e -r 0 BjkT 


(35.20) 


This is the quantum-mechanical formula for M for atoms with j — 1/2. Inciden- 
tally, this formula can also be written somewhat more concisely in terms of the 
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hyperbolic tangent function: 


M = 


N/jlq tanh 


Mo B 
kT 


(35.21) 


A plot of M as a function of B is given in Fig. 35.7. When B gets very large, 
the hyperbolic tangent approaches 1, and M approaches the limiting value Nfi {) 
So at high fields, the magnetization saturates. We can see why that is; at high 
enough fields the moments are all lined up in the same direction In other words, 
they are all m the spin-down state, and each atom contributes the moment g {) 

In most normal cases — say, for typical moments, room temperatures, and 
the fields one can normally get (like 10,000 gauss) — the ratio mo B/kT is about 0.02. 
One must go to very low temperatures to see the saturation. For normal tempera- 
tures, we can usually replace tanh x by x, and write 

M = (35 22) 

Just as we saw in the classical theory, M is proportional to B. In fact, the 
formula is almost exactly the same, except that there seems to be a factor of 1/3 
missing. But we still need to relate the ju 0 in our quantum formula to the \x that 
appears in the classical result, Eq (35.9). 

In the classical formula, what appears is u 2 = m * the square of the vector 
magnetic moment, or 

M /* = (* J J (35 23) 



Fig. 35-7. The variation of the para- 
magnetic magnetization with the magnetic 
field strength 8. 


We pointed out in the last chapter that you can very likely get the right answer 
from a classical calculation by replacing /• / by j(J + 1 )h 2 . In our particular 
example, we have j = 1/2, so 


jo + m 2 = 'ih 2 . 


Substituting this for / * / in Eq. (35.23), we get 



or in terms of Mo> defined in Eq. (35.12), we get 


M * M 


3/4 


Substituting this for \x 2 in the classical formula, Eq. (35.9), does indeed reproduce 
the correct quantum formula, Eq. (35.22). 

The quantum theory of paramagnetism is easily extended to atoms of any 
spin j. The low-field magnetization is 


where 


M = 


Ng 2 


j(j + 1) fx%B 
3 kT ’ 


(35.24) 


M/i 


2m 


(35.25) 


is a combination of constants with the dimensions of a magnetic moment. Most 
atoms have moments of roughly this size. It is called the Bohr magneton. The 
spin magnetic moment of the electron is almost exactly one Bohr magneton. 


35-5 Cooling by adiabatic demagnetization 

There is a very interesting special application of paramagnetism. At very 
low temperatures it is possible to line up the atomic magnets in a strong field. 
It is then possible to get down to extremely low temperatures by a process called 
adiabatic demagnetization. We can take a paramagnetic salt (for example, one 
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containing a number of rare-earth atoms likepraseodynium-ammonium-nitrate), 
and start by cooling it down with liquid helium to one or two degrees absolute in a 
strong magnetic field. Then the factor pB/kT is larger than 1 — say more like 2 or 3. 
Most of the spins are lined up, and the magnetization is nearly saturated. Let’s 
say, to make it easy, that the field is very powerful and the temperature is very low, 
so that nearly all the atoms are lined up. Then you isolate the salt thermally (say, 
by removing the liquid helium and leaving a good vacuum) and turn off the mag- 
netic field. The temperature of the salt goes way down. 

Now if you were to turn off the field suddenly , the jiggling and shaking of the 
atoms in the crystal lattice would gradually knock all the spins out of alignment. 
Some of them would be up and some down. But if there is no field (and disregard- 
ing the interactions between the atomic magnets, which will make only a slight 
error), it takes no energy to turn over the atomic magnets. They could randomize 
their spins without any energy change and, therefore, without any temperature 
change. 

Suppose, however, that while the atomic magnets are being flipped over by the 
thermal motion there is still some magnetic field present. Then it requires some 
work to flip them over opposite to the field — they must do work against the field. 
This takes energy from the thermal motions and lowers the temperature. So if the 
strong magnetic field is not removed too rapidly, the temperature of the salt will 
decrease — it is cooled by the demagnetization. From the quantum-mechanical 
view, when the field is strong all the atoms are in the lowest state, because the odds 
against any being in the upper state are impossibly big. But as the field is lowered, 
it gets more and more likely that thermal fluctuations will knock an atom into the 
upper state. When that happens, the atom absorbs the energy A U = p 0 B. So if 
the field is turned off slowly, the magnetic transitions can take energy out of the 
thermal vibrations of the crystal, cooling it off. It is possible in this way to go from 
a temperature of a few degrees absolute down to a temperature of a few thou- 
sandths of a degree. 

Would you like to make something even colder than that? It turns out that 
Nature has provided a way. We have already mentioned that there are also mag- 
netic moments for the atomic nuclei. Our formulas for paramagnetism work just 
as well for nuclei, except that the moments of nuclei are roughly a thousand times 
smaller. [They are of the order of magnitude of qh/2m Vi where m p is the proton 
mass, so they are smaller by the ratio of the masses of the electron and proton.] 
With such magnetic moments, even at a temperature of 2°K, the factor pB/kT 
is only a few parts in a thousand. But if we use the paramagnetic demagnetiza- 
tion process to get down to a temperature of a few thousandths of a degree, 
pB/kT becomes a number near 1 — at these low temperatures we can begin to 
saturate the nuclear moments. That is good luck, because we can then use 
the adiabatic demagnetization of the nuclear magnetism to reach still lower 
temperatures. Thus it is possible to do two stages of magnetic cooling. First we 
use adiabatic demagnetization of paramagnetic ions to reach a few thousandths of 
a degree. Then we use the cold paramagnetic salt to cool some material which has 
a strong nuclear magnetism. Finally, when we remove the magnetic field from this 
material, its temperature will go down to within a millionth of a degree of absolute 
zero — if we have done everything very carefully. 


35-6 Nuclear magnetic resonance 

We have said that atomic paramagnetism is very small and that nuclear mag- 
netism is even a thousand times smaller. Yet it is relatively easy to observe the 
nuclear magnetism by the phenomenon of “nuclear magnetic resonance.” Suppose 
we take a substance like water, in which all of the electron spins are exactly bal- 
anced so that their net magnetic moment is zero. The molecules will still have a 
very, very tiny magnetic moment due to the nuclear magnetic moment of the hydro- 
gen nuclei. Suppose we put a small sample of water in a magnetic field B. Since 
the protons (of the hydrogen) have a spin of 1/2, they will have two possible 
energy states. If the water is in thermal equilibrium, there will be slightly more 
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protons in the lower energy states — with their moments directed parallel to the 
field. There is a small net magnetic moment per unit volume. Since the proton 
moment is only about one-thousandth of an atomic moment, the magnetization 
which goes as fx 2 — using Eq. (35.22) — is only about one-millionth as strong as 
typical atomic paramagnetism. (That’s why we have to pick a material with no 
atomic magnetism.) If you work it out, the difference between the number of 
protons with spin up and with spin down is only one part in 10 8 , so the effect 
is indeed very small! It can still be observed, however, in the following way. 

Suppose we surround the water sample with a small coil that produces a 
small horizontal oscillating magnetic field. If this field oscillates at the frequency 
co p , it will induce transitions between the two energy states — just as we described 
for the Rabi experiment in Section 35-3. When a proton flips from an upper 
energy state to a lower one, it will give up the energy (x z B which, as we have seen, 
is equal to hw v . If it flips from the lower energy state to the upper one, it will 
absorb the energy hw p from the coil. Since there are slightly more protons in the 
lower state than in the upper one, there will be a net absorption of energy from the 
coil. Although the effect is very small, the slight energy absorption can be seen 
with a sensitive electronic amplifier. 

Just as in the Rabi molecular-beam experiment, the energy absorption will be 
seen only when the oscillating field is in resonance, that is, when 



It is often more convenient to search for the resonance by varying B while keeping 
co fixed. The energy absorption will evidently appear when 


A typical nuclear magnetic resonance apparatus is shown in Fig. 35-8. A 
high-frequency oscillator drives a small coil placed between the poles of a large 
electromagnet. Two small auxiliary coils around the pole tips are driven with a 
60-cycle current so that the magnetic field is “wobbled” about its average value by 
a very small amount. As an example, say that the main current of the magnet is set 
to give a field of 5000 gauss, and the auxiliary coils produce a variation of ± 1 gauss 
about this value. If the oscillator is set at 21.2 megacycles per second, it will then be 
at the proton resonance each time the field sweeps through 5000 gauss [using Eq. 
(34.13) with g = 5.58 for the proton]. 

The circuit of the oscillator is arranged to give an additional output signal 
proportional to any change in the power being absorbed from the oscillator. This 
signal is fed to the vertical deflection amplifier of an oscilloscope. The horizontal 
sweep of the oscilloscope is triggered once during each cycle of the field-wobbling 
frequency. (More usually, the horizontal deflection is made to follow m proportion 
to the wobbling field.) 

Before the water sample is placed inside the high-frequency coil, the power 
drawn from the oscillator is some value. (It doesn’t change with the magnetic field ) 
When a small bottle of water is placed in the coil, however, a signal appears on the 
oscilloscope, as shown in the figure. We see a picture of the power being absorbed 
by the flipping over of the protons! 

In practice, it is difficult to know how to set the main magnet to exactly 5000 
gauss. What one does is to adjust the main magnet current until the resonance 
signal appears on the oscilloscope. It turns out that this is now the most con- 
venient way to make an accurate measurement of the strength of a magnetic field. 
Of course, at some time someone had to measure accurately the magnetic field and 
frequency to determine the g-value of the proton. But now that this has been done, 
a proton resonance apparatus like that of the figure can be used as a “proton reso- 
nance magnetometer.” 

We should say a word about the shape of the signal. If we were to wobble the 
magnetic field very slowly, we would expect to see a normal resonance curve. 
The energy absorption would read a maximum when co p arrived exactly at the 



Fig 35-8. A nuclear magnetic reso- 
nance apparatus. 
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oscillator frequency. There would be some absorption at nearby frequencies 
because all the protons are not in exactly the same field — and different fields mean 
slightly different resonant frequencies. 

One might wonder, incidentally, whether at the resonance frequency we should 
see any signal at all. Shouldn’t we expect the high-frequency field to equalize the 
populations of the two states — so that there should be no signal except when the 
water is first put in? Not exactly, because although we are trying to equalize the 
two populations, the thermal motions on their part are trying to keep the proper 
ratios for the temperature T. If we sit at the resonance, the power being absorbed 
by the nuclei is just what is being lost to the thermal motions. There is, however, 
relatively little “thermal contact” between the proton magnetic moments and the 
atomic motions The protons are relatively isolated down in the center of the 
electron distributions. So m pure water, the resonance signal is, in fact, usually 
too small to be seen. To increase the absorption, it is necessary to increase the 
“thermal contact.” This is usually done by adding a little iron oxide to the water. 
The iron atoms are like small magnets; as they jiggle around in their thermal dance, 
they make tiny jiggling magnetic fields at the protons. These varying fields “couple” 
the proton magnets to the atomic vibrations and tend to establish thermal equi- 
librium. It is through this “coupling” that protons in the higher energy states can 
lose their energy so that they are again capable of absorbing energy from the 
oscillator. 

In practice the output signal of a nuclear resonance apparatus does not look 
like a normal resonance curve. It is usually a more complicated signal with oscilla- 
tions — like the one drawn in the figure. Such signal shapes appear because of the 
changing fields. The explanation should be given in terms of quantum mechanics, 
but it can be shown that in such experiments the classical ideas of precessing mo- 
ments always give the correct answer. Classically, we would say that when we ar- 
rive at resonance we start driving a lot of the precessing nuclear magnets synchro- 
nously. In so doing, we make them precess together. These nuclear magnets, all 
rotating together, will set up an induced emf in the oscillator coil at the frequency 
co p . But because the magnetic field is increasing with time, the precession frequency 
is increasing also, and the induced voltage is soon at a frequency a little higher than 
the oscillator frequency. As the induced emf goes alternately in phase and out of 
phase with the oscillator, the “absorbed” power goes alternately positive and 
negative. So on the oscilloscope we see the beat note between the proton frequency 
and the oscillator frequency. Because the proton frequencies are not all identical 
(different protons are in slightly different fields) and also possibly because of the 
disturbance from the iron oxide in the water, the freely precessing moments soon 
get out of phase, and the beat signal disappears. 

These phenomena of magnetic resonance have been put to use in many ways 
as tools for finding out new things about matter — especially in chemistry and 
nuclear physics. It goes without saying that the numerical values of the magnetic 
moments of nuclei tell us something about their structure. In chemistry, much has 
been learned from the structure (or shape) of the resonances. Because of magnetic 
fields produced by nearby nuclei, the exact position of a nuclear resonance is 
shifted somewhat, depending on the environment in which any particular nucleus 
finds itself. Measuring these shifts helps determine which atoms are near which 
other ones and helps to elucidate the details of the structure of molecules Equally 
important is the electron spin resonance of free radicals. Although not present 
to any very large extent in equilibrium, such radicals are often intermediate states 
of chemical reactions. A measurement of an electron spin resonance is a delicate 
test for the presence of free radicals and is often the key to understanding the 
mechanism of certain chemical reactions. 
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Ferromagnetism 


36-1 Magnetization currents 


In this chapter we will discuss some materials in which the net effect of the 
magnetic moments in the material is much greater than in the case of paramagnetism 
or diamagnetism. The phenomenon is called ferromagnetism. In paramagnetic and 
diamagnetic materials the induced magnetic moments are usually so weak that 
we don’t have to worry about the additional fields produced by the magnetic 
moments. For ferromagnetic materials, however, the magnetic moments induced 
by applied magnetic fields are quite enormous and have a great effect on the fields 
themselves. In fact, the induced moments are so strong that they are often the 
dominant effect in producing the observed fields. So one of the things we will 
have to worry about is the mathematical theory of large induced magnetic moments. 
That is, of course, just a technical question. The real problem is, why are the 
magnetic moments so strong — how does it all work? We will come to that question 
in a little while. 

Finding the magnetic fields of ferromagnetic materials is something like the 
problem of finding the electrostatic field in the presence of dielectrics. You will 
remember that we first described the internal properties of a dielectric in terms of 
a vector field P, the dipole moment per unit volume. Then we figured out that the 
effects of this polarization are equivalent to a charge density p po i given by the di- 
vergence of P: 

Ppol = -VP. (36.1) 
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36-5 Electromagnets 

36-6 Spontaneous magnetization 


Review: Chapter 10, Dielectrics 

Chapter 17, The Law of In- 
duction 


The total charge in any situation can be written as the sum of this polarization 
charge plus all other charges, whose density we write* Pother- Then the Maxwell 
equation which relates the divergence of E to the charge density becomes 


^7 . /T — — Ppol i Pother ^ 

to to 

or 

V • E = — - P + ^2^ . 

to to 

We can then pull out the polarization part of the charge and put it on the other 
side of the equation, to get the new law 


V * (€qE + P) — Pother- (36.2) 

The new law says the divergence of the quantity (e 0 E + P ) is equal to the density 
of the other charges. 

Pulling E and P together as in Eq. (36.2), of course, is useful only if we know 
some relation between them. We have seen that the theory which relates the 
induced electric dipole moment to the field was a relatively complicated business 
and can really only be applied to certain simple situations, and even then as an 
approximation. We would like to remind you of one of the approximate ideas 
we used. To find the induced dipole moment of an atom inside a dielectric, it is 
necessary to know the electric field that acts on an individual atom. We made the 
approximation — which is not too bad in many cases — that the field on the atom 


* If all of the “other” charges were on conductors, pother would be the same as our 
p free of Chapter 10. 






Fig. 36-1. The electric field in a 
cavity in a dielectric depends on the 
shape of the cavity. 


is the same as it would be at the center of the small hole which would be left if we 
took out the atom (keeping the dipole moments of all the neighboring atoms the 
same). You will also remember that the electric field in a hole in a polarized di- 
electric depends on the shape of the hole. We summarize our earlier results in 
Fig. 36-1. For a thin, disc-shaped hole perpendicular to the polarization, the 
electric field in the hole is given by 

P 

^hole = ^dielectric ~ 9 

Co 

which we showed by using Gauss’ law. On the other hand, in a needle-shaped 
slot parallel to the polarization, we showed — by using the fact that the curl of E is 
zero— that the electric fields inside and outside of the slot are the same. Finally, 
we found that for a spherical hole the electric field was one-third of the way between 
the field of the slot and the field of the disc: 

1 P 

£hoie = ^dielectric + ~ — (spherical hole). (36.3) 

d Co 

This was the field we used in thinking about what happens to an atom insidf a 
polarized dielectric. 

Now we have to discuss the analog of all this for the case of magnetism. 
One simple, short-cut way of doing this is to say the M, the magnetic moment per 
unit volume, is just like P, the electric dipole moment per unit volume, and that, 
therefore, the negative of the divergence of M is equivalent to a “magnetic charge 
density” p m — whatever that may mean. The trouble is, of course, that there isn’t 
any such thing as a “magnetic charge” in the physical world. As we know, the 
divergence of B is always zero. But that does not stop us from making an artificial 
analog and writing 

V • M = -p^, (36.4) 

where it is to be understood that p m is purely mathematical. Then we could make 
a complete analogy with the electrostatic case and use all our old equations from 
electrostatics. People have often done something like that. In fact, historically, 
people even believed that the analogy was right. They believed that the quantity 
p m represented the density of “magnetic poles.” These days, however, we know 
that the magnetization of materials comes from circulating currents within the 
atoms — either from the spinning electrons or from the motion of the electrons in 
the atom. It is therefore nicer from a physical point of view to describe things 
realistically in terms of the atomic currents, rather than in terms of a density of 
some mythical “magnetic poles.” Incidentally, these currents are sometimes called 
“Amperian” currents, because Ampere first suggested that the magnetism of 
matter came from circulating atomic currents. 

The actual microscopic current density in magnetized matter is, of course, 
very complicated. Its value depends on where you look in the atom — it’s large in 
some places and small in others; it goes one way in one part of the atom and the 
opposite way in another part (just as the microscopic electric field varies enor- 
mously inside a dielectric). In many practical problems, however, we are interested 
only in the fields outside of the matter or in the average magnetic field inside of the 
matter — where we mean an average taken over many, many atoms. It is only for 
such macroscopic problems that it is convenient to describe the magnetic state of 
the matter in terms of Af, the average dipole moment per unit volume. What we 
want to show now is that the atomic currents of magnetized matter can give rise 
to certain large-scale currents which are related to M. 

What we are going to do, then, is to separate the current density j — which is 
the real source of the magnetic fields — into various parts : one part to describe the 
circulating currents of the atomic magnets, and the other parts to describe what 
other currents there may be. It is usually most convenient to separate the currents 
into three parts. In Chapter 32 we made a distinction between the currents which 
flow freely on conductors and the ones which are due to the back and forth motions 
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of the bound charges in dielectrics. In Section 32-2 we wrote 


7 


— 7poi + 7o 


th 


ers 


where j po i represented the currents from the motion of the bound charges in di- 
electrics and 7 ot hor took care of all other currents. Now we want to go further. 
We want to separate y othpr into one part, j mA g , which describes the average currents 
inside of magnetized materials, and an additional term which we can call j con a for 
whatever is left over. The last term will generally refer to currents in conductors, 
but it may also include other currents — for example the currents from charges 
moving freely through empty space. So we will write for the total current density: 

7 7pol "T 7mag + 7cond* (36.5) 

Of course it is this total current which belongs in the Maxwell equation for the 
curl of B: 

c 2 VXB=^ + |- (36.6) 

Co ot 

Now we have to relate the current / mag to the magnetization vector M. So 
that you can see where we are going, we will tell you that the result is going to 
be that 

7mag = v X M. (36.7) 


If we are given the magnetization vector M everywhere in a magnetic material, 
the circulation current density is given by the curl of M. Let’s see if we can under- 
stand why this is so. 

First, let’s take the case of a cylindrical rod which has a uniform magnetization 
parallel to its axis. Physically, we know that such a uniform magnetization really 
means a uniform density of atomic circulating currents everywhere inside the 
material. Suppose we try to imagine what the actual currents would look like in 
a cross section of the material. We would expect to see currents something like 
those shown in Fig. 36-2, Each atomic current goes around and around in a little 
circle, with all the circulating currents going around in the same direction. Now 
what is the effective current of such a thing? Well, in most of the bar there is no 
effect at all, because right next to each current there is another current going in 
the opposite direction. If we imagine a small surface — but one still quite a bit 
larger than a single atom — such as is indicated in Fig. 36-2 by the line AB, 
the net current through such a surface is zero. There is no net current any- 
where inside the material. Note, however, that at the surface of the material there 
are atomic currents which are not cancelled by neighboring currents going the 
other way. At the surface there is a net current always going in the same direction 
around the rod. Now you see why we said earlier that a uniformly magnetized 
rod is equivalent to a long solenoid carrying an electric current. 

How does this view fit with Eq. (36. 7) 9 First, inside the material the magne- 
tization Mis constant, so all its derivatives are zero. This agrees with our geometric 
picture. At the surface, however, M is not really constant — it is constant up to 
the edge and then suddenly collapses to zero. So, right at the surface there are 
terrific gradients which, according to (36.7), will give a high current density. 
Suppose we look at what happens near the point C in Fig. 36-2. Taking the x- 
and ^-directions as in the figure, the magnetization Mis in the z-direction. Writing 
out the components of Eq. (36.7), we have 



Fig. 36-2. Schematic diagram of the 
circulating atomic currents as seen in a 
cross section of an iron rod magnetized in 
the z-direction. 


SM Z 

dy 

dM z 

dx 


i.jmag)y 


(36.8) 


At the point C, the derivative dM z /dy is zero, but dM z /dx is large and positive. 
Equation (36.7) says that there is a large current density in the minus ^-direction. 
This agrees with our picture of a surface current going around the bar. 
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Now we want to find the current density for a more complicated case in which 
the magnetization varies from point to point in a material. It is easy to see quali- 
tatively that if the magnetization is different in two neighboring regions, there will 
not be a perfect cancellation of the circulating currents so that there will be a net 
current in ihe volume of the material. It is this effect that we want to work out 
quantitatively. 

First, we need to recall the results of Section 14-5 that a circulating current 
I has a magnetic moment /x given by 


M = IA, 


(36.9) 


Fig. 36-3. The dipole moment ju of a 
current loop is /A. 



where A is the area of the current loop (see Fig. 36-3). Now let’s consider a small 
rectangular block inside of a magnetized material, as sketched in Fig. 36-4. We 
take the block so small that we can consider that the magnetization is uniform 
inside it. If this block has a magnetization M z in the z-direction, the net effect 
will be the same as a surface current going around on the vertical faces, as shown. 
We can find the magnitude of these currents from Eq. (36.9). The total magnetic 
moment of the block is equal to the magnetization times the volume: 

fi = M z {abc ), 

from which we get (remembering that the area of the loop is ac) 

I = M z b. 

In other words, the current per unit length (vertically) on each of the vertical 
surfaces is equal to M z . 


Fig. 36-4. A small magnetized block 
is equivalent to a circulating surface 
current. 


Fig. 36-5. If the magnetization of 
two neighboring blocks is not the same, 
there is a net surface current in between. 



x 


Now suppose that we imagine two such little blocks next to each other, as 
shown in Fig. 36-5. Because block 2 is slightly displaced from block 1, it will have 
a slightly different vertical component of magnetization, which we call M z + A M z . 
Now on the surface between the two blocks there will be two contributions to the 
total current. Block 1 will produce a current I x flowing in the positive y-direction, 
and block 2 will produce a surface current / 2 flowing in the negative y-direction. 
The total surface current in the positive y-direction is the sum: 

I = lx - h = M z b - (M z + A M z )b 

- -A M z b. 


We can write AM Z as the derivative of M z in the .v-direction times the displacement 
from block 1 to block 2, which is just a: 


A M z 


dM z 

dx 


a. 


The current flowing between the two blocks is then 


I = 


dM z 

dx 


ab . 
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To relate the current / to an average volume current density y, we must realize 
that this current I is really spread over a certain cross-sectional area. If we imagine 
the whole volume of the material to be filled with such little blocks, one such side 
face (perpendicular to the x-axis) can be associated with each block.* Then we 
see that the area to be associated with the current / is just the area ab of one of 
the front faces. We get the result 

. __ I dM z 

Jv ~ ab~ dx 


We have at least the beginning of the curl of M. 

There should be another term in j y from the variation of the ^-component of 
the magnetization with z. This contribution to j will come from the surface between 
two little blocks stacked one on top of the other, as shown in Fig. 36-6. Using 
the same arguments we have just made, you can show that this surface will con- 
tribute to j y the amount dM x /dz . These are the only surfaces which can contribute 
to the ^-component of the current so we have that the total current density m the 
y-direction is 

dM x dM z 
Jy = Tz W 

Working out the currents on the remaining faces of a cube — or using the fact 
that our z-direction is completely arbitrary — we can conclude that the vector 
current density is indeed given by the equation 

j - V X M. 



So if we choose to describe the magnetic situation in matter in terms of the Fig. 36-6. Two blocks, one above the 
average magnetic moment per unit volume M , we find that the circulating atomic other, may also contribute to / y . 

currents are equivalent to an average current density in matter given by Eq. (36.7). 

If the material is also a dielectric, there may be, in addition, a polarization current 
j v oi = dP/dt. And if the material is also a conductor, we may have a conduction 
current j imu \ as well. We can write the total current as 


j jc ond ~t“ ^ X M + 


dP 

dt 


(36.10) 


36-2 The field H 


Next, we want to insert the current as written in Eq. (36.10) into MaxwelPs 
equations. We get 


c 2 V X B = j- 
Co 


BE 

dt 


Co 


(y contl + V X M + 


dP 

dt J 


+ 


BE 
dt * 


We can move the term in M to the left-hand side: 


c 2 V X (b - -^) = ^ (e + — ) • (36.11) 

\ e 0 c 2 / e 0 dt \ Co/ 

As we remarked in Chapter 32, many people like to write ( E + P/e 0 ) as a new 
vector field D/e 0 . Similarly, it is often convenient to write ( B — M/e 0 c 2 ) as a 
single vector field. We choose to define a new vector field H by 


H - B 


Then Eq. (36.11) becomes 


M 

c 0 c 2 


CqC V X H ycond + 


dD 

dt 


(36.12) 

(36.13) 


It looks simple, but all the complexity is just hidden in the letters D and H. 


* Or, if you prefer, the current I in each face should be split 50-50 with the blocks on 
the two sides. 
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a: So 


Table 36-1 

Units of magnetic quantities 

[5] = weber/meter 2 = 10 4 gauss 
[//] = weber/meter 2 = 10 4 gauss 
or 10 4 oersted 
[M] = ampere/meter 
[//'] ~ ampere/meter 

Convenient conversions 

(gauss) — 10 4 B (weber/meter 2 ) 
(gauss) = H (oersted) 

= 0.0126 H' (amp/meter) 


Now we have to give you a warning. Most people who use the mks units 
have chosen to use a different definition of H. Calling their field H' (of course, 
they still call it H without the prime), it is defined by 

H ' = € 0 C 2 B - M. (36.14) 

(Also, they usually write e 0 c 2 as a new number l/u 0 : then they have one more 
constant to keep track of!) With this definition, Eq. (36.13) looks even simpler: 

V X H' =j cond + (36.15) 

at 

But the difficulties with this definition of H f are, first, that it doesn’t agree with the 
definition of people who don’t use the mks units, and second, that it makes H* 
and B have different units. We think it is more convenient for H to have the same 
units as B — rather than the units of M, as H f does. But if you are going to be an 
engineer and work on the design of transformers, magnets, and such, you will have 
to watch out. You will find many books which use for H the definition of Eq. 
(36.14) rather than our definition of Eq. (36.12), and many other books — especially 
handbooks about magnetic materials — that relate B and H the way we have done. 
You’ll have to be careful to figure out which convention they are using. 

One way to tell is by the units they use. Remember that in the mks system, 
B — and therefore our H — are measured with the unit: one weber per square meter, 
equal to 10,000 gauss. In the mks system, a magnetic moment (a current times an 
area) has the unit: one ampere-meter 2 . The magnetization M , then, has the unit: 
one ampere per meter. For H' the units are the same as for M. You can see that 
this also agrees with Eq. (36.15), since V has the dimensions of one over a length. 
People who are working with electromagnets also get in the habit of calling the 
unit of H (with the //' definition) “one ampere turn per meter” — thinking of the 
turns of wire on a winding. But a “turn” is really a dimensionless number, so that 
doesn’t need to confuse you. Since our H is equal to H'/e 0 c 2 , if you are using the 
mks system, H (in webers/meter 2 ) is equal to 47 t X 10~~ 7 times H ' (in amperes 
per meter). It is perhaps more convenient to remember that H (in gauss) = 
0.0126 H r (in amp/meter). 

There is one more horrible thing. Many people who use our definition of 
H have decided to call the units of H and B by different namesl Even though they 
have the same dimensions, they call the unit of B one gauss , and the unit of H one 
oersted (after Gauss and Oersted, of course). So, in many books you will find 
graphs with B plotted in gauss and H in oersteds. They are really the same unit — 
10“ 4 of the mks unit. We have summarized the confusion about magnetic units 
in Table 36-1. 


36-3 The magnetization curve 

Now we will look at some simple situations in which the magnetic field is 
constant, or in which the fields change slowly enough that we can neglect dD/dt in 
comparison with j CO m\- Then the fields obey the equations 

V B = 0, (36.16) 

V X H = y'condAoC 2 , (36.17) 

H = B - M/e 0 c 2 . (36.18) 

Suppose we have a torus (a donut) of iron wrapped with a coil of copper wire, 
as shown in Fig. 36-7(a). A current I flows in the wire. What is the magnetic 
field? The magnetic field will be mainly inside the iron; there, the lines of B will 

be circles, as drawn in Fig. 36-7(b). Since the flux of B is continuous, its divergence 

is zero, and Eq (36.16) is satisfied Next, we write Eq (36. 17) in another form by 
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integrating around the closed loop r drawn in Fig. 36-7(b). 
theorem, we have that 


H ds = 


€ 0 C Z 


,/cond ' & da , 
S 


From Stokes’s 


(36.19) 


where the integral of j is to be carried out over any surface S bounded by r. This 
surface is cut once by each turn of the winding. Each turn contributes the current 
/ to the integral, and, if there are N turns in all, the integral is NI. From the 
symmetry of our problem, B is the same all around the curve T ; if we assume that 
the magnetization, and therefore, the field H is also constant along T, Eq. (36.19) 
becomes 

NI 


HI = 


e 0 c z 


where / is the length of the curve r. So, 


1 NI 


2 / 


(36.20) 


It is because H is directly proportional to the magnetizing current in cases like 
this one that H is sometimes called the magnetizing field . 

Now all we need is an equation which relates H to B . But there isn’t any such 
equation! There is, of course, Eq. (36.18), but it is no help because there is no 
direct relation between M and B for a ferromagnetic material like iron. The mag- 
netization M depends on the whole past history of the iron, and not only on what 
B is at the moment. 

All is not lost, though. We can get solutions in certain simple cases. If we 
start out with unmagnetized iron — let’s say with iron that has been annealed at 
high temperatures — then in the simple geometry of the torus, all the iron will have 
the same magnetic history. Then we can say something about M — and therefore 
about the relation between B and H — from experimental measurements. The 
field B in the torus is, from Eq. (36.20), given as a constant times the current / 
in the winding. The field B can be measured by integrating over time the emf in 
the coil (or in an extra coil wound over the magnetizing coil shown in the figure). 
This emf is equal to the rate of change of the flux of B , so the integral of the emf 
with time is equal to B times the cross-sectional area of the torus. 

Figure 36-8 shows the relation between B and H, observed with a torus of 
soft iron. When the current is first turned on, B increases with increasing H along 
the curve a. Note the different scales on B and H\ initially, it takes only a relatively 
small H to make a large B . Why is B so much larger with the iron than it would 
be with air? Because there is a large magnetization M which is equivalent to a 
large surface current on the iron — the field B comes from the sum of this current 
and the conduction current in the winding. Why M should be so large, we will 
discuss later. 

At higher values of H , the magnetization curve levels off. We say that the 
iron saturates. With the scales of our figure, the curve appears to become hori- 
zontal. Actually, it continues to rise slightly — for large fields, B becomes propor- 
tional to H, and with a unit slope. There is no further increase of M. Incidentally, 
we should point out that if the torus were made of some nonmagnetic material, 
M would be zero and B would equal H for all fields. 

The first thing we notice is that curve a in Fig. 36-8 — which is the so-called 
magnetization curve — is highly nonlinear. But it’s worse than that. If, after reaching 
saturation, we decrease the current in the coil to bring H back to zero, the magnetic 
field B falls along curve b. When H reaches zero, there is still some B left. Even 
with no magnetizing current there is a magnetic field in the iron — it has become 
permanently magnetized. If we now turn on a negative current in the coil, the 
B~H curve continues along b until the iron is saturated in the negative direction. 
If we then bring the current back to zero again, B goes along curve c. If we alternate 
the current between large positive and negative values, the B-H curve goes back 
and forth along very nearly the curves b and c. If we vary H in some arbitrary 

36-7 



Fig. 36-7. (a) A torus of iron wound 

with a coil of insulated wire, (b) Cross 
section of torus showing field lines. 



Fig. 36-8. Typical magnetization 
and hysteresis curves for soft iron. 



way, however, we can get more complicated curves which will, in general, lie 
somewhere between the curves b and c. The loop made by repeated oscillation 
of the fields is called a hysteresis loop of the iron. 

We see then that we cannot write a functional relationship like B = /(//), 
because the value of B at any instant depends not only on what H is at that time, 
but on its whole past history. Naturally, the magnetization and hysteresis curves 
are different for different substances. The shape of the curves depends critically on 
the chemical composition of the material, and also on the details of its preparation 
and subsequent physical treatment. We will discuss some of the physical explana- 
tions for these complications in the next chapter. 


36-4 Iron-core inductances 


One of the most important applications of magnetic materials is in electrical 
circuits — for example, in transformers, electric motors, and so on. One reason is 
that with iron we can control where the magnetic fields go, and also get much 
larger fields for a given electric current. For example, the typical “toroidal” 
inductance is made very much like the object shown in Fig. 36-7. For a given in- 
ductance, it can be much smaller in volume and use much less copper than an 
equivalent “air-core” inductance. For a given inductance, we get a much smaller 
resistance in the winding, so the inductance is more nearly “ideal” — particularly 
for low frequencies. It is very easy to understand, qualitatively, how such an 
inductance works. If / is the current in the winding, then the field H which is 
produced in the inside is proportional to /—as given by Eq. (36.20). The voltage 
V across the terminals is related to the magnetic field B. Neglecting the resistance 
of the winding, the voltage V is proportional to dB/dt. The inductance £, which 
is the ratio of T* to dl/dt (see Section 17-7), thus involves the relation between B 
and H in the iron. Since the B is so much bigger than the H , we get a large factor 
in the inductance. Physically, what happens is that a small current in the coil, 
which would ordinarily produce a small magnetic field, causes the little “slave” 
magnets in the iron to line up and produce a tremendously greater “magnetic” 
current than the external current in the winding. It is as if we had a lot more current 
going through the coil than we really have. When we reverse the current, all the 
little magnets flip over — all those internal currents reverse — and we get a much 
higher induced emf than we would get without the iron. If we want to calculate 
the inductance, we can do so through the energy — as described in Section 17-8. 
The rate at which energy is delivered from the current source is IV. The voltage V 
is the cross-sectional area A of the core, times N, times dB/dt . From Eq. (36.20), 
/ = (e 0 c 2 l/N)H. So we have 


dl[ 

dt 


= VI = (e 0 c 2 lA)H ~ • 


Integrating over time, we have 


U = ( e 0 c 2 lA ) / HdB . 


(36.21) 


Notice that IA is the volume of the torus, so we have shown that the energy density 
u ~ U/w ol in a magnetic material is given by 

u = e 0 c 2 / HdB. (36.22) 


An interesting feature is involved here. When we use alternating currents, 
the iron is driven around a hysteresis loop. Since B is not a single-valued function 
of H , the integral of / H dB around one complete cycle is not equal to zero. It 
is the area enclosed inside the hysteresis curve. Thus, the driving source delivers 
a certain net energy each cycle — an energy proportional to the area inside the 
hysteresis loop. And that energy is “lost.” It is lost from the electromagnetic 
goings on, but turns up as heat m the iron. It is called the hysteresis loss. To keep 
such energy losses small, we would like the hysteresis loop to be as narrow as 


36-8 



possible. One way to decrease the area of the loop is to reduce the maximum field 
that is reached during each cycle. For smaller maximum fields, we get a hysteresis 
curve like the one shown in Fig. 36-9. Also, special materials are designed to have 
a very narrow loop. The so-called transformer irons — which are iron alloys with 
a small amount of silicon — have been developed to have this property. 

When an inductance is run over a small hysteresis loop, the relationship 
between B and H can be approximated by a linear equation. People usually write 

B = pH. (36.23) 

The constant p is not the magnetic moment we have used before. It is called the 
permeability of the iron. (It is also sometimes called the “ relative permeability .”) 
The permeability of ordinary irons is typically several thousand. There are special 
alloys alike “supermalloy” which can have permeabilities as high as a million. 

If we use the approximation that B = pH in Eq. (36.21), we can write the 
energy in a toroidal inductance as 

U = (e 0 c 2 /A)n f HdH = ( e 0 c 2 lA (36.24) 


So the energy density is approximately 


u 



pH 2 . 


We can now set the energy of Eq. (36.24) equal to the energy <£/ 2 / 2 of an inductance, 
and solve for £. We get 

£ = ( € 0 c 2 lA)p * 

Using H/I from Eq. (36.20), we have 


= pN 2 A 
" e 0 c 2 / 


(36.25) 


The inductance is proportional to p. If you want inductances for such things as 
audio amplifiers, you will try to operate them on a hysteresis loop where the 
B-H relationship is as linear as possible. (You will remember that we spoke in 
Chapter 50, Vol. I, about the generation of harmonics in nonlinear systems.) 
For such purposes, Eq. (36.23) is a useful approximation. On the other hand, 
if you want to generate harmonics, you may use an inductance which is intention- 
ally operated in a highly nonlinear way. Then you will have to use the complete 
B-H curves, and analyze what happens by graphical or numerical methods. 

A “transformer” is often made by putting two coils on the same torus — or 
core — of a magnetic material. (For the larger transformers, the core is made with 
rectangular proportions for convenience.) Then a varying current in the “primary” 
winding causes the magnetic field in the core to change, which induces an emf in 
the “secondary” winding. Since the flux through each turn of both windings is 
the same, the emf’s in the two windings are in the same ratio as the number of 
turns on each. A voltage applied to the primary is transformed to a different 
voltage at the secondary. Since a certain net current around the core is needed to 
produce the required change in the magnetic field, the algebraic sum of the currents 
in the two windings will be fixed and equal to the required “magnetizing” current. 
If the current drawn from the secondary increases, the primary current must in- 
crease in proportion — there is a “transformation” of currents as well as voltage. 


36-5 Electromagnets 

Now let’s discuss a practical situation which is a little more complicated. 
Suppose we have an electromagnet of the rather standard form shown in Fig. 
36-10 — there is a “C-shaped” yoke of iron, with a coil of many turns of wire 
wrapped around the yoke. What is the magnetic field B in the gap? 


B i 
(gauss) 



Fig. 36-9. A hysteresis loop that 
doesn’t reach saturation. 



Fig. 36-10. An electromagnet. 
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Fig. 36-1 1. Cross section of an electromagnet. 


If the gap thickness is small compared with all the other dimensions, we can, 
as a first approximation, assume that the lines of B will go around through the 
loop, just as they did in the torus They will look more or less as shown in Fig. 
36-1 1(a). They tend to spread out somewhat in the gap, but if the gap is narrow, 
this will be a small effect. It is a fair approximation to assume that the flux of 
B through any cross section of the yoke is a constant If the yoke has a uniform 
cross-sectional area — and if we neglect any edge effects at the gaps or a Hue corners 
— we can say that B is uniform around the yoke. 

Also, B will have the same value in the gap. This follows from Eq. (36.16). 
Imagine the closed surface S , shown in Fig. 36-1 1(b), which has one face in the 
gap and the other in the iron. The total flux of B out of this surface must be zero. 
Calling Z?i the field in the gap and B 2 the field in the iron, we have that 

B — B 2 A 2 = 0* 



Fig. 36-12. Solving for the field in 
an electromagnet. 


Since A x = A 2 (to our approximation), it follows that B x = B 2 . 

Now let’s look at //. We can again use Eq. (36.19), taking the line integral 
around the curve r in Fig. 36-1 1(b). As before, the right-hand side is NI , the 
number of turns times the current. Now, however, H will be different in the iron 
and in the air. Calling H 2 the field in the iron and l 2 the path length around the 
yoke, this part of the curve will contribute the amount H 2 1 2 to the integral. Calling 
H i the field in the gap and l t the gap thickness, we get the contribution H x l x from 
the gap. We have that 

HJx + H 2 l 2 ~ — ~ • (36.26) 

Now we know something else: that in the air gap, the magnetization is negligi- 
ble, so that = H x . Since B x = B 2 , Eq. (36.26) becomes 

B 2 I x + H 2 l 2 — — * (36.27) 

to t 

We still have two unknowns. To find B 2 and H 2 , we need another relationship — 
namely, the one which relates B to H in the iron. 

If we can make the approximation that B 2 = ju// 2 , we can solve the equation 
algebraically. However, let’s do the general case, in which the magnetization curve 
of the iron is one like that shown in Fig. 36-8. What we want is the simultaneous 
solution of this functional relationship together with Eq. (36.27). We can find it 
by plotting a graph of Eq. (36.27) on the same graph with the magnetization curve, 
as is done in Fig. 36-12. Where the two curves intersect, we have our solution. 

For a given current /, the function (36.27) is the straight line marked / > 0 
in Fig. 36-12. The line intersects the //-axis ( B 2 = 0) at H 2 = NI/e {] c 2 l 23 and 
the slope is — / 2 //i. Different currents just shift the line horizontally. From Fig. 
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36-12, we see that for a given current there are several different solutions, depending 
on how you got there. If you have just built the magnet and turned the current 
up to /, the field B 2 (which is also B x ) will have the value given by point a . If 
you have run the current to some very high value and come down to /, the field 
will be given by point b. Or, if you have just had a high negative current in the 
magnet and then come up to /, the field is the one at point c. The field in the gap 
will depend on what you have done in the past. 

When the current in the magnet is zero, the relation between B 2 and H 2 in 
Eq. (36.27) is shown by the line marked / = 0 in the figure. There are still various 
possible solutions. If you have first saturated the iron, there may be a considerable 
residual field in the magnet as given by point d. You can take the coil off, and you 
have a permanent magnet. You can see that for a good permanent magnet, you 
would want a material with a wide hysteresis loop. Special alloys, such as Alnico 
V, have very wide loops. 


36-6 Spontaneous magnetization 

We now turn to the question of why it is that in ferromagnetic materials a 
small magnetic field produces such a large magnetization. The magnetization of 
ferromagnetic materials like iron and nickel comes from the magnetic moment 
of the electrons in the inner shell of the atom. Each electron has a magnetic moment 
p, equal to q/2m times its ^-factor, times its angular momentum /. For a single 
electron with no net orbital motion, g ~ 2, and the component of / in any direc- 
tion — say the z-direction — is ±h/2 y so the component of p along the z-axis is 

m 2 = = 0.928 X 10 -23 arap-m 2 . (36.28) 

In an iron atom, there are actually two electrons that contribute to the ferro- 
magnetism, so to keep the discussion simpler we will talk about nickel, which is 
ferromagnetic like iron but which has only one electron in the inner shell. (It is 
easy to extend the arguments to iron.) 

Now the point is that in the presence of an external field B , the atomic magnets 
tend to line up with the field, but are knocked about by thermal motions just as 
we described for paramagnetic materials. In the last chapter we found out that the 
balance between a magnetic field trying to line up the atomic magnets and the 
thermal motions trying to derange them produced the result that the mean mag- 
netic moment per unit volume will end up as 

M = Nn tanh • (36.29) 

By B a we mean the field acting at the atom, and kT is the Boltzmann energy. 
In the theory of paramagnetism we used for B a just B itself, neglecting the part of 
the field at any given atom contributed by the atoms nearby. In the ferromagnetic 
case, there is a complication. We shouldn’t use the average field in the iron for 
the B a acting on an individual atom. Instead, we must do as we did in the case of 
dielectrics — we have to find the local field acting at a single atom. For an exact 
calculation we should add up the fields at the atom in question contributed by all 
of the other atoms in the crystal lattice. But as we did for dielectrics, we will make 
the approximation that the field at an atom is the same as we would find in a small 
spherical hole in the material — assuming that the moments of the atoms in the 
neighborhood are not changed by the presence of the hole. 

Following the arguments we made in Chapter 11, we might think that we 
could write 

J? llo i,. = B + | (wrong!). 

J C()C“ 

But that is not light. We can , however, make use of the results of Chapter 1 1 if 
we make a careful comparison of the equations of Chapter 1 1 with the equations 
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for ferromagnetism in this chapter. Let’s put together the corresponding equations. 
For regions where there are no conduction currents or charges we have: 


Electrostatics Static ferromagnetism 

V-^£+^ = 0 V - = 0 

V X £ = 0 V X = 0 


(36.30) 


These two sets of equations can be thought of as analogous if we make the follow- 
ing purely mathematical correspondences: 


B - 


M 

e 0 c 2 


E+- 


B. 


This is the same as making the analogy 

E-> if, P 


M/c 2 . 


(36.31) 


In other words, if we write the equations of ferromagnetism as 



V X H = 0, 

they look like the equations of electrostatics. 

This purely algebraic correspondence has led to some confusion in the past. 
People tended to think that H was “ the magnetic field.” But, as we have seen, 
B and E are physically the fundamental fields, and H is a derived idea. So although 
the equations are analogous, the physics is not analogous. However, that doesn’t 
need to stop us from using the principle that the same equations have the same 
solutions. 

We can use our earlier results for the electric field inside of holes of various 
shapes in dielectrics — summarized in Fig. 36-1 — to find the field H inside of 
corresponding holes. Knowing //, we can determine B. For instance (using the 
results we summarized in Section 1), the field H in a needle-shaped hole parallel 
to M is the same as the H in the material, 


^fhole ^material* 

But since M in the hole is zero, we have 

r - n „ 

^hole "material 9 

e 0 c 2 


(36.33) 


On the other hand, for a disc-shaped hole, perpendicular to M, we have 


which translates into 


Or, in terms of B, 


^hole ^dielectric 


Co 


Hy 


hole 


H x 


material 


+ 


M 

e 0 c 2 


B 1 


'hole 


B , 


'material* 


(36.34) 


Finally, for a spherical hole, by making our analogy with Eq. (36.3) we would have 


or 


Hi 


hole 


-^material H - 


M 
3co c 1 


Bi 


hole 


= B 


material 


2 M 

3 e 0 c 2 


This result is quite different from what we got for E. 
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It is, of course, possible to get these results in a more physical way, by using 
the Maxwell equations directly. For example, Eq. (36.34) follows directly from 
V * B = 0. (You use a gaussian surface that is half in the material and half out.) 
Similarly, you can get Eq. (36.33) by using a line integral along a curve that goes 
up inside the hole and returns through the material. Physically, the field in the 
hole is reduced because of the surface currents — which are given by V X M. 
We will leave it for you to show that Eq. (36.35) can also be obtained by considering 
the effects of the surface currents on the boundary of the spherical cavity. 

In finding the equilibrium magnetization from Eq. (36.29), it turns out to be 
most convenient to deal with H; so write 


B & - H + X ~~ • (36.36) 

In the spherical hole approximation, we would have X = ^, but, as you will see, 
we will want later to use some other value, so we leave it as an adjustable parameter. 
Also, we will take all the fields in the same direction so that we won’t need to worry 
about the vector directions. If we were now to substitute Eq. (36.36) into Eq. 
(36.29), we would have one equation that relates the magnetization M to the mag- 
netizing field H : 

M = 

It is, however, an equation that cannot be solved explicitly, so we will do it graph- 
ically. 

Let’s put the problem in a generalized form by writing Eq. (36.29) as 


M 

Afsal 


= tanh x, 


(36.37) 


where A/ Siit is the saturation value of the magnetization, namely, Np, and x represents 
liBJkT. The dependence of M/M s . lt on x is shown by curve a in Fig. 36-13. 
We can also write x as a function of M — using Eq. (36.36) for B a — as 


x = 


pH ( jUXA/ S at\ Af 

W ~ kT^ Wc^kTJ * 


(36.38) 


For any given value of //, this is a straight-line relationship between Af/Af fiat and 
x. The x intercept is at x = pH/kT , and the slope is t^c 2 kT / p\M %A ^ For any 
particular H, we would have a line like the one marked b in Fig. 36-13. The 
intersection of curves a and b gives us the solution for Af/Af sat . We have solved 
the problem. 

Let’s look at how the solutions will go for various circumstances. We start 
with H = 0. There are two possible situations, shown by the lines b\ and b 2 
in Fig. 36-14. You will notice from Eq. (36.38) that the slope of the line is pro- 
portional to the absolute temperature T, So, at high temperatures we would have 
a line like b\. The solution is M/M^t — 0. When the magnetizing field H is zero, 
the magnetization is also zero. But at low temperatures , we would have a line like b 2 , 
and there are two solutions for — one with Af/Af sat = 0 and one with 

near one. It turns out that only the upper solution is stable — as you can 
see by considering small variations about these solutions. 

According to these ideas, then, a magnetic material should magnetize itself 
spontaneously at sufficiently low temperatures. In short, when the thermal motions 
are small enough, the coupling between the atomic magnets causes them all to 
line up parallel to each other — we have a permanently magnetized material anal- 
ogous to the ferroelectrics we discussed in Chapter 11. 

If we start at high temperatures and come down, there is a critical temperature, 
called the Curie temperature T c , where the ferromagnetic behavior suddenly sets in. 
This temperature corresponds to the line b s of Fig. 36-14, which is tangent to the 
curve a , and has, therefore, a slope of 1. The Curie temperature is given by 


e 0 c 2 kT c 
p\Af sa t 


= 1 . 


(36.39) 



Fig. 36-13. A graphical solution of 
Eqs. (36.37) and (36.38). 


M 11 

M sat _ 

HIGH T T C LOW T 



Fig. 36-14. Finding the magnetiza- 
tion when H — 0. 
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We can, if we wish, write Eq. (36.38) more simply in terms of T c as 




, Ic ( M \ 
kT ' T \M S J * 


(36.40) 


Now we want to see what happens for small magnetizing fields H. We can 
see from Fig. 36-14 how things will go if we shift our straight lines a little to the 
right. For the low-temperature case, the intersection point will move out a little 
bit along the low-slope part of curve a , and M will change relatively little. For the 
high-temperature case, however, the intersection point runs up the steep part of 
curve a, and M will change relatively rapidly. In fact, we can approximate this 
part of curve a by a straight line of unit slope, and write: 

M = V = * H + T «( ^ 

M sat * kT ^ T \M S J ' 


Now we can solve for M/M sat : 


M fiH 

M s&t ~ k(T - T c ) 


(36.41) 


We have a law that is something like the one we had for paramagnetism. For 
paramagnetism, we had 


M = }xB 

M s&t kT 


(36.42) 


One difference now is that we have the magnetization in terms of H , which includes 
some of the effects of the interaction of the atomic magnets, but the main difference 
is that the magnetization is inversely proportional to the difference between T 
and T c , instead of to the absolute temperate T, alone. Neglecting the interactions 
between neighboring atoms corresponds to taking X = 0, which from Eq. (36.39) 
means taking T c = 0. Then the results are just what we had in Chapter 35. 

We can check our theoretical picture with the experimental data for nickel. 
It is observed experimentally that the ferromagnetic behavior of nickel disappears 
when its temperature is raised above 631°K. We can compare this with T c cal- 
culated from Eq. (36.39). Remembering that A/ hllt = fxN, we have 


= X 


N t 

ke 0 c 2 


From the density and atomic weight of nickel, we get 


N = 9.1 X 10 28 m“ 3 . 


Taking m from Eq. (36.28), and setting X = we get 

T c = 0.24°K. 


There is a discrepancy of a factor of about 2600! Our theory of ferromagnetism 
fails completely. 

We can try to “patch up” the theory as Weiss did by saying that for some 
unknown reason X is not one-third, but (2600) X 3 — or about 900. It turns out 
that one gets similar values for other ferromagnetic materials like iron. To see 
what this means, let’s go back to Eq. (36.36). We see that a large X means that 
B a , the local field on the atom, appears to be much, much larger than we would 
think. In fact, writing H = B — M/e 0 c 2 , we have 


- B + 


(X - 1 )M 

€ 0 C 2 


According to our original idea — with X = ^ — the local magnetization M reduces 
the effective field B a by the amount — f Af/e 0 . Even if our model of a spherical 
hole were not very good, we would still expect some reduction. Instead, to explain 
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the phenomenon of ferromagnetism, we have to imagine that the magnetization 
of the field enhances the local field by some large factor — like one thousand or 
more. There doesn’t seem to be any reasonable way to manufacture such tremen- 
dous fields at an atom — nor even fields of the proper sign! Clearly, our “magnetic” 
theory of ferromagnetism is a dismal failure. We must conclude, then, that ferro- 
magnetism has to do with some nonmagnetic interaction between the spinning 
electrons in neighboring atoms. This interaction must generate a strong tendency 
for all of the nearby spins to line up in one direction. We will see later that it has 
to do with quantum mechanics and the Pauli exclusion principle. 

Finally, we look at what happens at low temperatures — for T < T c . We 
have seen that there will then be a spontaneous magnetization — even with H = 0 — 
given by the intersection of the curves a and b 2 of Fig. 36-14. If we solve for M 
for various temperatures — by varying the slope of the line b % — we get the theoretical 
curve shown in Fig. 36-15. This curve should be the same for all ferromagnetic 
materials for which the atomic moment comes from a single electron. The curves 
for other materials are only slightly different. 

In the limit, as T goes to absolute zero, M goes to Af hilt . As the temperature 
is increased, the magnetization decreases, falling to zero at the Curie temperature. 
The points in Fig. 36-15 are the experimental observations for nickel. They fit the 
theoretical curve fairly well. Even though we don’t understand the basic mecha- 
nism, the general features of the theory seem to be correct. 

Finally, there is one more disturbing discrepancy in our attempt to under- 
stand ferromagnetism. We have found that above some temperature the material 
should behave like a paramagnetic substance with a magnetization M propor- 
tional to H (or #), and that below that temperature it should become spontane- 
ously magnetized. But that’s not what we found when we measured the mag- 
netization curve for iron. It only became permanently magnetized ajter we had 
“magnetized” it. According to the ideas just discussed, it would magnetize itself! 
What is wrong? Well, it turns out that if you look at a small enough crystal of iron 
or nickel, it is indeed completely magnetized! But in large pieces of iron, there are 
many small regions or “domains” that are magnetized in different directions, so 
that on a large scale the average magnetization appears to be zero. In each small 
domain, however, the iron has a locked-in magnetization with M nearly equal to 
The consequences of this domain structure are that gross properties of 
large pieces of material are quite different from the microscopic properties that 
we have really been treating. We will take up in the next lecture the story of the 
practical behavior of bulk magnetic materials. 



Fig. 36-15. Spontaneous magnetiza- 
tion as a function of temperature for 
nickel. 
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37 


Magnetic Materials 


37-1 Understanding ferromagnetism 

In this chapter we will discuss the behavior and peculiarities of ferromagnetic 
materials and of other strange magnetic materials. Before proceeding to study 
magnetic materials, however, we will review very quickly some of the things about 
the general theory of magnets that we learned in the last chapter. 

First, we imagine the atomic currents inside the material that are responsible 
for the magnetism, and then describe them in terms of a volume current density 
7ma K = V X M, We emphasize that this is not supposed to represent the actual 
currents. When the magnetization is uniform the currents do not really cancel 
out precisely; that is, the whirling currents of one electron in one atom and the 
whirling currents of an electron m another atom do not overlap in such a way 
that the sum is exactly zero. Even within a single atom the distribution of 
magnetism is not smooth. For instance, in an iron atom the magnetization 
is distributed in a more or less spherical shell, not too close to the nucleus and 
not too far away. Thus, magnetism in matter is quite a complicated thing in its 
details; it is very irregular. However, we are obliged now to ignore this detailed 
complexity and discuss phenomena from a gross, average point of view. Then 
it is true that the average current in the interior region, over any finite area that 
is big compared with an atom, is zero when M = 0. So, what we mean by 
magnetization per unit volume and y mag and so on, at the level we are now 
considering, is an average over regions that are large compared with the space 
occupied by a single atom. 

In the last chapter, we also discovered that a ferromagnetic material has the 
following interesting property: above a certain temperature it is not strongly 
magnetic, whereas below this temperature it becomes magnetic. This fact is 
easily demonstrated. A piece of nickel wire at room temperature is attracted by a 
magnet. However, if we heat it above its Curie temperature with a gas flame, it 
becomes nonmagnetic and is not attracted toward the magnet — even when brought 
quite close to the magnet. If we let it lie near the magnet while it cools off, at the 
instant its temperature falls below the critical temperature it is suddenly attracted 
again by the magnet! 

The general theory of ferromagnetism that we will use supposes that the spin 
of the electron is responsible for the magnetization. The electron has spin one-half 
and carries one Bohr magneton of magnetic moment p = pb = q eh/2m. The 
electron spin can be pointed either “up” or “down.” Because the electron has a 
negative charge, when its spin is “up” it has a negative moment, and when its spin 
is “down” it has a positive moment. With our usual conventions, the moment p 
of the electron is opposite its spin. We have found that the energy of orientation 
of a magnetic dipole in a given applied field B is — p * B , but the energy of the 
spinning electrons depends on the neighboring spin alignments as well. In iron, 
if the moment of a nearby atom is “up,” there is a very strong tendency that the 
moment of the one next to it will also be “up.” That is what makes iron, cobalt, 
and nickel so strongly magnetic — the moments all want to be parallel. The first 
question we have to discuss is why . 

Soon after the development of quantum mechanics, it was noticed that there 
is a very strong apparent force — not a magnetic force or any other kind of actual 
force, but only an apparent force — trying to line the spins of nearby electrons 
opposite to one another. These forces are closely related to chemical valence forces. 
There is a principle in quantum mechanics — called the exclusion principle — that 
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two electrons cannot occupy exactly the same state, that they cannot be in exactly 
the same condition as to location and spin orientation.* For example, if they are 
at the same point, the only alternative is to have their spins opposite. So, if there 
is a region of space between atoms where electrons like to congregate (as in a chem- 
ical bond) and we want to put another electron on top of one already there, the 
only way to do it is to have the spin of the second one pointed opposite to the spin 
of the first one. To have the spins parallel is against the law, unless the electrons 
stay away from each other. This has the effect that a pair of parallel-spin electrons 
near to each other have much more energy than a pair of opposite-spin electrons; 
the net effect is as though there were a force trying to turn the spin over. Some- 
times this spin-turning force is called the exchange force , but that only makes it 
more mysterious — it is not a very good term. It is just because of the exclusion 
principle that electrons have a tendency to make their spins opposite. In fact, 
that is the explanation of the lack of magnetism in almost all substances! The 
spins of the free electrons on the outside of the atoms have tremendous tendency 
to balance in opposite directions. The problem is to explain why for materials 
like iron it is just the reverse of what we should expect. 

We have summarized the supposed alignment effect by adding a suitable term 
in the energy equation, by saying that if the electron magnets in the neighborhood 
have a mean magnetization M, then the moment of an electron has a strong 
tendency to be in the same direction as the average magnetization of the atoms in 
the neighborhood. Thus, we may write for the two possible spin orientations,! 


Spin “up” energy 


Spin “down” energy 
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When it was clear that quantum mechanics could supply a tremendous spin- 
orientating force — even if, apparently, of the wrong sign — it was suggested that 
ferromagnetism might have its origin in this same force, that due to the complexi- 
ties of iron and the large number of electrons involved, the sign of the interaction 
energy would come out the other way around. Since the time this was thought of— 
in about 1927 when quantum mechanics was first being understood — many people 
have been making various estimates and semicalculations, trying to get a theoretical 
prediction for X. The most recent calculations of the energy between the two elec- 
tron spins in iron — assuming that the interaction is a direct one between the two 
electrons m neighboring atoms — still give the wrong sign. The present understand- 
ing of this is again to assume that the complexity of the situation is somehow 
responsible and to hope that the next man who makes the calculation with a more 
complicated situation will get the right answer! 

It is believed that the up-spin of one of the electrons in the inside shell, which 
is making the magnetism, tends to make the conduction electrons which fly around 
the outside have the opposite spin. One might expect this to happen because the 
conduction electrons come into the same region as the “magnetic” electrons. Since 
they move around, they can carry their prejudice for being upside down over to 
the next atom; that is, one “magnetic” electron tries to force the conduction elec- 
trons to be opposite, and the conduction electron then makes the next “magnetic” 
electron opposite to it. The double interaction is equivalent to an interaction which 
tries to line up the two “magnetic” electrons. In other words, the tendency to make 
parallel spins is the result of an intermediary that tends to some extent to be op- 
posite to both. This mechanism does not require that the conduction electrons be 
completely “upside down.” They could just have a slight prejudice to be down, 
just enough to load the “magnetic” odds the other way. This is the mechanism that 


* See Chapter 43. 

f We write these equations with H = B — Af/eoc 2 instead of B to agree with the work 
of the last chapter. You might prefer to write U = ± fxB a = ± }i{B + X'M/eo r 2 ), where 
= X — 1 . It’s the same thing. 
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the people who have calculated such things now believe is responsible for ferro- 
magnetism. But we must emphasize that to this day nobody can calculate the 
magnitude of X simply by knowing that the material is number 26 in the periodic 
table. In short, we don’t thoroughly understand it. 

Now let us continue with the theory, and then come back later to discuss a 
certain error involved in the way we have set it up. If the magnetic moment of a 
certain electron is “up,” energy comes both from the external field and also from 
the tendency of the spins to be parallel. Since the energy is lower when the spins 
are parallel, the effect is sometimes thought of as due to an “effective internal 
field.” But remember, it is not due to a true magnetic force; it is an interaction 
that is more complicated. In any case, we take Eqs. (37.1) as the formulas for the 
energies of the two spin states of a “magnetic” electron. At a temperature T , the 
relative probability of these two states is proportional to e “ encrgy/ ^ r , which we 
can write as e ix , with x = p(H + \M/e 0 c 2 )/kT, Then, if we calculate the 
mean value of the magnetic moment, we find (as in the last chapter) that it is 

M = Nfi tanh x. (37.2) 

Now we would like to calculate the internal energy of the material. We note 
that the energy of an electron is exactly proportional to the magnetic moment, 
so that the calculation of the mean moment and the calculation of the mean energy 
are the same — except that in place of \x in Eq. (37.2) we would write —pB, which 
is ~tx(H + \A//e 0 c 2 ). The mean energy is then 

(U) av = -Nn (h + tanh x. 

Now this is not quite correct. The term \M/e 0 c 2 represents interactions of 
all possible pairs of atoms, and we must remember to count each pair only once . 
(When we consider the energy of one electron in the field of the rest and then the 
energy of a second electron in the field of the rest, we have counted part of the 
first energy once more.) Thus, we must divide the mutual interaction term by two, 
and our formula for the energy then turns out to be 

<£/).,„ = -Nh(h+ tanh x. (37.3) 

In the last chapter we discovered an interesting thing — that below a certain 
temperature the material finds a solution to the equations in which the magnetic 
moment is not zero , even with no external magnetizing field. When we set H ~ 0 
in Eq. (37.2), we found that 



where M^ lt = Njjl 9 and T c = fjL\M sllt /k€ 0 c 2 . When we solve this equation 
(graphically or otherwise), we find that the ratio M/M sat as a function of T/T c is 
a curve like that labeled “quantum theory” in Fig. 37-1 . The dashed curve marked 
“cobalt, nickel” shows the experimental results for crystals of these elements. 
The theory and experiment are in reasonably good agreement. The figure also 
shows the result of the classical theory in which the calculation is carried out 
assuming that the atomic magnets can have all possible orientations in space. 
You can see that this assumption gives a prediction that is not even close to the 
experimental facts. 

Even the quantum theory deviates from the observed behavior at both high 
and low temperatures. The reason for the deviations is that we have made a rather 
sloppy approximation in the theory: We have assumed that the energy of an 
atom depends upon the mean magnetization of its neighboring atoms. In other 
words, for each one that is “up” in the neighborhood of a given atom, there will 
be a contribution of energy due to that quantum mechanical alignment effect. 
But how many are there pointed “up”? On the average, that is measured by the 
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Fig. 37-1. The spontaneous magne- 
tization (H = 0) of ferromagnetic crystals 
as a function of temperature. [Permission 
from Encyclopaedia Britannica.] 






magnetization M — but only on the average. A particular atom somewhere might 
find all its neighbors “up.” Then its energy will be larger than the average. Another 
one might find some up and some down, perhaps averaging to zero, and it would 
have no energy from that term, and so on. What we ought to do is to use some more 
complicated kind of average, because the atoms in different places have different 
environments, and the numbers up and down are different for different ones. 
Instead of just taking one atom subjected to the average influence, we should 
take each one in its actual situation, compute its energy, and find the average 
energy . But how do we find out how many are “up” and how many are “down” 
in the neighborhood? That is, of course, just what we are trying to calculate — 
the number “up” and “down” — so we have a very complicated interconnected 
problem of correlations, a problem which has never been solved. It is an intriguing 
and exciting one which has existed for years and on which some of the greatest 
names in physics have written papers, but even they have not completely solved it. 

It turns out that at low temperatures, when almost all the atomic magnets are 
“up” and only a few are “down,” it is easy to solve; and at high temperatures, far 
above the Curie temperature T c when they are almost all random, it is again easy. 
It is often easy to calculate small departures from some simple, idealized situation, 
so it is fairly well understood why there are deviations from the simple theory at 
low temperature. It is also understood physically that for statistical reasons the 
magnetization should deviate at high temperatures. But the exact behavior near 
the Curie point has never been thoroughly figured out. That’s an interesting 
problem to work out some day if you want a problem that has never been solved. 


37-2 Thermodynamic properties 

In the last chapter we laid the groundwork necessary for calculating the 
thermodynamic properties of ferromagnetic materials. These are, naturally, related 
to the internal energy of the crystal, which includes interactions of the various 
spins, given by Eq. (37.3). For the energy of the spontaneous magnetization below 
the Curie point, we can set H ~ 0 in Eq. (37.3), and — noticing that tanh x = 
M/M sat — we find a mean energy proportional to M 2 : 


(t/)av = 


N»\M 2 

2e 0 c 2 M sat 
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Fig. 37-2. The energy per unit vol- 
ume and specific heat of a ferromagnetic 
crystal. 


If we now plot the energy due to the magnetism as a function of temperature, we 
get a curve which is the negative of the square of the curve of Fig. 37-1, as drawn 
in Fig. 37-2(a). If we were to measure then the specific heat of such a material 
we would obtain a curve which is the derivative of 37-2(a). It is shown in Fig. 
37-4 



37-2(b), It rises slowly with increasing temperature, but falls suddenly to zero at 
T = T c . The sharp drop is due to the change in slope of the magnetic energy and 
is reached right at the Curie point. So without any magnetic measurements at 
all we could have discovered that something was going on inside of iron or nickel 
by measuring this thermodynamic property. However, both experiment and 
improved theory (with fluctuations included) suggest that this simple curve is 
wrong and that the true situation is really more complicated. The curve goes 
higher at the peak and falls to zero somewhat slowly. Even if the temperature is 
high enough to randomize the spins on the average , there are still local regions 
where there is a certain amount of polarization, and in these regions the spins still 
have a little extra energy of interaction — which only dies out slowly as things get 
more and more random with further increases in temperature So the actual curve 
looks like Fig. 37-2(c). One of the challenges of theoretical physics today is to 
find an exact theoretical description of the character of the specific heat near the 
Curie transition — an intriguing problem which has not yet been solved. Naturally, 
this problem is very closely related to the shape of the magnetization curve in the 
same region. 

Now we want to describe some experiments, other than thermodynamic ones, 
which show that there is something right about our interpretation of magnetism 
When the material is magnetized to saturation at low enough temperatures, M is 
very nearly equal to M hnt — nearly all the spins are parallel, as well as their mag- 
netic moments. We can check this by an experiment. Suppose we suspend a bar 
magnet by a thin fiber and then surround it by a coil so that we can reverse the 
' magnetic field without touching the magnet or putting any torque on it. This is a 
'4 very difficult experiment because the magnetic forces are so enormous that any 
irregularities, any lopsidedness, or any lack of perfection in the iron will produce 
accidental torques. However, the experiment has been done under careful con- 
ditions in which such accidental torques are minimized. By means of the magnetic 
field from a coil that surrounds the bar, we turn all the atomic magnets over at 
once. When we do this we also change the angular momenta of all the spins from 
“up” to “down” (see Fig. 37-3). If angular momentum is to be conserved when the 
spins all turn over, the rest of the bar must have an opposite change in angular 
momentum. The whole magnet will start to spin. And sure enough, when we do 
the experiment, we find a slight turning of the magnet. We can measure the 
total angular momentum given to the whole magnet, and this is simply N times ft, 
the change in the angular momentum of each spin. The ratio of angular momentum 
to magnetic moment measured this way comes out to within about 10 percent of 
what we calculate. Actually, our calculations assume that the atomic magnets are 
due purely to the electron spin, but there is, in addition, some orbital motion also in 
most materials. The orbital motion is not completely free of the lattice and does 
not contribute much more than a few percent to the magnetism. As a matter of 
fact, the saturation magnetic field that one gets taking Af sat = Nfi and using the 
density of iron of 7.9 and the moment ju of the spinning electron is about 20,000 
gauss. But according to experiment, it is actually in the neighborhood of 21,500 
gauss. This is a typical magnitude of error — 5 or 10 percent — due to neglecting 
the contributions of the orbital moments that have not been included in making 
the analysis. Thus, a slight discrepancy with the gyromagnetic measurements is 
quite understandable. 

37-3 The hysteresis curve 

We have concluded from our theoretical analysis that a ferromagnetic material 
should spontaneously become magnetized below a certain temperature so that 
all the magnetism would be in the same direction. But we know that this is not true 
for an ordinary piece of unmagnetized iron. Why isn’t all iron magnetized? We 
can explain it with the help of Fig. 37-4. Suppose the iron were all a big single 
crystal of the shape shown in Fig. 37-4(a) and spontaneously magnetized all in one 
direction. Then there would be a considerable external magnetic field, which would 
have a lot of energy. We can reduce that field energy if we arrange that one side of 
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Fig. 37-3. When the magnetization 
of a bar of iron is reversed, the bar is 
given some angular velocity. 
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Fig. 37-4. The formation of domains 
in a single crystal of iron. [From Charles 
Kittel, Introduction to Solid State Physics , 
John Wiley and Sons, Inc., New York, 2nd 
ed., 1956.] 



the block is magnetized “up" and the other side magnetized “down," as in Fig. 
37-4(b). Then, of course, the fields outside the iron would extend over less volume, 
so there would be less energy there. 

Ah, but wait! In the layer between the two regions we have up-spinning 
electrons adjacent to down-spinning electrons. But ferromagnetism appears only 
in those materials for which the energy is reduced if the electrons are parallel rather 
than opposite. So, we have added some extra energy along the dotted line in Fig. 
37-4(b); this energy is sometimes called wall energy. A region having only one 
direction of magnetization is called a domain. At the interface — the “wall”— 
between two domains, where we have atoms on opposite sides which are spinning 
in different directions, there is an energy per unit area of the wall. We have de- 
scribed it as though two adjacent atoms were spinning exactly opposite, but it 
turns out that nature adjusts things so that the transition is more gradual. But 
we don’t need to worry about such fine details at this point. 

Now the question is: When is it better or worse to make a wall? The answer 
is that it depends on the size of the domains. Suppose that we were to scale up a 
block so that the whole thing was twice as big. The volume in the space outside 
filled with a given magnetic field strength would be eight times bigger, and the 
energy in the magnetic field, which is proportional to the volume, would also be 
eight times greater. But the surface area between two domains, which will give the 
wall energy, would be only four times as big. Therefore, if the piece of iron is big 
enough, it will pay to split it into more domains. This is why only the very tiny 
crystals can have but a single domain. Any large object — one more than about a 
hundredth of a millimeter in size — will have at least one domain wall; and any 
ordinary, “centimeter-size” object will be split into many domains, as shown in the 
figure. Splitting into domains goes on until the energy needed to put in one extra 
wall is as large as the energy decrease in the magnetic field outside the crystal. 

Actually nature has discovered still another way to lower the energy: It is not 
necessary to have the field go outside at all, if a little triangular region is magnetized 
sideways , as in Fig. 37-4(d).* Then with the arrangement of Figr x 37-4(d) we see 
that there is no external field, but instead only a little more domain wall. 

But that introduces a new kind of problem. It turns out that when a single 
crystal of iron is magnetized, it changes its length in the direction of magnetization, 
so an “ideal” cube with its magnetization, say, “up,” is no longer a perfect cube. 
The “vertical” dimension will be different from the “horizontal” dimension. This 
effect is called magnetostriction. Because of such geometric changes, the little 
triangular pieces of Fig. 37-4(d) do not, so to speak, “fit” into the available space 
anymore — the crystal has got too long one way and too short the other way. Of 
course, it does fit, really, but only by being squashed in; and this involves some 
mechanical stresses. So, this arrangement also introduces an extra energy. It 
is the balance of all these various energies which determines how the domains 
finally arrange themselves in their complicated fashion in a piece of unmagnetized 
iron. 

Now, what happens when we put on an external magnetic field? To take a 
simple case, consider a crystal whose domains are as shown in Fig. 37-4(d). If 
we apply an external magnetic field in the upward direction, in what manner does 
the crystal become magnetized? First, the middle domain wall can move over 
sideways (to the right) and reduce the energy. It moves over so that the region which 
is “up” becomes bigger than the region which is “down”. There are more elemen- 
tary magnets lined up with the field, and this gives a lower energy. So, for a piece 
of iron in weak fields — at the very beginning of magnetization — the domain walls 
begin to move and eat into the regions which are magnetized opposite to the field. 
As the field continues to increase, a whole crystal shifts gradually into a single 


* You may be wondering how spins that have to be either “up” or “down” can also 
be “sideways”! That’s a good question, but we won’t worry about it right now. We’ll 
simply adopt the classical point of view, thinking of the atomic magnets as classical 
dipoles which can be polarized sideways. Quantum mechanics requires considerable 
expertness to understand how things can be quantized both “up-and-down,” and “right- 
and-left,” all at the same time. 
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large domain which the external field helps to keep lined up. In a strong field the 
crystal “likes” to be all one way just because its energy in the applied field is reduced 
—it is no longer merely the crystal’s own external field which matters. 

What if the geometry is not so simple? What if the axes of the crystal and its 
spontaneous magnetization are in one direction, but we apply the magnetic field 
in some other direction — say at 45° ? We might think that domains would reform 
themselves with their magnetization parallel to the field, and then as before, they 
could all grow into one domain. But this is not easy for the iron to do, for the 
energy needed to magnetize a crystal depends on the direction of magnetization 
relative to the crystal axis . It is relatively easy to magnetize iron in a direction 
parallel to the crystal axes, but it takes more energy to magnetize it in some other 
direction — like 45° with respect to one of the axes. Therefore, if we apply a mag- 
netic field in such a direction, what happens first is that the domains which point 
along one of the preferred directions which is near to the applied field grow until 
the magnetization is all along one of these directions. Then with much stronger 
fields , the magnetization is gradually pulled around parallel to the field, as sketched 
in Fig. 37-5. 

In Fig. 37-6 are shown some observations of the magnetization curves of 
single crystals of iron. To understand them, we must first explain something about 
the notation that is used in describing directions in a crystal. There are many 
ways in which a crystal can be sliced so as to produce a face which is a plane of 
atoms. Everyone who has driven past an orchard or vineyard knows this — it is 
fascinating to watch. If you look one way, you see lines of trees — if you look an- 
other way, you see different lines of trees, and so on. In a similar way, a crystal 
has definite families of planes that hold many atoms, and the planes have this 
important characteristic (we consider a cubic crystal to make it easier): If we 
observe where the planes intersect the three coordinate axes — we find that the 
reciprocals of the three distances from the origin are in the ratio of simple whole 
numbers. These three whole numbers are taken as the definition of the planes. 
For example, in Fig. 37-7(a), a plane parallel to the _yz-plane is shown. This is 
called a [100] plane; the reciprocals of its intersection of the y- and z-axes are both 
zero. The direction perpendicular to such a plane (in a cubic crystal) is given the 
same set of numbers. It is easy to understand the idea in a cubic crystal, for then 
the indices [100] mean a vector which has a unit component in the ^-direction and 
none in the y- or z-directions. The [110] direction is in a direction 45° from the 
and >>-axes, as in Fig. 37-7(b); and the [111] direction is in the direction of the 
cube diagonal, as in Fig. 37-7(c). 
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Fig- 37-5. A magnetizing field H at 
an angle with respect to the crystal axis 
will gradually change the direction of the 
magnetization without changing its magni- 
tude. 



Fig. 37-6. The component of M par- 
allel to H, for different directions of H 
(with respect to the crystal axes). [From 
F. Bitter, Introduction to Ferromagnetism, 
McGraw-Hill Book Co., Inc., 1937.] 


Returning now to Fig. 37-6, we see the magnetization curves of a single 
crystal of iron for various directions. First, note that for very tiny fields — so weak 
that it is hard to see them on the scale at all — the magnetization increases extremely 
rapidly to quite large values. If the field is in the [100] direction — namely along 
one of those nice, easy directions of magnetization — the curve goes up to a high 
value, curves around a little, and then is saturated. What happened is that the 
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Fig. 37-7 . The way the crystal planes are labeled. 


domains which were already there are very easily removed. Only a small field is 
required to make the domain walls move and eat up all of the “wrong-way” 
domains. Single crystals of iron are enormously permeable (magnetic sense), 
much more so than ordinary polycrystalline iron. A perfect crystal magnetizes 
extremely easily. Why is it curved at all? Why doesn’t it just go right up to satura- 
tion? We are not sure. You might study that some day. We do understand why it 
is flat for high fields. When the whole block is a single domain, the extra magnetic 
field cannot make any more magnetization — it is already at AT Hat , with all the elec- 
trons lines up. 

Now, if we try to do the same thing in the [110] direction — which is at 45° 
to the crystal axes — what will happen? We turn on a little bit of field and the 
magnetization leaps up as the domains grow. Then as we increase the field some 
more, we find that it takes quite a lot of field to get up to saturation, because 
now the magnetization is turning away from an “easy” direction. If this explanation 
is correct, the point at which the [110] curve extrapolates back to the vertical axis 
should be at \/\/2 of the saturation value. It turns out, in fact, to be very, very 
close to l/\/2. Similarly, in the [1 1 1] direction — which is along the cube diagonal 
— we find, as we would expect, that the curve extrapolates back to nearly l/\/3 
of saturation. 

Figure 37-8 shows the corresponding situation for two other materials, nickel 
and cobalt. Nickel is different from iron. In nickel, it turns out that the [111] 
direction is the easy direction of magnetization. Cobalt has a hexagonal crystal 
form, and people have botched up the system of nomenclature for this case. They 
want to have three axes on the bottom of the hexagon and one perpendicular to 
these, so they have used four indices. The [0001] direction is the direction of the 
axis of the hexagon, and [1010] is perpendicular to that axis. We see that crystals 
of different metals behave in different ways. 

Now we must discuss a polycrystalline material, such as an ordinary piece of 
iron. Inside such materials there are many, many little crystals with their crystal- 
line axes pointing every which way. These are not the same as domains . Remember 
that the domains were all part of a single crystal , but in a piece of iron there are 


Fig. 37-8. Magnetization curves for 
single crystals of iron, nickel, and cobalt. 
[From Charles Kittel, Introduction to Solid 
State Physics , John Wiley and Sons, Inc., 
New York, 2nd ed., 1 956.] 
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many different crystals with axes at different orientations, as shown in Fig. 37-9. 
Within each of these crystals, there will also generally be some domains. When 
we apply a small magnetic field to a piece of polycrystalline materia], what happens 
is that the domain walls begin to move, and the domains which have a favorable 
direction of easy magnetization grow larger. This growth is reversible so long as 
the field stays very small — if we turn the field off, the magnetization will return to 
zero. This part of the magnetization curve is marked a in Fig. 37-10. 

For larger fields — in the region b of the magnetization curve shown — things 
get much more complicated. In every small crystal of the material, there are strains 
and dislocations; there are impurities, dirt, and imperfections. And at all but the 
smallest fields, the domain wall, in moving, gets stuck on these. There is an inter- 
action energy between the domain wall and a dislocation, or a grain boundary, 
or an impurity. So when the wall gets to one of them, it gets stuck; it sticks there 
at a certain field. But then if the field is raised some more, the wall suddenly snaps 
past. So the motion of the domain wall is not smooth the way it is in a perfect 
crystal — it gets hung up every once in a while and moves in jerks. If we were to 
look at the magnetization on a microscopic scale, we would see something like the 
insert of Fig. 37-10. 

Now the important thing is that these jerks in the magnetization can cause an 
energy loss. In the first place, when a boundary finally slips past an impediment, 
it moves very quickly to the next one, since the field is already above what would 
be required for the unimpeded motion. The rapid motion means that there are 
rapidly changing magnetic fields which produce eddy currents in the crystal. These 
currents loose energy in heating the metal. A second effect is that when a domain 
suddenly changes, part of the crystal changes its dimensions from the magneto- 
striction. Each sudden shift of a domain wall sets up a little sound wave that carries 
away energy. Because of such effects, the second part of magnetization curve 
is irreversible , and there is energy being lost. This is the origin of the hysteresis 
effect, because to move a boundary wall forward — snap — and then to move it back- 
ward — snap — produces a different result. It’s like “jerky” friction, and it takes 
energy. 

Eventually, for high enough fields, when we have moved all the domain walls 
and magnetized each crystal in its best direction, there are still some crystallites 
which happen to have their easy directions of magnetization not in the direction 
of our external magnetic field. Then it takes a lot of extra field to turn those 
magnetic moments around. So the magnetization increases slowly, but smoothly, 
for high fields — namely in the region marked c in the figure. The magnetization 
does not come sharply to its saturation value, because in the last part of the curve 
the atomic magnets are turning in the strong field. So we see why the magnetization 
curve of an ordinary polycrystalline materials, such as the one shown in Fig. 37-10, 
rises a little bit and reversibly at first, then rises irreversibly, and then curves over 
slowly. Of course, there is no sharp break-point between the three regions — they 
blend smoothly, one into the other. 

It is not hard to show that the magnetization process in the middle part of the 
magnetization curve is jerky — that the domain walls jerk and snap as they shift 
All you need is a coil of wire — with many thousands of turns — connected to an 
amplifier and a loudspeaker, as shown in Fig. 37-1 1. If you put a few silicon steel 
sheets (of the type used in transformers) at the center of the coil and bring a bar 
magnet slowly near the stack, the sudden changes in magnetization will produce 
impulses of emf in the coil, which are heard as distinct clicks in the loudspeaker. 
As you move the magnet nearer to the iron you will hear a whole rush of clicks 
that sound something like the noise of sand grains falling over each other as a 
can of sand is tilted. The domain walls are jumping, snapping, and jiggling as the 
field is increased. This phenomenon is called the Barkhausen effect. 

As you move the magnet even closer to the iron sheets, the noise grows louder 
and louder for a while but then there is relatively little noise when the magnet gets 
very close. Why? Because nearly all the domain walls have moved as far as they 
can go. Any greater field is merely turning the magnetization in each domain, 
which is a smooth process. 



Fig. 37—9. The microscopic structure 
of an unmagnetized ferromagnetic ma- 
terial. Each crystal grain has an easy 
direction of magnetization and is broken 
up into domains which are spontaneously 
magnetized (usually) parallel to this 
direction. 



Fig. 37-10. The magnetization curve 
for polycrystalline iron. 



Fig. 37—11. The sudden changes in 
the magnetization of the steel strip are 
heard as clicks in the loudspeaker. 
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If you now withdraw the magnet, so as to come back on the downward branch 
of the hysteresis loop, the domains all try to get back to low energy again, and you 
hear another rush of backward-going jerks. You can also note that if you bring 
the magnet to a given place and move it back and forth a little bit, there is relatively 
little noise. It is again like tilting a can of sand — once the grains shift into place, 
small movements of the can don't disturb them. In the iron the small variations 
in the magnetic field aren’t enough to move any boundaries over any of the 
“humps.” 


37-4 Ferromagnetic materials 



Fig 37-12. The hysteresis curve of 
Alnico V. 


Now we would like to talk about the various kinds of magnetic materials that 
there are in the technical world and to consider some of the problems involved in 
designing magnetic materials for different purposes. First, the term “the magnetic 
properties of iron," which one often hears, is a misnomer — there is no such thing. 
“Iron” is not a well-defined material — the properties of iron depend critically on 
the amount of impurities and also on how the iron is formed. You can appreciate 
that the magnetic properties will depend on how easily the domain walls move and 
that this is a gross property, not a property of the individual atoms. So practical 
ferromagnetism is not really a property of an iron atom — it is a property of solid 
iron m a certain form. For example, iron can take on two different crystalline 
forms. The common form has a body-centered cubic lattice, but it can also have 
a face-centered cubic lattice, which is, however, stable only at temperatures above 
1100°C. Of course, at that temperature the body-centered cubic structure is 
already past the Curie point. However, by alloying chromium and nickel with 
the iron (one possible mixture is 18 percent chromium and 8 percent nickel) we 
can get what is called stainless steel, which, although it is mainly iron, retains the 
face-centered lattice even at low temperatures. Because its crystal structure is 
different, it has completely different magnetic properties. Most kinds TT^tainless 
steel are not magnetic to any appreciable degree, although there are some kinds 
which are somewhat magnetic — it depends on the composition of the alloy. Even 
when such an alloy is magnetic, it is not /mx>magnetic like ordinary iron — even 
though it is mostly just iron. 

We would like now to describe a few of the special materials which have been 
developed for their particular magnetic properties. First, if we want to make a 
permanent magnet, we would like material with an enormously wide hysteresis 
loop so that, when we turn the current off and come down to zero magnetizing 
field, the magnetization will remain large. For such materials the domain bounda- 
ries should be “frozen” m place as much as possible One such material is the re- 
markable alloy “Alnico V” (51% Fe, 8% Al, 14% Ni, 24% Co, 3% Cu). (The 
rather complex composition of this alloy is indicative of the kind of detailed effort 
that has gone into making good magnets. What patience it takes to mix five things 
together and test them until you find the most ideal substance!) When Alnico 
solidifies, there is a “second phase” which precipitates out, making many tiny grains 
and very high internal strains. In this material, the domain boundaries have a 
hard time moving at all. In addition to having a precise composition, Alnico is 
mechanically “worked” in a way that makes the crystals appear in the form of 
long grains along the direction in which the magnetization is going to be. Then 
the magnetization will have a natural tendency to be lined up in these directions 
and will be held there from the anisotropic effects. Furthermore, the material is 
even cooled in an external magnetic field when it is manufactured, so that the grains 
will grow with the right crystal orientation. The hysteresis loop of Alnico V is 
shown in Fig 37-12. You see that it is about 500 times wider than the hysteresis 
curve for soft iron that we showed in the last chapter in Fig. 36-8. 

Let’s turn now to a different kind of material. For building transformers and 
motors, we want a material which is magnetically “soft” — one in which the mag- 
netism is easily changed so that an enormous amount of magnetization results 
from a very small applied field. To arrange this, we need pure, well-annealed 
material which will have very few dislocations and impurities so that the domain 
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walls can move easily. It would also be nice if we could make the anisotropy 
small. Then, even if a grain of the material sits at the wrong angle with respect to 
the field, it will still magnetize easily. Now we have said that iron prefers to mag- 
netize along the [100] direction, whereas nickel prefers the [111] direction; so if 
we mix iron and nickel in various proportions, we might hope to find that with 
just the right proportions the alloy wouldn't prefer any direction — the [100] and 
[1 11] directions would be equivalent. It turns out that this happens with a mixture 
of 70 percent nickel and 30 percent iron. In addition — possibly by luck or maybe 
because of some physical relationship between the anisotropy and the magneto- 
striction effects — it turns out that the magnetostriction of iron and nickel has the 
opposite sign. And in an alloy of the two metals, this property goes through zero 
at about 80 percent nickel. So somewhere between 70 and 80 percent nickel we get 
very “soft” magnetic materials — alloys that are very easy to magnetize. They are 
called the permalloys . Permalloys are useful for high-quality transformers (at low 
signal levels), but they would be no good at all for permanent magnets. Perm- 
alloys must be very carefully made and handled. The magnetic properties of a 
piece of permalloy are drastically changed if it is stressed beyond its elastic limit — it 
mustn’t be bent. Then, its permeability is reduced because of the dislocations, slip 
bands, and so on, which are produced by the mechanical deformations. The 
domain boundaries are no longer easy to move. The high permeability can, how- 
ever, be restored by annealing at high temperatures. 

It is often convenient to have some numbers to characterize the various 
magnetic materials. Two useful numbers are the intercepts of the hysteresis loop 
with the B- and H- axes, as indicated in Fig. 37-12. These intercepts are called the 
remanent magnetic jield B r and the coercive force H t . In Table 37-1 we list these 
numbers for a few magnetic materials. 


Table 37-1 


Properties of some ferromagnetic materials 



B , 

H r 


Residual 

Coercive 


magnetic 

force 


field 

(gauss) 

Material 

(gauss) 


Supermalloy 
Silicon steel 

(-5000) 

0 004 

(transformer) 

12,000 

0 05 

Armco iron 

400U 

0 6 

Alnico V 

13,000 

550. 


(a) (b) 


t f ♦ 

I 1 

T ll I I 

(c) (d) 


Fig. 37-13. Relative orientation of 
electron spins in various materials: (a) 
ferromagnetic, (b) antiferromagnetic, (c) 
ferrite, (d) yttrium-iron alloy. (Broken 
arrows show direction of total angular 
momentum, including orbital motion.) 


37-5 Extraordinary magnetic materials 

We would now like to discuss some of the more exotic magnetic materials. 
There are many elements in the periodic table which have incomplete inner electron 
shells and hence have atomic magnetic moments For instance, right next to the 
ferromagnetic elements iron, nickel, and cobalt you will find chromium and manga- 
nese. Why aren’t they ferromagnetic 9 The answer is that the X term in Eq. (37.1) 
has the opposite sign for these elements. In the chromium lattice, for example, the 
spins of the chromium atoms alternate atom by atom , as shown in Fig. 37-1 3(b). 
So chromium is “magnetic” from its own point of view, but it is not technically 
interesting because there are no external magnetic effects. Chromium, then, is an 
example of a material in which quantum mechanical effects make the spins alter- 
nate. Such a material is called antiferromagnetic . The alignment in antiferromag- 
netic materials is also temperature dependent. Below a critical temperature, all 
the spins are lined up in the alternating array, but when the material is heated above 
a certain temperature — which is again called the Curie temperature — the spins 
suddenly become random. There is, internally, a sudden transition. This transition 
can be seen in the specific heat curve. Also it shows up in some special “magnetic” 
effects. For instance, the existence of the alternating spins can be verified by scatter- 
ing neutrons from a crystal of chromium. Because a neutron itself has a spin 
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Fig. 37-14. Crystal structure of the 
mineral spinel (MgAI 2 0 4 ); the Mg +2 ions 
occupy tetrahedral sites, each surrounded 
by four oxygen ions; the Al +3 ions occupy 
octahedral sites, each surrounded by six 
oxygen ions. [From Charles Kittel, Intro- 
duction to Solid State Physics, John Wiley 
and Sons, Inc , New York, 2nd ed., 1 956 ] 


(and a magnetic moment), it has a different amplitude to be scattered, depending on 
whether its spin is parallel or opposite to the spin of the scatterer. Thus, we get a 
different interference pattern when the spins in a crystal are alternating than we 
do when they have a random distribution 

There is another kind of substance in which quantum mechanical effects make 
the electron spins alternate, but which is nevertheless ferromagnetic — that is, the 
crystal has a permanent net magnetization. The idea behind such materials is 
shown in Fig. 37-14. The figure shows the crystal structure of spinel , a magnesium- 
aluminum oxide, which — as it is shown — is not magnetic. The oxide has two kinds 
of metal atoms: magnesium and aluminum. Now if we replace the magnesium 
and the aluminum by two magnetic elements like iron and zinc, or by zinc and 
manganese — in other words, if we put in magnetic atoms instead of the nonmagnetic 
ones — an interesting thing happens. Let’s call one kind of metal atom a and the 
other kind of metal atom b\ then the following combination of forces m^4)e 
considered. There is an a-b interaction which tries to make the a atonic and the 
b atoms have opposite spins — because quantum mechanics always gives the oppo- 
site sign (except for the mysterious crystals of iron, nickel, and cobalt). Then, 
there is a direct a-a interaction which tries to make the a's opposite, and also a 
b-b interaction which lues to make the 5’ s opposite. Now, of course we cannot 
have everything opposite everything else — a opposite 6, a opposite a , and b op- 
posite b Presumably because of the distances between the a' s and the presence of 
the oxygen (although we really don’t know why), it turns out that the a-b inter- 
action is stronger than the a-a or the b-b . So the solution that nature uses in this 
case is to make all the a's parallel to each other , and all the 5’s parallel to each other , 
but the two systems opposite That gives the lowest energy because of the stronger 
a-b interaction. The result: all the a'§ are spinning up and all the 5’ s are spinning 
down — or vice versa, of course. But if the magnetic moments of the <Mype atom 
and the 5-type atom are not equal , we can get the situation shown in Fig. 37-1 3(c), 
and there can be a net magnetization in the material. The material will then be 
ferromagnetic — although somewhat weak Such materials are called ferrites . 
They do not have as high a saturation magnetization as iron — for obvious reasons 
— so they are only useful for smaller fields. But they have a very important differ- 
ence — they are insulators; the ferrites are ferromagnetic insulators . In high- 
frequency fields, they will have very small eddy currents and so can be used, for 
example, in microwave systems. The microwave fields will be able to get inside 
such an insulating material, whereas they would be kept out by the eddy currents 
in a conductor like iron. 

There is another class of magnetic materials which has only recently been 
discovered — members of the family of the orthosilicates called garnets . They are 
again crystals in which the lattice contains two kinds of metallic atoms, and we 
have again a situation in which two kinds of atoms can be substituted almost at 
will. Among the many compounds of interest there is one which is completely 
ferromagnetic. Tt has yttrium and iron in the garnet structure, and the reason it is 
ferromagnetic is very curious. Here again quantum mechanics is making the 
neighboring spins opposite, so that there is a locked-in system of spins with the 
electron spins of the iron one way and the electron spins of the yttrium the opposite 
way But the yttrium atom is complicated. It is a rare-earth element and gets a 
large contribution to its magnetic moment from orbital motion of the electrons. 
For yttrium, the orbital motion contribution is opposite that of the spin and also 
is bigger. Thus, although quantum mechanics, working through the exclusion 
principle, makes the spins of the yttrium opposite those of the iron, it makes the 
total magnetic moment of the yttiium atom parallel to the iron because of the 
orbital effect — as sketched in Fig. 37- 13(d) The compound is therefore a regular 
ferromagnet. 

Another interesting example of ferromagnetism occurs in some of the rare- 
earth elements. It has to do with a still more peculiar arrangement of the spins. 
The material is not ferromagnetic in the sense that the spins are all parallel, nor is 
it antiferromagnetic in the sense that every atom is opposite. In these crystals all 
of the spins in one layer are parallel and lie in the plane of the layer. In the next 
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layer all spins are again parallel to each other, but point in a somewhat different 
direction. In the following layer they are in still another direction, and so on. 
The result is that the local magnetization vector varies in the form of a spiral — the 
magnetic moments of the successive layers rotate as we proceed along a line 
perpendicular to the layers. It is interesting to try to analyze what happens when a 
field is applied to such a spiral — all the twistings and turnings that must go on in 
all those atomic magnets. (Some people like to amuse themselves with the theory 
of these things!) Not only are there cases of “flat” spirals, but there are also cases 
in which the directions the magnetic moments of successive layers map out a cone, 
so that it has a spiral component and also a uniform ferromagnetic component 
in one direction! 

The magnetic properties of materials, worked out on a more advanced level 
than we have been able to do here, have fascinated physicists of all kinds. In the 
first place, there are those practical people who love to work out ways of making 
things in a better way — they love to design better and more interesting magnetic 
materials. The discovery of things like ferrites, or their application, immediately 
delights people who like to see clever new ways of doing things. Besides this, 
there are those who find a fascination in the terrible complexity that nature can 
produce using a few basic laws. Starting with one and the same general idea, 
nature goes from the ferromagnetism of iron and its domains, to the antiferro- 
magnetism of chromium, to the magnetism of ferrites and garnets, to the spiral 
structure of the rare earth elements, and on, and on. It is fascinating to discover 
experimentally all the strange things that go on in these special substances. Then, 
to the theoretical physicists, ferromagnetism presents a number of very interesting, 
unsolved, and beautiful challenges. One challenge is to understand why it exists 
at all. Another is to predict the statistics of the interacting spins in an ideal lattice. 
Even neglecting any possible extraneous complications, this problem has, so far, 
defied full understanding. The reason that it is so interesting is that it is such an 
easily stated problem: Given a lot of electron spins in a regular lattice, interacting 
with such-and-such a law, what do they do? It is simply stated, but it has defied 
complete analysis for years. Although it has been analyzed rather carefully for 
temperatures not too close to the Curie point, the theory of the sudden transition 
St the Curie point still needs to be completed. 

Finally, the whole subject of the system of spinning atomic magnets — in 
ferromagnetic, or in paramagnetic materials and in nuclear magnetism, has also 
been a fascinating thing to advanced students in physics. The system of spins can 
be pushed on and pulled on with external magnetic fields, so one can do many 
tricks with resonances, with relaxation effects, with spin-echoes, and with other 
effects. It serves as a prototype of many complicated thermodynamic systems. 
But in paramagnetic materials the situation is often fairly simple, and people 
have been delighted both to do experiments and to explain the phenomena theo- 
retically. 

We now close our study of electricity and magnetism. In the first chapter, 
we spoke of the great strides that have been made since the early Greek observation 
of the strange behaviors of amber and of lodestone. Yet in all our long and in- 
volved discussion we have never explained why ii is ihai when we rub a piece of 
amber we get a charge on it, nor have we explained why a lodestone is magnetized f 
You may say, “Oh, we just didn’t get the right sign.” No, it is worse than that 
Even if we did get the right sign, we would still have the question: Why is the piece 
of lodestone in the ground magnetized? There is the earth’s magnetic field, of 
course, but where does the earth's field come froml Nobody really knows — there 
have only been some good guesses. So you see, this physics of ours is a lot of 
fakery — we start out with the phenomena of lodestone and amber, and we end up 
not understanding either of them very well. But we have learned a tremendous 
amount of very exciting and very practical information in the process! 
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ElaHtirity 


38-1 Hooke’s law 


The subject of elasticity deals with the behavior of those substances which 
have the property of recovering their size and shape when the forces producing 
deformations are removed. We find this elastic property to some extent in all 
solid bodies. If we had the time to deal with the subject at length, we would want 
to look into many things: the behavior of materials, the general laws of elasticity, 
the general theory of elasticity, the atomic machinery that determine the elastic 
properties, and finally the limitations of elastic laws when the forces become so 
great that plastic flow and fracture occur. It would take more time than we have 
to cover all these subjects in detail, so we will have to leave out some things 
For example, we will not discuss plasticity or the limitations of the elastic laws. 
(We touched on these subjects briefly when we were talking about dislocations in 
metals.) Also, we will not be able to discuss the internal mechanisms of elasticity — 
so our treatment will not have the completeness we have tried to achieve in the 
earlier chapters. Our aim is mainly to give you an acquaintance with some of the 
ways of dealing with such practical problems as the bending of beams. 

When you push on a piece of material, it “gives” — the material is deformed. 
If the force is small enough, the relative displacements of the various points in the 
material are proportional to the force — we say the behavior is elastic . We will 
discuss only the elastic behavior. First, we will write down the fundamental laws 
of elasticity, and then we will apply them to a number of different situations 

Suppose we take a rectangular block of material of length /, width w 3 and 
height /?, as shown in Fig. 38-1. If we pull on the ends with a force F, then the 
length increases by an amount A/, We will suppose in all cases that the change m 
length is a small fraction of the original length. As a matter of fact, for materials 
like wood and steel, the material will break if the change in length is more than a 
few percent of the original length. For a large number of materials, experiments 
show that for sufficiently small extensions the force is proportional to the extension 

F oc A/. (38.1) 

This relation is known as Hooke's law. 

The lengthening A/ of the bar will also depend on its length. We can figure out 
how by the following argument. If we cement two identical blocks together, end 
to end, the same forces act on each block , each will stretch by A/. Thus, the stretch 
of a block of length 21 would be twice as big as a block of the same cross section, 
but of length /. In order to get a number more characteristic of the material, and 
less of any particular shape, we choose to deal with the ratio A/// of the extension 
to the original length. This ratio is proportional to the force but independent of /: 


38-1 Hooke’s law 

38-2 Uniform strains 

38-3 The torsion bar; shear waves 

38-4 The bent beam 

38-5 Buckling 

Review: Chapter 47, Vol. I, Sound; 
the Wave Equation. 


L^W + A* 



t + &e 


Fig. 38-1. The stretching of a bar 
under uniform Tension. 


F oc 


A / 
/ 


(38.2) 


The force F will also depend on the area of the block. Suppose that we put 
two blocks side by side. Then for a given stretch A / we would have the force F 
on each block, or twice as much on the combination of the two blocks. The force, 
for a given amount of stretch, must be proportional to the cross-sectional area A 
of the block. To obtain a law in which the coefficient of proportionality is inde- 
pendent of the dimensions of the body, we write Hooke’s law for a rectangular 
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block in the form 


F = YA — ■ (38.3) 

The constant Y is a property only of the nature of the material; it is known as 
Young's modulus . (Usually you will see Young’s modulus called E. But we’ve 
used E for electric fields, energy, and emf’s, so we prefer to use a different letter.) 

The force per unit area is called the stress, and the stretch per unit length— the 
fractional stretch — is called the strain. Equation (38.3) can therefore be rewritten 
in the following way: 

F Al 

S = Y X ~ > (38.4) 

A l 


Stress = (Young’s modulus) X (Strain). 



Fig. 38-2. A bar 
hydrostatic pressure. 


under uniform 



Fig 38-3. Hydrostatic pressure is 
the superposition of three longitudinal 
compressions. 


There is another part to Hooke’s law: When you stretch a block of material 
in one direction it contracts at right angles to the stretch. The contraction in 
width is proportional to the width w and also to A///. The sideways contraction is 
in the same proportion for both width and height, and is usually written 

^’ = ^= (38.5) 

w h 1 

where the constant a is another property of the material called Poisson's ratio. It is 
always positive in sign and is a number less than 1/2. (It is “reasonable” that o' 
should be generally positive, but it is not quite clear that it must be so.) 

The two constants Y and cr specify completely the elastic properties of a ho - 
mogeneous ’ isotropic (that is, noncrystalline) material. In crystalline materials the 
stretches and contractions can be different in different directions, so there can be 
many more elastic constants. We will restrict our discussion temporarily to homo- 
geneous’ isotropic materials whose properties can be described by Y and cr. As usual 
there are different ways of describing things — some people like to describe the 
elastic properties of materials by different constants. It always takes two, and 
they can be related to cr and Y. 

The last general law we need is the principle of superposition. Since the two 
laws (38 4 ) and (38.5) are linear in the forces and in the displacements, superposition 
will work. If you have one set of forces and get some displacements, and then 
you add a new set of forces and get some additional displacements, the resulting 
displacements will be the sum of the ones you would get with the two sets of forces 
acting independently. 

Now we have all the general principles — the superposition principle and Eqs. 
(38.4) and (38.5) — and that’s all there is to elasticity. But that is like saying that 
once you have Newton’s laws that’s all there is to mechanics. Or, given Maxwell’s 
equations, that’s all there is to electricity. It is, of course, true that with these 
principles you have a great deal, because with your present mathematical ability 
you could go a long way. We will, however, work out a few special applications. 


38-2 Uniform strains 

As our first example let’s find out what happens to a rectangular block under 
uniform hydrostatic pressure Let’s put a block under water in a pressure tank. 
Then there will be a force acting inward on every face of the block proportional 
to the area (see Fig. 38-2). Since the hydrostatic pressure is uniform, the stress 
(force per unit area) on each face of the block is the same. We will work out first 
the change in the length. The change m length of the block can be thought of as 
the sum of changes in length that would occur in the three independent problems 
which are sketched in Fig. 38-3. 
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Problem L If we push on the ends of the block with a pressure p , the com- 
pressional strain is p/Y, and it is negative, 

A/i = p 

l Y * 


Problem 2. If we push on the two sides of the block with pressure p , the com- 
pressional strain is again p/ Y, but now we want the lengthwise strain. We can get 
that from the sideways strain multiplied by —a. The sideways strain is 


so 


Aw 

w 


P_ . 
Y ’ 



Problem 3 . If we push on the top of the block, the compressional strain is 
once more p/Y, and the corresponding strain in the sideways direction is again 
—<rp/ Y. We get 


A/3 

/ 


+<r 


£ 
Y ' 


Combining the results of the three problems — that is, taking Al = A/i + 
A/ 2 + A/ 3 — we get 

y = - f (1 - 2a). (38.6) 


The problem is, of course, symmetrical in all three directions; it follows that 



People like to call AV/V the volume strain and write 


P = 



F 


The volume stress p is proportional to the volume strain — Hooke’s law once more. 
The coefficient K is called the bulk modulus; it is related to the other constants by 


K 3(1 - 2a) 


(38.9) 


Since K is of some practical interest, many handbooks give Y and K instead of Y 
and er. If you wantcr you can always get it from Eq. (38.9). We can also see from 
Eq. (38.9) that Poisson’s ratio, cr, must be less than one-half. If it were not, the 
bulk modulus K would be negative, and the material would expand under increas- 
ing pressure. That would allow us to get mechanical energy out of any old block — 
it would mean that the block was in unstable equilibrium. If it started to expand 
it would continue by itself with a release of energy. 

Now we want to consider what happens when you put a “shear” strain on 
something. By shear strain we mean the kind of distortion shown in Fig. 38-4. As a 
preliminary to this, let us look at the strains in a cube of material subjected to the 
forces shown in Fig. 38-5. Again we can break it up into two problems: the vertical 


F 


F 


Fig. 38-5. A cube with compressing 
forces on top and bottom and equal 
stretching forces on two sides. 
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pushes, and the horizontal pulls. Calling A the area of the cube face, we have for 
the change in horizontal length 


A/ _1F 1 F _ 1 + cr F 

l ~ Y A + a Y A~ Y A' 

The change in the vertical height is just the negative of this. 


(38.10) 



Fig. 38-6. The two pairs of shear forces in (a) produce the same stress as 
the compressing and stretching forces of (b). 



Now suppose we have the same cube and subject it to the shearing forces 
shown in Fig. 38-6(a). Note that all the forces have to be equal if there are to be 
no net torques and the cube is to be in equilibrium. (Similar forces must also 
exist in Fig. 38-4, since the block is in equilibrium. They are provided through 
the “glue” that holds the block to the table.) The cube is then said to be in a state 
of pure shear. But note that if we cut the cube by a plane at 45° — say along the 
diagonal A in the figure — the total force acting across the plane is normal to plane 
and is equal to \^2G. The area over which this force acts is \/2A; therefore, the 
tensile stress normal to this plane is simply G/A. Similarly, if we examine a plane 
at an angle of 45° the other way — the diagonal B in the figure — we see that there 
is a compressional stress normal to this plane of —G/A. From this, we see that 
the stress in a “pure shear” is equivalent to a combination of tension and com- 
pression stresses of equal strength and at right angles to each other, and at 45° to 
the original faces of the cube The internal stresses and strains are the same as 
we would find in the larger block of material with the forces shown in Fig. 38-6(b). 
But this is the problem we have already solved. The change in length of the diagonal 
is given by Eq. (38.10), 


AD _ 1 + a G 
D Y A 


(38.11) 


(One diagonal is shortened; the other is elongated.) 

It is often convenient to express a shear strain in terms of the angle by which 
the cube is twisted — the angle 6 in Fig. 38-7. From the geometry of the figure you 
can see that the horizontal shift 5 of the top edge is equal to y/2 AD. So 


5 _ \/2 AD _ ^ AD 

I ~ T~ ~ 1 D 


(38.12) 


The shear stress g is defined as the tangential force on one face divided by the 
area, g ~ G/A. Using Eq. (38.11) in (38.12), we get 


Fig. 38-7. The shear strain 0 is 
2 AD/D. 


Or, writing this in the form “stress = constant times strain,” 


g = fxd. 


(38.13) 
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The proportionality coefficient ju is called the shear modulus (or, sometimes, the 
coefficient of rigidity). It is given in terms of Y and a by 


Y 

~ 2(1 + Or) 


(38.14) 


Incidentally, the shear modulus must be positive — otherwise you could get work 
out of a self-shearing block. From Eq. (38.14), cr must be greater than — 1. We 
know, then, that <7 must be between — 1 and +^; in practice, however, it is always 
greater than zero. 

As a last example of the type of situation where the stresses are uniform through 
the material, let’s consider the problem of a block which is stretched, while it is 
at the same time constrained so that no lateral contraction can take place. (Tech- 
nically, it’s a little easier to compress it while keeping the sides from bulging out — 
but it’s the same problem.) What happens? Well, there must be sideways forces 
which keep it from changing its thickness — forces we don’t know off-hand but 
will have to calculate. It’s the same kind of problem we have already done, only 
with a little different algebra. We imagine forces on all three sides, as shown in 
Fig. 38-8; we calculate the changes in dimensions, and we choose the transverse 
forces to make the width and height remain constant. Following the usual argu- 
ments, we get for the three strains: 



F* 

Ax 



(38.15) 

(38.16) 

(38.17) 


F y 

i 1 1 



1 

1 

! o 

i 

o Fz 

o ! 

i 

1 


i 

_i 


i 



Fig. 38-8. Stretching without 
contraction. 


Now since A l u and A l z are supposed to be zero, Eqs. (38.16) and (38.17) give 
two equations relating F y and F z to F x . Solving them together, we get that 


Fy = Fz = __<r_ F x 

A-jj A % 1 (T Ax 


Substituting in (38.15), we have 

A! x _ 1 / 2<r 2 \ F x _ 1 (\ - a - 2a 2 \ F x 

l x Y V 1 - a) A x Y\ l - a ) A x 


(38.18) 


(38.19) 


Often, you will see this turned around, and with the quadratic incr factored out, it 
is then written 


F l-o- A / 

a - (i + o")( i - 2a) Y r 


(38.20) 


When we constrain the sides, Young’s modulus gets multiplied by a complicated 
function of o\ As you can most easily see from Eq. (38.19), the factor in front of 
Y is always greater than 1. It is harder to stretch the block when the sides are 
held — which also means that a block is stronger when the sides are held than 
when they are not. 


38-3 The torsion bar; shear waves 

Let’s now turn our attention to an example which is more complicated because 
different parts of the material are stressed by different amounts. We consider a 
twisted rod such as you would find in a drive shaft of some machinery, or in a 
quartz fiber suspension used in a delicate instrument. As you probably know from 
experiments with the torsion pendulum, the torque on a twisted rod is proportional 
to the angle — the constant of proportionality obviously depending upon the 
length of the rod, on the radius of the rod, and on the properties of the material. 
The question is: In what way? We are now in a position to answer this question; 
it’s just a matter of working out some geometry. 
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Fig. 38-9. (a) A cylindrical bar in torsion, (b) A cylindrical shell in torsion, 

(c) Each small piece of the shell is in shear. 


Fig. 38-9(a) shows a cylindrical rod of length L , and radius a , with one end 
twisted by the angle <j> with respect to the other. If we want to relate the strains to 
what we already know, we can think of the rod as being made up of many cylindrical 
shells and work out separately what happens to each shell. We start by looking at 
a thin, short cylinder of radius r (less than a) and thickness i\r — as drawn in Fig, 
38-9(b). Now if we look at a piece of this cylinder that was originally a small 
square, we see that it has been distorted into a parallelogram. Each such element 
of the cylinder is in shear, and the shear angle 8 is 



The shear stress g in the material is, therefore [from Eq. (38.13)], 

g = ^=M~- (38.21) 

The shear stress is the tangential force AF on the end of the square divided 
by the area A / Ar of the end [see Fig. 38— 9(c)] 

AF 

8 ~ AlAr 

The force AF on the end of such a square contributes a torque At around the axis 
of the rod equal to 

At = r AF = rgAlAr. (38.22) 

The total torque r is the sum of such torques around a complete circumference of 
the cylinder. So putting together enough pieces so that the A/’s add up to 27rr, 
we find that the total torque, for a hollow tube , is 

rg(2irr) Ar. (38.23) 

Or, using (38.21), 

t = 2t tm r -~~ ■ (38.24) 

We get that the rotational stiffness, t/<£>, of a hollow tube is proportional to the 
cube of the radius r and to the thickness A r, and inversely proportional to the 
length L. 

We can now imagine a solid rod to be made up of a series of concentric tubes, 
each twisted by the same angle <f> (although the internal stresses are different for 
each tube). The total torque is the sum of the torques required to rotate each 
shell; for the solid rod 

t = lirfi ~ J r s dr , 
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where the integral goes from r — 0 to r = a 9 the radius of the rod. Integrating, 
we have 

r = n ~ <t>. (38.25) 

For a rod in torsion, the torque is proportional to the angle and is proportional to 
the fourth power of the diameter — a rod twice as thick is sixteen times as stiff 
for torsion. 

Before leaving the subject of torsion, let us apply what we have just learned 
to an interesting problem: torsional waves. If you take a long rod and suddenly 
twist one end, a wave of twist works it way along the rod, as sketched in Fig. 
38-10(a). That’s a little more exciting than a steady twist — let’s see whether we 
can work out what happens. 




Let z be the distance to some point down the rod. For a static torsion the 
torque is the same everywhere along the rod, and is proportional to 4>/L, the total 
torsion angle over the total length. What matters to the material is the local 
torsional strain, which is, you will appreciate, d<t>/dz. When the torsion along the 
rod is not uniform, we should replace Eq. (38.25) by 

r(z) (38.26) 

Now let’s look at what happens to an element of length A z shown magnified in 
Fig. 38— 10(b). There is a torque r(z) at end 1 of the little hunk of rod, and a differ- 
ent torque r(z + A z) at end 2. If Az is small enough, we can use a Taylor ex- 
pansion and write 

r(z + Az) = r(z) + Az. (38.27) 

The net torque At acting on the little piece of rod between z and z + Az is 
clearly the difference between r(z) and r(z + Az), or At = ( dr/dz ) Az. Differ- 
entiating Eq. (38.26), we get 

Ar = M ~r S’ Az - (38 - 28) 

The effect of this net torque is to give an angular acceleration to the little 
slice of the rod. The mass of the slice is 

AM ~ ( fra 2 Az)p, 

where p is the density of the material. We worked out in Chapter 19, Vol. I, that 
the moment of inertia of a circular cylinder is mr 2 /l \ calling the moment of inertia 
of our piece A/, we have 

A / = ~ pa 4 Az. (38.29) 

Newton’s law says the torque is equal to the moment of inertia times the angular 
acceleration, or 

At = A/ § • (38.30) 
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Pulling everything together, we get 


or 


2 dz 2 


7T 


pa 


4 


A z 


d 2 J_ 

dt 2 ’ 


d 2 <j) p d 2 <f> 
~dz^ ~~ ~jx ~dt^ 


(38.31) 


You will recognize this as the one-dimensional wave equation. We have found 
that waves of torsion will propagate down the rod with the speed 


Qh 


ear 



(38.32) 


The denser the rod — for the same stiffness — the slower the waves; and the stiffer 
the rod, the quicker the waves work their way down. The speed does not depend 
upon the diameter of the rod. 

Torsional waves are a special example of shear waves . In general, shear waves 
are those in which the strains do not change the volume of any part of the material. 
In torsional waves, we have a particular distribution of such shear stresses — namely, 
distributed on a circle. But for any arrangement of shear stresses, waves will 
propagate with the same speed — the one given in Eq. (38.32). For example, the 
seismologists find such shear waves travelling in the interior of the earth. 

We can have another kind of a wave in the elastic world inside a solid material. 
If you push something, you can start “longitudinal” waves — also called" compres- 
sional” waves. They are like the sound waves in air or in water — the displace- 
ments are in the same direction as the wave propagation. (At the surfaces of an 
elastic body there can also be other types of waves — called “Rayleigh waves” or 
“Love waves.” In them, the strains are neither purely longitudinal nor purely 
transverse. We will not have time to study them.) 

While we’re on the subject of waves, what is the velocity of the pure com- 
pressional waves in a large solid body like the earth 9 We say “large” because the 
speed of sound in a thick body is different from what it is, for instance, along a 
thin rod. By a “thick” body we mean one in which the transverse dimensions are 
much larger than the wavelength of the sound. Then, when we push on the object, 
it cannot expand sideways — it can only compress in one dimension. Fortunately, 
we have already worked out the special case of the compression of a constrained 
elastic material. We have also worked out in Chapter 47, Voh I, the speed of 
sound waves in a gas. Following the same arguments you can see that the speed 
of sound in a solid is equal to x/Fyp, where Y f is the “longitudinal modulus” — 
or pressure divided by the relative change in length — for the constrained case. 
This is just the ratio of A/// to F/A we got in Eq- (38.20). So the speed of the 
longitudinal waves is given by 


C 


2 

long 


7' _ 1 - a Y 

P ~ (1 + o-)(l - 2 a) p 


(38.33) 


So long ascr is between zero and 1/2, the shear modulus p is less than Young’s 
modulus 7, and also Y f is greater than Y, so 


P < Y < Y'. 


This means that longitudinal waves travel faster than shear waves. One of the most 
precise ways of measuring the elastic constants of a substance is by measuring the 
density of the material and the speeds of the two kinds of waves. From this 
information one can get both Y and a. It is, incidentally, by measuring the differ- 
ence in the arrival times of the two kinds of waves from an earthquake that a 
seismologist can estimate — even from the signals at only one station — the distance 
to the quake. 
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38-4 The bent beam 


We want now to look at another practical matter — the bending of a rod or a 
beam. What are the forces when we bend a bar of some arbitrary cross section? 
We will work it out thinking of a bar with a circular cross section, but our answer 
will be good for any shape. To save time, however, we will cut some corners, so 
our theory we will work out is only approximate. Our results will be correct only 
when the radius of the bend is much larger than the thickness of the beam. 

Suppose you grab the two ends of a straight bar and bend it into some curve 
like the one shown in Fig. 38-11. What goes on inside the bar? Well, if it is curved, 
that means that the material on the inside of the curve is compressed and the ma- 
terial on the outside is stretched. There is some surface which goes along more or 
less parallel to the axis of the bar that is neither stretched nor compressed. This is 
called the neutral surface. You would expect this surface to be near the “middle” 
of the cross section. It can be shown (but we won’t do it here) that, for small 
bending of simple beams, the neutral surface goes through the “center of gravity” 
of the cross section. This is true only for “pure” bending — if you are not stretching 
or compressing the beam at the same time. 

For pure bending, then, a thin transverse slice of the bar is distorted as shown 
in Fig. 38- 12(a). The material below the neutral surface has a compressional 
strain which is proportional to the distance from the neutral surface ; and the material 
above is stretched, also in proportion to its distance from the neutral surface. So 
the longitudinal stretch A / is proportional to the height y. The constant of pro- 
portionality is just / over the radius of curvature of the bar — see Fig. 38-12: 



A/ __ y 
l ~ R 

So the force per unit area — the stress — in a small strip at y is also proportional to 
the distance from the neutral surface 


AF = Y y 
A A R' 


(38.34) 


Now let’s look at the forces that would produce such a strain. The forces 
acting on the little segment drawn in Fig. 38-12 are shown in the figure. If we 
think of any transverse cut, the forces acting across it are one way above the 
neutral surface and the other way below. They come in pairs to make a “bending 
moment” — by which we mean the torque about the neutral line. We can com- 
pute the total moment by integrating the force times the distance from the neutral 
surface for one of the faces of the segment of Fig. 38-12: 



cross 

sect 


(38.35) 


From Eq. (38.34), dF = Yy/R dA, so 




3TI = ~ J y 2 dA. 

The integral of y 2 dA is what we can call the “moment of inertia” of the geometric 
cross section about a horizontal axis through its “center of mass”;* we will call 
it /: 

an = ~ (38.36) 

K 


(b) 


Fig. 38-12. (a) Small segment of a 

bent beam, (b) Cross section of ♦he beam. 


I = 


/ 


y 2 dA. 


(38.37) 


It is, of course, really the moment of inertia of a slice with unit mass per unit area. 
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Equation (38.36), then, gives us the relation between the bending moment SfTt 
and the curvature 1 /R of the beam. The “stiffness” of the beam is proportional 
to Y and to the moment of inertia /. In other words, if you want the stiffest 
possible beam with a given amount of, say, aluminum, you want to put as much 
of it as possible as far as you can from the neutral surface, to make a large moment 
of inertia. You can’t carry this to an extreme, however, because then the thing 
will not curve as we have supposed — it will buckle or twist and become weaker 
again. But now you see why structural beams are made in the form of an I or an 
H — as shown in Fig. 38-13. 

As an example of the use of our beam equation (38.36), let’s work out the 
deflection of a cantilevered beam with a concentrated force W acting at the free 
end, as sketched in Fig. 38-14. (By “cantilevered” we simply mean that the beam 
is supported in such a way that both the position and the slope are fixed at one 
end — it is stuck into a cement wall.) What is the shape of the beam? Let’s call 
the deflection at the distance x: from the fixed end z; we want to know z(x). We’ll 
work it out only for small deflections. We will also assume that the beam is long 
in comparison with its cross section. Now, as you know from your mathematics 
courses, the curvature l/R of any curve z(x) is given by 



w 


1 _ d 2 z/dx 2 
R ~ [T + {dz/dxyfl2 


(38.38) 


Since we are interested only in small slopes — this is usually the case in engineering 
structures — we neglect (< dz/dx ) 2 in comparison with 1, and take 


L = 

R dx 2 


(38.39) 


38-14. A cantilevered beam We also need to know the bending moment 9fH. It is a function of x because it is 

weight at one end. equal to the torque about the neutral axis of any cross section. Let’s neglect the 

weight of the beam and take only the downward force W at the end of the beam. 

(You can put in the beam weight yourself if you want.) Then the bending moment 

at x is 

9TC(jc) - W(L - x), 


because that is the torque about the point at x, exerted by the weight W — the 
torque which the beam must support of x. We get 


or 



YI 

d^Z 

W(L - x) 

~ ~R ~ 

YI — 
dx 2 

d 2 z 

W 


dx 2 

YI (L 

x). 


This one we can integrate without any tricks; we get 


(38.40) 


W ( Lx 2 
Z ~ YI V 2 



(38.41) 


using our assumptions that z(0) — 0 and that dz/dx is also zero at x = 0. That 
is the shape of the beam. The displacement of the end is 


z(L) = 


W L?_ . 
YI 3 ’ 


(38.42) 


the displacement of the end of a beam increases as the cube of the length. 

In deriving our approximate beam theory, we have assumed that the cross 
section of the beam did not change when the beam was bent. When the thickness 
of the beam is small compared to the radius of curvature, the cross section changes 
very little and our result is O.K. In general, however, this effect cannot be neglected, 
as you can easily demonstrate for yourselves by bending a soft-rubber eraser in 
your fingers. If the cross section was originally rectangular, you will find that when 
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it is bent it bulges at the bottom (see Fig. 38-15). This happens because when we 
compress the bottom, the material expands sideways — as described by Poisson’s 
ratio. Rubber is easy to bend or stretch, but it is somewhat like a liquid in that 
it’s hard to change the volume — as shows up nicely when you bend the eraser. For 
an incompressible material, Poisson’s ratio would be exactly 1/2 — for rubber it is 
nearly that. 


38-5 Buckling 

We want now to use our beam theory to understand the theory of the “buck- 
ling” of beams, or columns, or rods. Consider the situation sketched in Fig. 
38-16 in which a rod that would normally be straight is held in its bent shape by 
two opposite forces that push on the ends of the rod. We would like to calculate 
the shape of the rod and the magnitude of the forces on the ends. 

Let the deflection of the rod from the straight line between the ends be >>(*), 
where x is the distance from one end. The bending moment at the point P 
in the figure is equal to the force F multiplied by the moment arm, which is the 
perpendicular distance y, 

9T l(x) = Fy. (38.43) 




Fig. 38-1 5 (a) A bent eraser; (b) 

cross section. 


Using the beam equation (38.36), we have 


Y L - 

~R 

For small deflections, we can take \/R — 
curvature is downward). We get 

d 2 y = _ 
dx 2 ~ 


Fy. (38.44) 



which is the differential equation of a sine wave. So for small deflections, the curve 

of such a bent beam is a sine curve. The “wavelength” X of the sine wave is twice Fig. 38-16. A buckled beam, 

the distance L between the ends. If the bending is small, this is just twice the 
unbent length of the rod. So the curve is 


y = K sin ttx/L. 

Taking the second derivative, we get 


dx 1 


7T 

1 1 


y • 


Comparing this to Eq. (38.45), we see that the force is 


F = 


2 


7T 


YJ 
LJ ' 


(38.46) 


For small bendings the force is independent of the bending displacement y f 

We have, then, the following thing physically If the force is less than the F 
given in Eq. (38.46), there will be no bending at all. But if it is slightly greater 
than this force, the material will suddenly bend a large amount — that is, for 
forces above the critical force tt 2 YI/L 2 (often called the “Euler force”) the beam 
will “buckle.” If the loading on the second floor of a building exceeds the Euler 
force for the supporting columns, the building will collapse. Another place where 
the buckling force is most important is in space rockets. On one hand, the rocket 
must be able to hold its own weight on the launching pad and endure the stresses 
during acceleration, on the other hand, it is important to keep the weight of the 
structure to a minimum, so that the payload and fuel capacity may be made as 
large as possible. 

Actually a beam will not necessarily collapse completely when the force 
exceeds the Euler force. When the displacements get large, the force is larger than 
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Fig. 38-17. The coordinates S and 0 
for the curve of a bent beam. 



F, 


F. 



Fig. 38-18. 


Curves of a bent rod 


what we have found because of the terms in \/R in Eq. (38.38) that we have ne- 
glected. To find the forces for a large bending of the beam, we have to go back to 
the exact equation, Eq. (38.44), which we had before we used the approximate 
relation between R and y. Equation (38.44) has a rather simple geometrical prop- 
erty.* It’s a little complicated to work out, but rather interesting. Instead of 
describing the curve in terms of x and y, we can use two new variables: S , the 
distance along the curve, and 0 the slope of the tangent to the curve. See Fig. 38-17. 
The curvature is the rate of change of angle with distance: 

J_ = dl 

R dS* 


We can, therefore write the exact equation (38.44) as 

cW F 

dS ~ YI y ‘ 


If we take the derivative of this equation with respect to S and replace dy/dS by 
sin 0, we get 


d 2 d 
dS 2 


F . 
yj sin 0. 


(38.47) 


[If 0 is small, we get back Eq. (38.45). Everything is O.K.J 

Now it may or may not delight you to know that Eq. (38.47) is exactly the 
same one you get for the large amplitude oscillations of a pendulum — with F/YI 
replaced by another constant, of course. We learned way back in Chapter 9, Vol. I, 
how to find the solution of such an equation by a numerical calculation, f The 
answers you get are some fascinating curves — known as the curves of the “Elastica.” 
Figure 38-18 shows three curves for different values of F/YI. 


* The same equation appears, incidentally, in other physical situations — for example, 
the meniscus at the surface of a liquid contained between parallel planes — and the same 
geometrical solution can be used. 

t The solutions can also be expressed in terms of some functions, called the "Jacobian 
elliptic functions,” that someone else has already computed. 
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39 


Elastic Materials 


39-1 The tensor of strain 


In the last chapter we talked about the distortions of particular elastic objects. 
In this chapter we want to look at what can happen in general inside an elastic 
material. We would like to be able to describe the conditions of stress and strain 
inside some big glob of jello which is twisted and squashed in some complicated 
way. To do this, we need to be able to describe the local strain at every point in an 
elastic body ; we can do it by giving a set of six numbers — which are the components 
of a symmetric tensor — for each point. Earlier, we spoke of the stress tensor 
(Chapter 31); now we need the tensor of strain. 

Imagine that we start with the material initially unstrained and watch the 
motion of a small speck of “dirt” embedded in the material when the strain is 
applied. A speck that was at the point P located at r = (x, y, z) moves to a new 
position P r at r' = (x\ y~\ z f ) as shown in Fig. 39-1. We will call u the vector 
displacements from P to / . Then 

u = r f - r. (39.1) 


39-1 The tensor of strain 

39-2 The tensor of elasticity 

39-3 The motions in an elastic body 

39-4 Nonelastic behavior 

39-5 Calculating the elastic constants 


Reference: C. Kittel, Introduction to 
Solid State Physics, John 
Wiley and Sons, Inc., New 
York, 2nd ed., 1956. 


The displacement u depends, of course, on which point P we start with, so u is a 
vector function of r — or, if you prefer, of (x, y, z). 

Let's look first at a simple situation in which the strain is constant over the 
material — so we have what is called a homogeneous strain. Suppose, for instance, 
that we have a block of material and we stretch it uniformly. We just change its 
dimensions uniformly in one direction — say, in the ^-direction, as shown in Fig. 
39-2. The motion u x of a speck at x is proportional to x. In fact, 


u x _ A / 
x / 


We will write u x this way: 

Ux = e xx x. BEFORE 



Fig. 39-1. A speck of the material at the point P in an unstrained block Fig. 39-2. A homogeneous stretch-type strain, 

moves to P f where the block is strained. 
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The proportionality constant e xx is, of course, the same thing as A l/L (You will 
see shortly why we use a double subscript.) 

If the strain is not uniform, the relation between u x and x will vary from place 
to place in the material. For the general situation, we define the e xx by a kind of 
local A///, namely by 

e xx = dujdx. (39.2) 


This number — which is now a function of x, y, and z — describes the amount of 
stretching in the x-direction throughout the hunk of jello. There may, of course, 
also be stretching in the y- and z-directions. We describe them by the numbers 


e yy 


dliy 

~dy ’ 


&ZZ 


du z 

dz 


(39.3) 


We need to be able to describe also the shear-type strains. Suppose we imagine 
a little cube marked out in the initially undisturbed jello. When the jello is pushed 
out of shape, this cube may get changed into a parallelogram, as sketched in Fig. 
39-3.* In this kind of a strain, the x-motion of each particle is proportional to 
its ^-coordinate, 

ux = {y. (39.4) 

And there is also a j-motion proportional to x, 

«* = !*. (39.5) 


So we can describe such a shear-type strain by writing 


with 


— e X yy j 


Ui, (?vxX 


e »* = 2 


Now you might think that when the strains are not homogeneous we could 
describe the generalized shear strains by defining the quantities e xy and e yx by 


du x 

dy 


6Uy 

eyx ~ ~dx 


(39.6) 


BEFORE 




Fig. 39-3. A homogeneous shear strain. 


But there is one difficulty. Suppose that the displacements u x and u y were given by 

e e 

u x = j y > Uy = ~ 2' 

* We choose for the moment to split the total shear angle 6 into two equal parts and 
make the strain symmetric with respect to x and y . 
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BEFORE 



AFTER 


Fig. 39—4. A homogeneous rotation — there is no strain. 


They are like Eqs. (39.4) and (39.5) except that the sign of u y is reversed. With 
these displacements a little cube in the jello simply gets shifted by the angle 6/2, 
as shown in Fig. 39-4. There is no strain at all — just a rotation in space. There is 
no distortion of the material; the relative positions of all the atoms are not changed 
at all. We must somehow make our definitions so that pure rotations are not 
included in our definitions of a shear strain. The key point is that if du y /dx and 
du x /dy are equal and opposite, there is no strain ; so we can fix things up by defining 

&xy ~ @yx ~ 2 (fia y / dx -j - da x / By). 


For a pure rotation they are both zero, but for a pure shear we get that e xy is 
equal to e yx , as we would like. 

In the most general distortion — which may include stretching or compression 
as well as shear — we define the state of strain by giving the nine numbers 


e vv 


3Uy 
dy ’ 


(39.7) 


e xy = i(du y /dx + dujdy). 


These are the terms of a tensor of strain. Because it is a symmetric tensor — our 
definitions make e xy = e yx , always — there are really only six different numbers. 
You remember (see Chapter 31) that the general characteristic of a tensor is that 
the terms transform like the products of the components of two vectors. (If 
A and B are vectors, C X3 — A X B 3 is a tensor.) Each term of e X3 is a product 
(or the sum of such products) of the components of the vector u = (u x , u y , u z ), and 
of the operator V = (d/dx, d/dy, d/dz), which we know transforms like a vector. 
Let’s let x u x 2 , and x 3 stand for x, y, and z and w x , u 2 , and w 3 stand for u x , u y , 
and u z ; then we can write the general term ei 3 of the strain tensor as 

e X3 ~ 2 (du 3 /dx x + dufdxfi, (39.8) 

where / and j can be 1,2, or 3. 

When we have a homogeneous strain — which may include both stretching 
and shear — all of the e X3 are constants, and we can write 

u x = e xx x + c xy y 4“ e xz z . (39.9) 

(We choose our origin of x, y, z at the point where « is zero.) In this case, the strain 
tensor e x3 gives the relationship between two vectors: the coordinate vector r = 
(x, y, z) and the displacement vector u = (u x , u y , u z ). 
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When the strains are not homogeneous, any piece of the jello may also get 
somewhat twisted — there will be a local rotation. If the distortions are all small, 
we would have 

Aw, = 2 ~ (39.10) 

3 

where a) l3 is an antisymmetric tensor, 

o) l3 = ^(dUj/dXt — dUt/dXj), (39.11) 

which describes the rotation. We will, however, not worry any more about rota- 
tions, but only about the strains described by the symmetric tensor e %3 . 

39-2 The tensor of elasticity 

Now that we have described the strains, we want to relate them to the internal 
forces — the stresses in the material. For each small piece of the material, we 
assume Hooke’s law holds and write that the stresses are proportional to the 
strains. In Chapter 31 we defined the stress tensor S l3 as the ith component of the 
force across a unit area perpendicular to the y-axis. Hooke’s law says that each 
component of S l3 is linearly related to each of the components of strain. Since 
S and e each have nine components, there are 9 X 9 = 81 possible coefficients 
which describe the elastic properties of the material. They are constants if the 
material itself is homogeneous. We write these coefficients as C l3k i and define 
them by the equation 

Sij = ^2 (39.12) 

k,i 

where iff, k , / all take on the values 1, 2, or 3. Since the coefficients C ljk i relate 
one tensor to another, they also form a tensor — a tensor of the fourth rank . We 
can call it the tensor of elasticity. 

Suppose that all the C’s are known and that you put a complicated force on 
an object of some peculiar shape. There will be all kinds of distortion, and the 
thing will settle down with some twisted shape. What are the displacements? 
You can see that it is a complicated problem. If you knew the strains, you could 
find the stresses from Eq. (39.12) — or vice versa. But the stresses and strains you 
end up with at any point depend on what happens in all the rest of the material. 

The easiest way to get at the problem is by thinking of the energy. When there 
is a force F proportional to a displacement x, say F = kx , the work required for 
any displacement x is kx 2 /l. In a similar way, the work w that goes into each 
unit volume of a distorted material turns out to be 

W = i ^ C nk ie l} e k i. (39.13) 

ijkl 

The total work W done in distorting the body is the integral of w over its volume: 

w = I hY, CukMidVol (39.14) 

ijkl 

This is then the potential energy stored in the internal stresses of the material. 
Now when a body is in equilibrium, this internal energy must be at a minimum. 
So the problem of finding the strains in a body can be solved by finding the set of 
displacements u throughout the body which will make W a minimum. In Chapter 
19 we gave some of the general ideas of the calculus of variations that are used in 
tackling minimization problems like this. We cannot go into the problem in any 
more detail here. 

What we are mainly interested in now is what we can say about the general 
properties of the tensor of elasticity. First, it is clear that there are not really 81 
different terms in C t3 u. Since both S l3 and e l3 are symmetric tensors, each with 
only six different terms, there can be at most 36 different terms in C l3ki . There are, 
however, usually many fewer than this. 
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Let’s look at the special case of a cubic crystal. In it, the energy density w 
starts out like this: 


W 2 \C xxxx e xx 


+ Cxxxy^xx^xy ~\~ ^xxxz^xx^xz 

+ C xxyx e xx e x y “1“ ^xxyy^xx^yy • * • ^tC . . . 
~f CyyyyCyy . etC . . . CtC . . . } , 


(39.15) 


with 81 terms in all! Now a cubic crystal has certain symmetries. In particular, if 
the crystal is rotated 90°, it has the same physical properties. It has the same 
stiffness for stretching in the ^-direction as for stretching in the x-direction. There- 
fore, if we change our definition of the coordinate directions x and y in Eq. (39. 1 5), 
the energy wouldn’t change. It must be that for a cubic crystal 


— C = r 
^ yyyy W; 


(39.16) 


Next we can show that the terms like C xxxy must be zero. A cubic crystal has 
the property that it is symmetric under a reflection about any plane perpendicular 
to one of the axes. If we replace y by —y, nothing is different. But changing y to 
—y changes e xy to — e xy — a displacement which was toward ~\-y is now toward —y. 
If the energy is not to change, C xxxy must go into — C xxxy when we make a reflec- 
tion. But a reflected crystal is the same as before, so C xxxy must be the same as 
— C xxxy . This can happen only if both are zero. 

You say, “But the same argument will make C yy yy = 0!” No, because there 
are four y’ s. The sign changes once for each y, and four minuses make a plus. If 
there are two or four y’ s, the term does not have to be zero. It is zero only when 
there is one, or three. So, for a cubic crystal, any nonzero term of C will have only 
an even number of identical subscripts. (The arguments we have made for y ob- 
viously hold also for x and z.) We might then have terms like C xxyy , C xyxy , C xyyx , 
and so on. We have already shown, however, that if we change all x’s to y 9 s and 
vice versa (or all z’s and x’s, and so on) we must get — for a cubic crystal — the same 
number. This means that there are only three different nonzero possibilities: 


C X XXX 1 

r =C 

i ^yyyy 

Cxxyy < 

( ~ Cyy X x 

Cxyxy ’ 

( = Cy X yx 


~ C ZZZZ )y 

Cjcxzz) etc.), 

= Cxzxz> GtC.). 


(39.17) 


For a cubic crystal, then, the energy density will look like this: 

W = 2 {C XXXX (& xx + @yy H“ &zz) 

T" 2C XX yy(e X x£yy + e yy€zz + e zz e xx) (39.18) 

“b 4 C x yxy(e xy + el z + e%)}. 

For an isotropic — that is, noncrystalline — material, the symmetry is still 
higher. The C’s must be the same for any choice of the coordinate system. Then 
it turns out that there is another relation among the C’s, namely, that 


C, 


- C, 


+ c x . 


(39.19) 


We can see that this is so by the following general argument. The stress tensor 
S t j has to be related to e l3 in a way that doesn’t depend at all on the coordinate 
directions — it must be related only by scalar quantities. “That’s easy,” you say. 
“The only way to obtain S tJ from e l3 is by multiplication by a scalar constant. 
It’s just Hooke’s law. It must be that S l3 = (const)c^.” But that’s not quite 
right; there could also be the unit tensor 8 tJ multiplied by some scalar, linearly 
related to e tJ . The only invariant you can make that is linear in the e’s is £e tl 
(It transforms like x 2 + y 2 + z 2 , which is a scalar.) So the most general form 
for the equation relating S l3 to e ld — for isotropic materials — is 

S l3 ~ 2fxe tJ + X (? e**) (39.20) 

(The first constant is usually written as two times jit; then the coefficient ju is equal 
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to the shear modulus we defined in the last chapter.) The constants ju and X are 
called the Lame elastic constants. Comparing Eq. (39.20) with Eq. (39.12), you 
see that 

Cxxyy — y 

Caw = 2fx, > (39.21) 

Cxxxx ~ "j“ X. 


So we have proved that Eq. (39.19) is indeed true. You also see that the elastic 
properties of an isotropic material are completely given by two constants, as we 
said in the last chapter. 

The C’s can be put in terms of any two of the elastic constants we have used 
earlier — for instance, in terms of Young’s modulus Y and Poisson’s ratio (r. We 
will leave it for you to show that 



Fig. 39-5. A small volume element V 
bounded by the surface A. 


2a 

c ™ - rh (r^2?) ' (39 ' 22) 

^ xyxy (1 + a) 

39-3 The motions in an elastic body 

We have pointed out that for an elastic body in equilibrium the internal 
stresses adjust themselves to make the energy a minimum. Now we take a look at 
what happens when the internal forces are not in equilibrium. Let’s say we have 
a small piece of the material inside some surface A. See Fig. 39-5. If the piece is in 
equilibrium, the total force F acting on it must be zero. We can think of this force 
as being made up of two parts. There could be one part due to “external” forces 
like gravity, which act from a distance on the matter in the piece to produce a 
force per unit volume / ext . The total external force F exz is the integral of f QXt over 
the volume of the piece: 


C = 

^ T.XTT, 


1 + 


^( 1 + 


1 - 


Fcxt = / foxtdV. 


(39.23) 


In equilibrium, this force would be balanced by the total force F lut from the neigh- 
boring material which acts across the surface A. When the piece is not in equili- 
brium — if it is moving — the sum of the internal and external forces is equal to the 
mass times the acceleration. We would have 


F tix t + F in t = J prdV , (39.24) 

where p is the density of the material, and r is its acceleration. We can now com- 
bine Eqs. (39.23) and (39.24), writing 

F mt = / (-/ext + pr)civ. (39.25) 

J V 

We will simplify our writing by defining 

/ = —/ext + P'r ■ (39.26) 

Then Eq. (39.25) is written 

F mt = / fdV. (39.27) 

J v 

What we have called F ini is related to the stresses in the material. The stress 
tensor S tJ was defined (Chapter 31) so that the ^-component of the force dF across 
a surface element da , whose unit normal is «, is given by 
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dF x (^S xx n x H~ S X yny -f - S X zftz) da. 


(39.28) 



The x-component of F lnt on our little piece is then the integral of dF x over the 
surface. Substituting this into the x-component of Eq. (39.27), we get 


I (S xx n x + S xy n y + S X2 n z ) da — I f x dV. (39.29) 

J A J v 

We have a surface integral related to a volume integral — and that reminds 
us of something we learned in electricity. Note that if you ignore the first subscript 
x on each of the S ’ s in the left-hand side of Eq. (39.29), it looks just like the integral 
of a quantity “S” ■ n — that is, the normal component of a vector — over the 
surface. It would be the flux of “5” out of the volume. And this could be written, 
using Gauss law, as the volume integral of the divergence of “S”. It is, in fact, 
true whether the x-subscript is there or not — it is just a mathematical theorem 
you get by integrating by parts. In other words, we can change Eq. (39.29) into 



(39.30) 


Now we can leave off the volume integrals and write the differential equation for 
the general component of / as 

/• = ( 39 -31) 

3 3 


This tells us how the force per unit volume is related to the stress tensor S l3 . 

The theory of the motions inside a solid works this way. If we start out know- 
ing the initial displacements — given by, say, u — we can work out the strains e l3 . 
From the strains we can get the stresses from Eq. (39.12). From the stresses we 
can get the force density /in Eq. (39.31). Knowing/, we can get, from Eq. (39.26), 
the acceleration r of the material, which tells us how the displacements will be 
changing. Putting everything together, we get the horrible equation of motion 
for an elastic solid. We will just write down the results that come out for an 
isotropic material. If you use (39.20) for S l3 , and write the e l3 as ^du l /dx 3 + 
du 3 /dx t , you end up with the vector equation 


/ = (X + /*) V(V • u) + M V 2 ir. (39.32) 


You can, in fact, see that the equation relating /and u must have this form. 
The force must depend on the second derivatives of the displacements u What 
second derivatives of u are there that are vectors? One is V(V • «); that’s a true 
vector. The only other one is V 2 u. So the most general form is 

/ = a V(V • u) + b V 2 w, 


which is just (39.32) with a different definition of the constants. You may be 
wondering why we don’t have a third term using V X V X «, which is also a 
vector. But remember that V X V X u is the same thing as V 2 w — V(V * «), 
so it is a linear combination of the two terms we have. Adding it would add nothing 
new. We have proved once more that isotropic material has only two elastic 
constants. 

For the equation of motion of the material, we can set (39.32) equal to 
p d 2 u/dt 2 — neglecting for now any body forces like gravity— and get 

p = (X + Ml v(v • H) + M V 2 «. (39.33) 

It looks something like the wave equation we had in electromagnetism, except 
that there is an additional complicating term. For materials whose elastic proper- 
ties are everywhere the same we can see what the general solutions look like in the 
following way. You will remember that any vector field can be written as the sum 
of two vectors: one whose divergence is zero, and the other whose curl is zero, in 
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Fig. 39-6. Measuring internal 
stresses with polarized light. 



Fig. 39-7. A stressed plastic model 
as seen between crossed polaroids. 
[From F. W. Sears, Optics, Addison- 
Wesley Publishing Co., Reading, Mass., 
1949.] 


other words, we can put 

u = M X + i* 2 , (39.34) 

where 

V * u x = 0, V X m 2 = 0. (39.35) 

Substituting n x + u 2 for u in (39.33), we get 

p d 2 /dt 2 [ui + «2] = (X + m) V(V * « 2 ) + mV 2 («i + m 2 ). (39.36) 

We can eliminate u x by taking the divergence of this equation, 

p d 2 /dt 2 (V • m 2 ) = (X + /*) V 2 (V • « 2 ) + • V 2 m 2 . 

Since the operators (V 2 ) and (V*) can be interchanged, we can factor out the di- 
vergence to get 

V- { pd 2 u 2 /dt 2 - (A + 2ju) V 2 m 2 } = 0. (39.37) 

Since V X u 2 is zero by definition, the curl of the bracket { } is also zero; so the 
bracket itself is identically zero, and 

pd 2 u 2 /dt 2 — (X + 2p) V 2 w 2 . v (39.38) 

This is the vector wave equation for waves which move at the speed 
C* 2 = a/(A + 2 p)/p. Since the curl of u 2 is zero, there is no shearing associated 
with this wave; this wave is just the compressional — sound-type — wave we discussed 
in the last chapter, and the velocity is just what we found for Ci onK - 

In a similar way — by taking the curl of Eq. (39.36) — we can show that iq 
satisfies the equation 

pd 2 u x /dt 2 = /xV 2 m x . (39.39) 

This is again a vector wave equation for waves with the speed C 2 = a/m7p. 
Since V • u x is zero, mi produces no changes in density; the vector u x corresponds 
to the transverse, or shear-type, wave we saw in the last chapter, and C 2 = C s h ea r* 
If we wished to know the static stresses in an isotropic material, we could, 
in principle, find them by solving Eq. (39.32) with / equal to zero — or equal to the 
static body forces from gravity such as pg — under certain conditions which are 
related to the forces acting on the surfaces of our large block of material. This is 
somewhat more difficult to do than the corresponding problems in electromagne- 
tism. It is more difficult, first, because the equations are a little more difficult to 
handle, and second, because the shape of the elastic bodies we are likely to be 
interested in are usually much more complicated. In electromagnetism, we are 
often interested in solving Maxwell’s equations around relatively simple geometric 
shapes such as cylinders, spheres, and so on, since these are convenient shapes 
for electrical devices. In elasticity, the objects we would like to analyze may have 
quite complicated shapes — like a crane hook, or an automobile crankshaft, or the 
rotor of a gas turbine. Such problems can sometimes be worked out approxi- 
mately by numerical methods, using the minimum energy principle we mentioned 
earlier. Another way is to use a model of the object and measure the internal strains 
experimentally, using polarized light. 

It works this way: When a transparent isotropic material — for example, a 
clear plastic like lucite — is put under stress, it becomes birefnngent. If you put 
polarized light through it, the plane of polarization will be rotated by an amount 
related to the stress: by measuring the rotation, you can measure the stress. Figure 
39-6 shows how such a setup might look. Figure 39-7 is a photograph of a 
photoelastic model of a complicated shape under stress. 


39-4 Nonelastic behavior 

In all that has been said so far, we have assumed that stress is proportional 
to strain; in general, that is not true. Figure 39-8 shows a typical stress-strain 
curve for a ductile material. For small strains, the stress is proportional to the 
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strain. Eventually, however, after a certain point, the relationship between stress 
and strain begins to deviate from a straight line. For many materials — the ones 
we would call “brittle” — the object breaks for strains only a little above the point 
where the curve starts to bend over. In general, there are other complications in 
the stress-strain relationship. For example, if you strain an object, the stresses 
may be high at first, but decrease slowly with time. Also if you go to high stresses, 
but still not to the “breaking” point, when you lower the strain the stress will 
return along a different curve. There is a small hysteresis effect (like the one we 
saw between B and H in magnetic materials). 

The stress at which a material will break varies widely from one material to 
another. Some materials will break when the maximum tensile stress reaches a 
certain value. Other materials will fail when the maximum shear stress reaches a 
certain value. Chalk is an example of a material which is much weaker in tension 
than in shear. If you pull on the ends of a piece of blackboard chalk, the chalk will 
break perpendicular to the direction of the applied stress, as shown in Fig. 39-9(a). 
It breaks perpendicular to the applied force because it is only a bunch of particles 
packed together which are easily pulled apart. The material is, however, much 
harder to shear, because the particles get in each other’s way. Now you will re- 
member that when we had a rod in torsion there was a shear all around it. Also, we 
showed that a shear was equivalent to a combination of a tension and compression 
at 45°. For these reasons, if you twist a piece of blackboard chalk, it will break 
along a complicated surface which starts out at 45° to the axis. A photograph of a 
piece of chalk broken in this way is shown in Fig. 39-9(b). The chalk breaks where 
the material is in maximum tension. 



Fig. 39-8. A typical stress-strain re- 
lation for large strains. 








Fig. 39-9. (a) A piece of chalk broken by pulling on the ends; (b) a piece broken by twisting. 


Other materials behave in strange and complicated ways. The more compli- 
cated the materials are, the more interesting their behavior. If we take a sheet of 
“Saran-Wrap” and crumple it up into a ball and throw it on the table, it slowly 
unfolds itself and returns toward its original flat form. At first sight, we might 
be tempted to think that it is inertia which prevents it from returning to its original 
form. However, a simple calculation shows that the inertia is several orders of 
.magnitude too small to account for, the effect. There appear to be two important 
competing effects: “something” inside the material “remembers” the shape it had 
initially and “tries” to get back there, but something else “prefers” the new shape 
and “resists” the return to the old shape. 

We will not attempt to describe the mechanism at play in the Saran plastic, 
but you can get an idea of how such an effect might come about from the following 
model Suppose you imagine a material made of long, flexible, but strong, fibers 
mixed together with some hollow cells filled with a viscous liquid. Imagine also 
that there are narrow pathways from one cell to the next so the liquid can leak 
slowly from a cell to its neighbor. When we crumple a sheet of this stuff, we 
distort the long fibers, squeezing the liquid out of the cells in one place and forcing 
it into other cells which are being stretched. When we let go, the long fibers try to 
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return to their original shape. But to do this, they have to force the liquid back to 
its original location — which will happen relatively slowly because of the viscosity. 
The forces we apply in crumpling the sheet are much larger than the forces exerted 
by the fibers. We can crumple the sheet quickly, but it will return more slowly. 
It is undoubtedly a combination of large stiff molecules and smaller, movable ones 
in the Saran-Wrap that is responsible for its behavior. This idea also fits with the 
fact that the material returns more quickly to its original shape when it’s warmed 
up than when it’s cold — the heat increases the mobility (decreases the viscosity) 
of the smaller molecules. 

Although we have been discussing how Hooke’s law breaks down, the re- 
markable thing is perhaps not that Hooke’s law breaks down f<y large strains but 
that it should be so generally true. We can get some idea of w^y this might be by 
looking at the strain energy in a material. To say that the stress is proportional to 
the strain is the same thing as saying that the strain energy varies as the square of 
the strain. Suppose we have a rod and we twist it through a small angle 0. If 
Hooke’s law holds, the strain energy should be proportional to the square of 6 . 
Suppose we were to assume that the energy were some arbitrary function of the 
angle; we could write it as a Taylor expansion about zero angle 

U(d) = 17(0) + U f (0)9 + £E/"(O)0 2 + £t/'"(0)0 3 . . . (39.40) 

The torque r is the derivative of U with respect to angle; we would have 

t{e) = U f (0) + U"(0)6 + iU”'(0)6 2 + • • • (39.41) 

Now if we measure our angles from the equilibrium position, the first term is zero. 
So the first remaining term is proportional to 0; and for small enough angles, it 
will dominate the term in 6 2 . [Actually, materials are sufficiently symmetric 
internally so that r(0) = — r(— 0); the term in d 2 will be zero, and the departures 
from linearity would come only from the 0 3 term. There is, however, no reason 
why this should be true for compressions and tensions.] The thing we have not 
explained is why materials usually break soon after the higher-order terms become 
significant. 

39-5 Calculating the elastic constants 

As our last topic on elasticity we would like to show how one could try to 
calculate the elastic constants of a material, starting with some knowledge of the 
properties of the atoms which make up the material. We will take only the simple 
case of an ionic cubic crystal like sodium chloride. When a crystal is strained, its 
volume or its shape is changed. Such changes result in an increase in the potential 
energy of the crystal. To calculate the change in strain energy, we have to know 
where each atom goes. In complicated crystals, the atoms will rearrange themselves 
in the lattice in very complicated ways to make the total energy as small as possible. 
This makes the computation of the strain energy rather difficult. In the case of a 
simple cubic crystal, however, it is easy to see what will happen. The distortions 
inside the crystal will be geometrically similar to the distortions of the outside 
boundaries of the crystal. 

We can calculate the elastic constants for a cubic crystal in the following way. 
First, we assume some force law between each pair of atoms in the crystal. Then, we 
calculate the change in the internal energy of the crystal when it is distorted from 
its equilibrium shape. This gives us a relation between the energy and the strains 
which is quadratic in all the strains. Comparing the energy obtained this way with 
Eq. (39.13), we can identify the coefficient of each term with the elastic constants 

For our example we will assume a simple force law: that the force between 
neighboring atoms is a central force, by which we mean that it acts along the line 
between the two atoms. We would expect the forces in ionic crystals to be like 
this, since they are just primarily Coulomb forces. (The forces of covalent bonds 
are usually more complicated, since they can exert a sideways push on a nearby 
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atom; we will leave out this complication.) We are also going to include only the 
forces between each atom and its nearest and next-nearest neighbors. In other 
words, we will make an approximation which neglects all forces beyond the next- 
nearest neighbor. The forces we will include are shown for the xy-plane in Fig. 
39-1 0(a). The corresponding forces in the yz- and zx-planes also have to be 
included. 

Since we are only interested in the elastic coefficients which apply to small 
strains, and therefore only want the terms in the energy which vary quadratically 
with the strains, we can imagine that the force between each atom pair varies 
linearly with the displacements. We can then imagine that each pair of atoms is 
joined by a linear spring, as drawn in Fig. 39— 10(b). All of the springs between a 
sodium atom and a chlorine atom should have the same spring constant, say 
The springs between two sodiums and between two chlorines could have different 
constants, but we will make our discussion simpler by taking them equal; we call 
them k 2 . (We could come back later and make them different after we have seen 
how the calculations go.) 

Now we assume that the crystal is distorted by a homogeneous strain de- 
scribed by the strain tensor In general, it will have components involving 
x, y , and z; but we will consider now only a strain with the three components 
e xx , e xy > and e yy so that it will be easy to visualize. If we pick one atom as our 
origin, the displacement of every other atom is given by equations like Eq. (39.9): 

u x — e xx x + e xy y y 

u y = e xy x + e yy y. 1 ' 

Suppose we call the atom at x = y = 0 “atom 1” and number its neighbors in 

the xy-plane as shown in Fig. 39-11. Calling the lattice constant a , we get the x 
and y displacements u x and u y listed in Table 39-1. 

Now we can calculate the energy stored in the springs, which is k 2 /2 times 
the square of the extension for each spring. For example, the energy in the hori- 
zontal spring between atom 1 and atom 2 is 

. (39.43) 

Note that to first order, the y-displacement of atom 2 does not change the length of 
the spring between atom 1 and atom 2. To get the strain energy in a diagonal spring, 
such as that to atom 3, however, we need to calculate the change in length due to 
both the horizontal and vertical displacements. For small displacements from the 




Fig. 39-10. (a) The interatomic 

forces we are taking into account; (b) a 
model in which the atoms are connected 
by springs. 



f 






Fig. 39-1 1 . The displacements of the 
nearest and next-nearest neighbors of 
atom 1 (exaggerated). 
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Table 39-1 


Atom 

Location 

x,y 

u x 

Uy 

k 

1 

0, a 

0 

0 

— 

2 

a, 0 

e xx a 

e yx ci 

ki 

3 

a , a 

(e xx 4- e xy )a 

(pyx 4” 

k 2 

4 

0, a 

C X y(X 

Cyyd 

ki 

5 

— a, a 

(—e X x + e xy )a 

( e yx 4“ e yy)u 

k 2 

6 

—a, 0 

&XX& 

1 e yx CL 

k\ 

7 

— a, —a 

— ( e xx 4- e xy )a 

(pyx 4- &yy)u 

k 2 

8 

0, —a 

e xy a 

e yyU 

ki 

9 

a, —a 

(,e xx e xy )a 

(e yx e vv )a 

k 2 


original cube, we can write the change in the distance to atom 3 as the sum of the 
components of u x and u y in the diagonal direction, namely as 


— rz (u x + Uy). 

V2 


Using the values of u x and u y from the table, we get the energy 


k 2 / U x ~h Uy 

T V V2 


( e _j_ e 
^ Wzz T c yx 


4 ~ ?xy 


v) * 


(39.44) 


For the total energy for all the springs in the xy-plane, we need the sum of 
eight terms like (39.43) and (39.44). Calling this energy U 0 , we get 


U 0 


~ \krelx + ^ (e x 


+ €xy + £yy ) 2 


+ k\el v H — ^ (e xx — 

+ k\e xx H — y (e X x + 
+ k\eyy + -y (e xx — 


Qyx 


&yx 


€yx 


— e xy + 

+ ?xy + 

€ x y 4“ 


&yy) 


e vv) 



(39.45) 


To get the total energy of all the springs connected to atom 1, we must make one 
addition to the energy in Eq. (39.45). Even though we have only x- and y-com- 
ponents of the strain, there are still some energies associated with the next-nearest 
neighbors off the xy-plane. This additional energy is 

k 2 (e£ x a 2 + e$ y a 2 ). (39.46) 

The elastic constants are related to the energy density w by Eq. (39.13). The 
energy we have calculated is the energy associated with one atom, or rather, it is 
twice the energy per atom, since one-half of the energy of each spring should be 
assigned to each of the two atoms it joins. Since there are 1/a 3 atoms per unit 
volume, w and U 0 are related by 


To find the elastic constants C X3 ku we need only to expand out the squares in 
Eq. (39.45) — adding the terms of (39.46) — and compare the coefficients of eaeui 
with the corresponding coefficient in Eq. (39.13). For example, collecting the terms 
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in el x and in e$ y , we get the factor 

(Aa + 2 k 2 )a 2 , 
so 

n n _ A: i + 2/c 2 

^yyyy 


For the remaining terms, there is a slight complication. Since we cannot distin- 
guish the product of two terms like e xx e yy from e yy e xx , the coefficient of such terms 
in our energy is equal to the sum of two terms in Eq. (39.13). The coefficient of 
e xx e yy in Eq. (39.45) is lk 2 , so we have that 

( Cxxyy + Cyyxx) = ' 

But because of the symmetry in our crystal, C xxyy = C yyzX9 so we have that 

C = C — — • 

^xxyy ^ yyxx ~ 

By a similar process, we can also get 

C ~ C = — • 

^xyxy ^ yxyx 


Finally, you will notice that any term which involves either x or y only once is 
zero — as we concluded earlier from symmetry arguments. Summarizing our results : 


c — 

Cyyyy 

k\ T - 2k 2 

— s 

a 

c — 

^xyxy 

c 

'-'yxyx 

11 

C — 

^ xxyy 

Cyyxx 

Cxyyx Cyxxy 

C — 

^ xxxy 

c 

^ xyyy 

= etc. = 0. 


(39.47) 


We have been able to relate the bulk elastic constants to the atomic properties 
which appear in the constants k l and k 2 . In our particular case, C xyxy = C xxyy . 
It turns out — as you can perhaps see from the way the calculations went — that 
these terms are always equal for a cubic crystal, no matter how many force terms 
are taken into account, provided only that the forces act along the line joining 
each pair of atoms — that is, so long as the forces between atoms are like springs 
and don’t have a sideways part such as you might get from a cantilevered beam 
(and you do get in covalent bonds). 

We can check this conclusion with the experimental measurements of the 
elastic constants. In Table 39-2 we give the observed values of the three elastic 
coefficients for several cubic crystals.* You will notice that C xxyy and C xyxy are, 
in general, not equal. The reason is that in metals like sodium and potassium the 
interatomic forces are not along the line joining the atoms, as we assumed in our 
model. Diamond does not obey the law either, because the forces in diamond are 
covalent forces and have some directional properties — the bonds would prefer to 
be at the tetrahedral angle. The ionic crystals like lithium fluoride, sodium chloride, 
and so on, do have nearly all the physical properties assumed in our model, and 
the table shows that the constants C xxyy and C xyxy are almost equal. It is not clear 
why silver chloride should not satisfy the condition that C xxyy = C xyxyt 


Table 39-2* 

Elastic Moduli of Cubic Crystals 
in 10 12 dyneS'Cm 2 



c 

^xxxx 

Cxxyy 

c 

^ xyxy 

Na 

0.055 

0.042 

0.049 

K 

0.046 

0.037 

0 026 

Fe 

2.37 

1.41 

1.16 

Diamond 

10.76 

1.25 

5.76 

A! 

1 08 

0 62 

0.28 

LiF 

1.19 

0.54 

0.53 

NaCI 

0.486 

0.127 

0.128 

KC1 

0.40 

0.062 

0.062 

NaBr 

0.33 

0.13 

0.13 

KI 

0.27 

0.043 

0.042 

AgCl 

0.60 

0 36 

0.062 


* From C. Kittel, Introduction to Solid 
State Physics , John Wiley and Sons, Inc , 
New York, 2nd. ed , 1956, p. 93. 


* In the literature you will often find that a different notation is used. For instance, 
people usually write C xxxx = Cn, C xxyy = C 12 , and C xyxy = C 44 . 
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40 


The Flow of Dry Water 


40-1 Hydrostatics 

The subject of the flow of fluids, and particularly of water, fascinates every- 
body. We can all remember, as children, playing in the bathtub or in mud puddles 
with the strange stuff. As we get older, we watch streams, waterfalls, and whirl- 
pools, and we are fascinated by this substance which seems almost alive relative 
to solids. The behavior of fluids is in many ways very unexpected and interesting — 
it is the subject of this chapter and the next. The efforts of a child trying to dam a 
small stream flowing in the street and his surprise at the strange way the water 
works its way out has its analog in our attempts over the years to understand the 
flow of fluids. We have tried to dam the water up — in our understanding — by 
getting the laws and the equations that describe the flow. We will describe these 
attempts in this chapter. In the next chapter, we will describe the unique way in 
which water has broken through the dam and escaped our attempts to under- 
stand it. 

We suppose that the elementary properties of water are already known to 
you. The main property that distinguishes a fluid from a solid is that a fluid cannot 
maintain a shear stress for any length of time. If a shear is applied to a fluid, it 
will move under the shear. Thicker liquids like honey move less easily than fluids 
like air or water. The measure of the ease with which a fluid yields is its viscosity. 
In this chapter we will consider only situations in which the viscous effects can be 
ignored. The effects of viscosity will be taken up in the next chapter. 

We begin by considering hydrostatics , the theory of liquids at rest. When 
liquids are at rest, there are no shear forces (even for viscous liquids). The law 
of hydrostatics, therefore, is that the stresses are always normal to any surface 
inside the fluid. The normal force per unit area is called the pressure . From the 
fact that there is no shear in a static fluid it follows that the pressure stress is the 
same in all directions (Fig. 40-1). We will let you entertain yourself by proving 
that if there is no shear on any plane in a fluid, the pressure must be the same in 
any direction. 

The pressure in a fluid may vary from place to place. For example, in a static 
fluid at the earth’s surface the pressure will vary with height because of the weight 
of the fluid. If the density p of the fluid is considered constant, and if the pressure 
at some arbitrary zero level is called p 0 (Fig- 40-2), then the pressure at a height 
h above this point is p ~ p 0 “ Pgh, where g is the gravitational force per unit 
mass. The combination 

P + Pgh 

is, therefore, a constant in the static fluid. This relation is familiar to you, but we 
will now derive a more general result of which it is a special case. 

If we take a small cube of water, what is the net force on it from the pressure? 
Since the pressure at any place is the same in all directions, there can be a net 
force per unit volume only because the pressure varies from one point to another. 
Suppose that the pressure is varying in the x-direction — and we take the coordinate 
directions parallel to the cube edges. The pressure on the face at x gives the force 
p Ay A z (Fig. 40-3), and the pressure on the face at x + Ax gives the force 
— [p + {dp/dx) Ax] Ay Az, so that the resultant force is — (dp/dx) Ax Ay Az. If 
we take the remaining pairs of faces of the cube, we easily see that the pressure 
force per unit volume is — Vp. If there are other forces in addition — such as gravity 
— then the pressure must balance them to give equilibrium. 
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Fig. 40—1. In a static fluid the force 
per unit area across any surface is 
normal to the surface and is the same for 
all orientations of the surface. 


SURFACE 



Fig. 40-2. The pressure in a static 
liquid. 
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Fig. 40-3. The net pressure force on 
a cube is — Vp per unit volume. 


Let’s take a circumstance in which such an additional force can be described 
by a potential energy, as would be true in the case of gravitation; we will let <t> 
stand for the potential energy per unit mass. (For gravity, for instance, <j> is just gz.) 
The force per unit mass is given in terms of the potential by — V<t> 3 and if p is the 
density of the fluid, the force per unit volume is — p V<t> . For equilibrium this 
force per unit volume added to the pressure force per unit volume must give zero: 

-Vp-pV0 = 0. (40.1) 

Equation (40.1) is the equation of hydrostatics. In genera! , it has no solution. 
If the density varies in space in an arbitrary way, there is no way for the forces to 
be in balance, and the fluid cannot be in static equilibrium. Convection currents 
will start up. We can see this from the equation since the pressure term is a pure 
gradient, whereas for variable p the other term is not. Only when p is a constant 
is the potential term a pure gradient. Then the equation has a solution 

p + p<f> = const. 


Another possibility which allows hydrostatic equilibrium is for p to be a function 
only of p. However, we will leave the subject of hydrostatics because it is not 
nearly so interesting as the situation when fluids are in motion. 


40-2 The equations of motion 

First, we will discuss fluid motions in a purely abstract, theoretical way and 
then consider special examples. To describe the motion of a fluid, we must give 
its properties at every point. For example, at different places, the water (let us 
call the fluid “water”) is moving with different velocities. To specify the character 
of the flow, therefore, we must give the three components of velocity at every point 
and for any time. If we can find the equations that determine the velocity, then we 
would know how the liquid moves at all times. The velocity, however, is not the 
only property that the fluid has which varies from point to point. We have just 
discussed the variation of the pressure from point to point. And there are still 
other variables. There may also be a variation of density from point to point. 
In addition, the fluid may be a conductor and carry an electric current whose 
density j varies from point to point in magnitude and direction. There may be a 
temperature which varies from point to point, or a magnetic field , and so on. So 
the number of fields needed to describe the complete situation will depend on how 
complicated the problem is. There are interesting phenomena when currents and 
magnetism play a dominant part m determining the behavior of the fluid; the 
subject is called magnetohydrodynamics , and great attention is being paid to it at 
the present time. However, we are not going to consider these more complicated 
situations because there are already interesting phenomena at a lower level of 
complexity, and even the more elementary level will be complicated enough. 

We will take the situation where there is no magnetic field and no conductivity, 
and we will not worry about the temperature because we will suppose that the 
density and pressure determine in a unique manner the temperature at any point. 
As a matter of fact, we will reduce the complexity of our work by making the as- 
sumption that the density is a constant — we imagine that the fluid is essentially 
incompressible. Putting it another way, we are supposing that the variations of 
pressure are so small that the changes in density produced thereby are negligible. 
If that is not the case, we would encounter phenomena additional to the ones we 
will be discussing here — for example, the propagation of sound or of shock waves. 
We have already discussed the propagation of sound and shocks to some extent, 
so we will now isolate our consideration of hydrodynamics from these other 
phenomena by making the approximation that the density p is a constant. It is 
easy to determine when the approximation of constant p is a good one We can 
say that if the velocities of flow are much less than the speed of a sound wave in the 
fluid, we do not have to worry about variations in density. The escape that water 
makes in our attempts to understand it is not related to the approximation of 
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constant density. The complications that do permit the escape will be discussed 
in the next chapter. 

In the general theory of fluids one must begin with an equation of state for 
the fluid which connects the pressure to the density. In our approximation this 
equation of state is simply 

p = const. 

This then is the first relation for our variables. The next relation expresses the 
conservation of matter — if matter flows away from a point, there must be a decrease 
in the amount left behind. If the fluid velocity is v, then the mass which flows in a 
unit time across a unit area of surface is the component of pu normal to the sur- 
face. We have had a similar relation in electricity. We also know from electricity 
that the divergence of such a quantity gives the rate of decrease of the density per 
unit time. In the same way, the equation 

V ■ ( pv ) = - ^ (40.2) 

expresses the conservation of mass for a fluid; it is the hydrodynamic equation of 
continuity. In our approximation, which is the incompressible fluid approximation, 
p is a constant, and the equation of continuity is simply 

V • v = 0. (40.3) 

The fluid velocity v — like the magnetic field B — has zero divergence. (The hydro- 
dynamic equations are often closely analogous to the electrodynamic equations; 
that’s why we studied electrodynamics first. Some people argue the other way; 
they think that one should study hydrodynamics first so that it will be easier to 
understand electricity afterwards. But electrodynamics is really much easier than 
hydrodynamics.) 

We will get our next equation from Newton’s law which tells us how the 
velocity changes because of the forces. The mass of an element of volume of the 
fluid times its acceleration must be equal to the force on the element. Taking an 
element of unit volume, and writing the force per unit volume as f we have 

p X (acceleration) ~ f. 

We will write the force density as the sum of three terms. We have already con- 
sidered the pressure force per unit volume, — Vp. Then there are the “external” 
forces which act at a distance — like gravity or electricity. When they are con- 
servative forces with a potential per unit mass, 0, they give a force density — pV0. 
(If the external forces are not conservative, we would have to write / ext for the 
external force per unit volume.) Then there is another “internal” force per unit 
volume, which is due to the fact that in a flowing fluid there can also be a shearing 
stress. This is called the viscous force, which we will write / V1SC . Our equation of 
motion is 

p X (acceleration) ~ —Vp ~ p V0 + f v 1SC . (40.4) 

For this chapter we are going to suppose that the liquid is “thin” in the sense 
that the viscosity is unimportant, so we will omit / V1SC .. When we drop the viscosity 
term, we will be making an approximation which describes some ideal stuff rather 
than real water. John von Neumann was well aware of the tremendous difference 
between what happens when you don’t have the viscous terms and when you do, 
and he was also aware that, during most of the development of hydrodynamics 
until about 1900, almost the main interest was in solving beautiful mathematical 
problems with this approximation which had almost nothing to do with real fluids. 
He characterized the theorist who made such analyses as a man who studied 
“dry water ” Such analyses leave out an essential property of the fluid It is 
because we are leaving this property out of our calculations in this chapter that 
we have given it the title “The Flow of Dry Water.” We are postponing a dis- 
cussion of real water to the next chapter. 
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Fig. 40—4. The 
fluid particle. 


acceleration of a 



If we leave out / V1SC , we have m Eq. (40.4) everything we need except an ex- 
pression for the acceleration. You might think that the formula for the accelera- 
tion of a fluid particle would be very simple, for it seems obvious that if v is the 
velocity of a fluid particle at some place in the fluid, the acceleration would just 
be dv/dt. It is not — and for a rather subtle reason. The derivative dv/dt, is the 
rate at which the velocity v(x, y, z, t) changes at a fixed point in space. What we 
need is how fast the velocity changes for a particular piece of fluid. Imagine that 
we mark one of the drops of water with a colored speck so we can watch it. In 
a small interval of time At, this drop will move to a different location. If the drop 
is moving along some path as sketched in Fig. 40-4, it might in At move from 
PitoP 2 . In fact, it will move in the xr-direction by an amount v x At, in the y-direc- 
tion by the amount v y At, and in the z-direction by the amount v z At. We see 
that, if v(x, y, z, t) is the velocity of the fluid particle which is at (x, y, z) at the 
time t, then the velocity of the same particle at the time t + At is given by v(x: + 
Ax, y + Ay, z + Az, t + A/) — with 

Ax: = v x At, Ay — v y At, and Az = v z A/. 

From the definition of the partial derivatives — recall Eq. (2.7) — we have, to first 
order, that 


v(x + v x At, y + v y At, z + v z At, t + A/) 


= v(x, y, z, i) + 


dv bv bv 

bx VxA + by VyA + bz v ‘ 




The acceleration Av/At is 


dv . dv . dv . dv 

b Vy -r ^ V z — ' 

dx dy dz dt 


We can write this symbolically treating V as a vector — as 


(WP + f t . 


(40.5) 


Note that there can be an acceleration even though dv/dt = 0 so that velocity 
at a given point is not changing. As an example, water flowing in a circle at a 
constant speed is accelerating even though the velocity at a given point is not 
changing. The reason is, of course, that the velocity of a particular piece of water 
which is initially at one point on the circle has a different direction a moment 
later; there is a centripetal acceleration. 

The rest of our theory is just mathematical — finding solutions of the equation 
of motion we get by putting the acceleration (40.5) into Eq. (40.4). We get 

^ + {v ■ V)v = - 'SP _ v< p t (40.6) 

dt p 

where viscosity has been omitted. We can rearrange this equation by using the 
following identity from vector analysis: 

(v ■ V)v ~ (V X v) X v + |V(u • v). 
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If we now define a new vector field 12, as the curl of v , 

12 = V X v, (40.7) 

the vector identity can be written as 

(v • V)v = 12 X v + \ Vv 2 , 

'Tid our equation of motion (40.6) becomes 

~ + £i X v + \ Vv 2 = - ^P- - V<p. (40.8) 

at l p 

You can verify that Eqs. (40.6) and (40.8) are equivalent by checking that the 
components of the two sides of the equation are equal — and making use of (40.7). 

The vector field 12 is called the vorticity. If the vorticity is zero everywhere, we 
say that the flow is irroicuional. We have already defined in Section 3-5 a thing 
called the circulation of a vector field. The circulation around any closed loop in a 
fluid is the line integral of the fluid velocity, at a given instant of time, around that 
loop: 


(Circulation) = v * ds. 

The circulation per unit area for an infinitesimal loop is then — using Stokes’ 
theorem — equal to V X v. So the vorticity 12 is the circulation around a unit 
area (perpendicular to the direction of 12). It also follows that if you put a little 
piece of dirt — not an infinitesimal point — at any place in the liquid it will rotate 
with the angular velocity 12/2. Try to see if you can prove that. You can also 
check it out that for a bucket of water on a turntable, 12 is equal to twice the local 
angular velocity of the water. 

If we are interested only in the velocity field, we can eliminate the pressure 
from our equations. Taking the curl of both sides of Eq. (40.8), remembering that 
p is a constant and that the curl of any gradient is zero, and using Eq. (40.3), we get 

~ + V X (O X V) = 0. (40.9) 

at 

This equation, together with the equations 

12 = V X v (40.10) 

and 

V-v = 0, (40.11) 

describes completely the velocity field v. Mathematically speaking, if we know 12 
at some time, then we know the curl of the velocity vector, and we also know 
that its divergence is zero, so given the physical situation we have all we need to 
determine v everywhere. (It is just like the situation in magnetism where we had 
V • B = 0 and V X B — j/e 0 c 2 .) Thus, a given 12 determines v just as a given 
j determines B. Then, knowing v , Eq. (40.9) tells us the rate of change of 12 from 
which we can get the new 12 for the next instant. Using Eq. (40.10), again we find 
the new v y and so on. You see how these equations contain all the machinery for 
calculating the flow. Note, however, that this procedure gives the velocity field 
only; we have lost all information about the pressure. 

We point out one special consequence of our equation. If 12 = 0 everywhere 
at any time /, d!2 /dt also vanishes, so that 12 is still zero everywhere at t + At, 
We have a solution to the equation; the flow is permanently irrotational. If a 
flow was started with zero rotation, it would always have zero rotation. The 
equations to be solved then are 

V • 1/ = 0, V X v = 0. 

They are just like the equations for the electrostatic or magnetostatic fields in 
free space. We will come back to them and look at some special problems later. 
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40-3 Steady flow — Bernoulli’s theorem 



Now we want to return to the equation of motion, Eq. (40.8), but limit our- 
selves to situations in which the flow is “steady.” By steady flow we mean that 
at any one place in the fluid the velocity never changes. The fluid at any point is 
always replaced by new fluid moving in exactly the same way. The velocity picture 
always looks the same — v is a static vector field. In the same way that we drew 
“field lines” in magnetostatics, we can now draw lines which are always tangent 
to the fluid velocity as shown in Fig. 40-5. These lines are called streamlines. 
For steady flow, they are evidently the actual paths of fluid particles. (In unsteady 
flow the streamline pattern changes in time, and the streamline pattern at any 
instant does not represent the path of a fluid particle.) 

A steady flow does not mean that nothing is happening — atoms in the fluid 
are moving and changing their velocities. It only means that dv/dt = 0. Then 
if we take the dot product of v into the equation of motion, the term v * (£2 X v) 
drops out, and we are left with 

v • V t y 2 j = 0. (40.12) 


Fig. 40-5. Streamlines in steady This equation says that for a small displacement in the direction of the fluid velocity 
fluid flow. the quantity inside the brackets doesn't change. Now in steady flow all displace- 

ments are along streamlines, so Eq (40.12) tells us that for all the points along a 
streamline , we can write 

n 1 

— + ~ v 2 + $ = const (streamline). (40.13) 

P 2 

This is Bernoulli's theorem . The constant may in general be different for different 
streamlines; all we know is that the left-hand side of Eq. (40.13) is the same all 
along a given streamline. Incidentally, we may notice that for steady irrotational 
motion for which = 0, the equation of motion (40.8) gives us the relation 

v{£+‘„’ + *|_°, 

so that 

^ ^ y 2 + 0 = const (everywhere). (40.14) 


It’s just like Eq. (40.13) except that now the constant has the same value throughout 
the fluid. 



The theorem of Bernoulli is in fact nothing more than a statement of the con- 
servation of energy. A conservation theorem such as this gives us a lot of informa- 
tion about a flow without our actually having to solve the detailed equations. 
Bernoulli's theorem is so important and so simple that we would like to show you 
how it can be derived in a way that is different from the formal calculations we 
have just used. Imagine a bundle of adjacent streamlines which form a stream 
tube as sketched in Fig. 40-6. Since the walls of the tube consist of streamlines, 
no fluid flows out through the wall. Let's call the area at one end of the stream 
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tube A } , the fluid velocity there v x , the density of the fluid p i, and the potential 
energy (f> 1 . At the other end of the tube, we have the corresponding quantities 
A 2 , v 2 , p 2 , and <f> 2 . Now after a short interval of time At, the fluid a tA 1 has moved 
a distance v x At, and the fluid at A 2 has moved a distance v 2 At [Fig. 40-6(b)]. 
The conservation of mass requires that the mass which enters through A x must be 
equal to the mass which leaves through A 2 . These masses at these two ends must 
be the same : 

AM ~ pxAxViAt = p 2 A 2 v 2 At, 


So we have the equality 

plA\Vi = p2^2^2- (40.15) 

This equation telj^ us that the velocity varies inversely with the area of the stream 
tube if p is constant. 

Now we calculate the work done by the fluid pressure. The work done on the 
fluid entering at A 1 is P\Aiv x At, and the work given up at A 2 is p 2 A 2 v 2 At The 
net work on the fluid between Ax and A 2 is, therefore, 


PiApv i At — p 2 A 2 v 2 At, 

which must equal the increase in the energy of a mass AM of fluid in going from 
A i to A 2 . In other words, 


p\A x V\ At - p 2 A 2 v 2 At = AM(E 2 - Ei), (40.16) 


where E x is the energy per unit mass of fluid at A x , and E 2 is the energy per unit 
mass a t A 2 . The energy per unit mass of the fluid can be written as 

E = \v 2 + 4> + U, 

where is the kinetic energy per unit mass, is the potential energy per unit 
mass, and U is an additional term which represents the internal energy per unit mass 
of fluid. The internal energy might correspond, for example, to the thermal 
energy in a compressible fluid, or to chemical energy. All these quantities can 
vary from point to point. Using this form for the energies in (40.16), we have 


P\A\V\ At P'2^2 r 2 At 


AM 


AM 


— 2 *'2 “ 1 “ 4>2 + 


Uo 


2 

*’l 


— <j>i — U x 


But we have seen that AM ~ pAv At, so we get 

— + ^ + ^1 *T Ui = ^ v 2 + 02 + U 2 , (40.17) 

Pi 2 p 2 2 

which is the Bernoulli result with an additional term for the internal energy. If 
the fluid is incompressible, the internal energy term is the same on both sides, and 
we get again that Eq. (40.14) holds along any streamline. 

We consider now some simple examples m which the Bernoulli integral gives 
us a description of the flow. Suppose we have water flowing out of a hole near 
the bottom of a tank, as drawn in Fig. 40-7. We take a situation in which the 
flow speed /; {)Ut at the hole is much larger than the flow speed near the top of the 
tank; in other words, we imagine that the diameter of the tank is so large that 
we can neglect the drop in the liquid level. (We could make a more accurate 
calculation if we wished.) At the top of the tank the pressure is p 0 , the atmospheric 
pressure, and the pressure at the sides of the jet is also /? 0 . Now we write our 
Bernoulli equation for a streamline, such as the one shown in the figure. At the 
top of the tank, we take v equal to zero and we also take the gravity potential 
<f> to be zero. At the speed *> out , and <f> — — gh , so that 

Po = Po + ipv ? m t - pgh, 
or 

r <mt = V2gh. (40.18) 
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Fig. 40-7. Flow from a tank. Fig. 40-8. With a re-entrant dis- 

charge tube, the stream contracts to one- 
half the area of the opening. 



Fig. 40-9. The pressure is lowest 
where the velocity is highest. 


This velocity is just what we would get for something which falls the distance h. 
It is not too surprising, since the water at the exit gains kinetic energy at the ex- 
pense of the potential energy of the water at the top. Do not get the idea, however, 
that you can figure out the rate that the fluid flows out of the tank by multiplying 
this velocity by the area of the hole. The fluid velocities as the jet leaves the hole 
are not all parallel to each other but have components inward toward the center 
of the stream — the jet is converging. After the jet has gone a little way, the con- 
traction stops and the velocities do become parallel. So the total flow is the velocity 
times the area at that point. In fact, if we have a discharge opening which is just a 
round hole with a sharp edge, the jet contracts to 62 percent of the area of the hole. 
The reduced effective area of the discharge varies for different shapes of discharge 
tubes, and experimental contractions are available as tables of efflux coefficients . 

If the discharge tube is re-entrant, as shown in Fig. 40-8, it is possible to prove 
in a most beautiful way that the efflux coefficient is exactly 50 percent. We will 
give just a hint of how the proof goes. We have used the conservation of energy 
to get the velocity, Eq. (40.18), but there is also momentum conservation to 
consider. Since there is an outflow of momentum in the discharge jet, there must 
be a force applied over the cross section of the discharge tube Where does the 
force come from? The force must come from the pressure on the walls. As long 
as the efflux hole is small and away from the walls, the fluid velocity near the walls 
of the tank will be very small. Therefore, the pressure on every face is almost 
exactly the same as the static pressure in a fluid at rest — from Eq. (30.14). Then 
the static pressure at any point on the side of the tank must be matched by an 
equal pressure at the point on the opposite wall, except at the points on the wall 
opposite the charge tube. If we calculate the momentum poured out through the 
jet by this pressure, we can show that the efflux coefficient is 1/2. We cannot use 
this method for a discharge hole like that shown in Fy*. 40-7, however, because 
the velocity increase along the wall right near the discharge area gives a pressure 
fall which we are not able to calculate. 

Let’s look at another example — a horizontal pipe with changing cross 
section, as shown in Fig. 40-9, with water flowing in one end and out the 
other. The conservation of energy, namely Bernoulli’s formula, says that the pres- 
sure is lower in the constricted area where the velocity is higher. We can easily 
demonstrate this effect by measuring the pressure at different cross sections with 
small vertical columns of water attached to the flow tube through holes small 
enough so that they do not disturb the flow. The pressure is then measured by 
the height of water in these vertical columns. The pressure is found to be less at 
the constriction than it is on either side. If the area beyond the constriction comes 
back to the same value it had before the constriction, the pressure rises again. 
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Bernoulli’s formula would predict that the pressure downstream of the con- 
striction should be the same as it was upstream, but actually it is noticeably less. 
The reason that our prediction is wrong is that we have neglected the frictional, 
viscous forces which cause a pressure drop along the tube. Despite this pressure 
drop the pressure is definitely lower at the constriction (because of the increased 
speed) than it is on either side of it — as predicted by Bernoulli. The speed v 2 
must certainly exceed v 1 to get the same amount of water through the narrower 
tube. So the water accelerates in going from the wide to the narrow part. The 
force that gives this acceleration comes from the drop in pressure. 

We can check our results with another simple demonstration Suppose we 
have on a tank a discharge tube which throws a jet of water upward as shown in 
Fig 40-10. If the efflux velocity were exactly \/2gh, the discharge water should 
rise to a level even with the surface of the water in the tank. Experimentally, it 
falls somewhat short. Our prediction is roughly right, but again viscous friction 
which has not been included in our energy conservation formula has resulted in 
a loss of energy * 

Have you ever held two pieces of paper close together and tried to blow 
them apart? Try it! They come together. The reason, of course, is that the air has 
a higher speed going through the constricted space between ihe sheets than it 
does when it gets outside. The pressure between the sheets is lower than atmos- 
pheric pressure, so they come together rather than separating. 



Fig. 40-10. Proof that v is not equal 
to \/2gh. 


40-4 Circulation 

We saw at the beginning of the last section that if we have an incompressible 
fluid with no circulation, the flow satisfies the following two equations: 

V 1/ = 0, V X V = 0. (40.19) 


They are the same as the equations of electrostatics or magnetostatics in empty 
space. The divergence of the electric field is zero when there are no charges, and 
the curl of the electrostatic field is always zero. The curl of the magnetic field is 
zero if there are no currents, and the divergence of the magnetic field is always 
zero. Therefore, Eqs. (40.19) have the same solutions as the equations for E in 
electrostatics or for B in magnetostatics. As a matter of fact, we have already 
solved the problem of the flow of a fluid past a sphere, as an electrostatic analogy, 
in Section 12-5. The electrostatic analog is a uniform electric field plus a dipole 
field. The dipole field is so adjusted that the flow velocity normal to the surface 
of the sphere is zero. The same problem for the flow past a cylinder can be worked 
out in a similar way by using a suitable line dipole with a uniform flow field. This 
solution holds for a situation in which the fluid velocity at large distances is con- 
stant — both in magnitude and direction. The solution is sketched in Fig. 40-1 1(a). 

There is another solution for the flow around a cylinder when the conditions 
are such that the fluid at large distances moves m circles around the cylinder. The 
flow is, then, circular everywhere, as in Fig. 40-1 1(b). Such a flow has a circulation 
around the cylinder, although V X v is still zero in the fluid. How can there be 
circulation without a curl? We have a circulation around the cylinder because the 
line integral of v around any loop enclosing the cylinder is not zero. At the same 
time, the line integral of v around any closed path which does not include the cyl- 
inder is zero. We saw the same thing when we found the magnetic field around a 
wire. The curl of B was zero outside of the wire, although a line integral of B 
around a path which encloses the wire did not vanish. The velocity field in an ir- 
rotational circulation around a cylinder is precisely the same as the magnetic 
field around a wire. For a circular path with its center at the center of the cylinder, 
the line integral of the velocity is 





v * ds = 2i rrr. 


For irrotational flow the integral must be independent of r. Let’s call the constant 
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Fig. 40-1 1. (a) Ideal fluid flow past 

a cylinder. (b) Circulation around a 
cylinder. (c) The superposition of (a) 
and (b). 



value C, then we have that 



Fig. 40-1 2. Water with circulation 
draining from a tank. 


V = Irr ’ 

where v is the tangential velocity, and r is the distance from the axis. 

There is a nice demonstration of a fluid circulating around a hole. You take a 
transparent cylindrical tank with a drain hole in the center of the bottom. You fill 
it with water, stir up some circulation with a stick, and pull the drain plug. You 
get the pretty effect shown in Fig. 40-12. (You’ve seen a similar thing many 
times in the bathtub!) Although you put in some w at beginning, it soon dies down 
because of viscosity and the flow becomes irrotational — although still with some 
circulation around the hole. 

From the theory, we can calculate the shape of the inner surface of the water. 
As a particle of the water moves inward it picks up speed. From Eq. (40 20) the 
tangential velocity goes as l/r — it’s just from the conservation of angular mo- 
mentum, like the skater pulling in her arms. Also the radial velocity goes as 
l/r. Ignoring the tangential motion, we have water going radially inward toward 
a hole; from V * v — 0, it follows that the radial velocity is proportional to l/r. 
So the total velocity also increases as 1 /r, and the water goes in along Archimedean 
spirals. The air-water surface is all at atmospheric pressure, so it must have — from 
Eq. (40.14) — the property that 

gz + \mv 2 = const. 


But v is proportional to 1 /r, so the shape of the surface is 


O 


zo) = 


k 

7 ' 


An interesting point — which is not true in general but is true for incompressible, 
irrotational flow — is that if we have one solution and a second solution, then the 
sum is also a solution. This is true because the equations in (40,19) are linear. 
The complete equations of hydrodynamics, Eqs. (40.8), (40.9), and (40.10), are 
not linear, which makes a vast difference. For the irrotational flow about the 
cylinder, however, we can superpose the flow of Fig. 40-1 1(a) on the flow of 
Fig. 40-1 1(b) and get the new flow pattern shown in Fig. 40-1 1(c). This flow is 
of special interest. The flow velocity is higher on the upper side of the cylinder 
than on the lower side. The pressures are therefore lower on the upper side than 
on the lower side. So when we have a combination of a circulation around a 
cylinder and a net horizontal flow, there is a net vertical force on the cylinder — it 
is called a lift force. Of course, if there is no circulation, there is no net force on 
any body according to our theory of “dry” water. 


40-5 Vortex lines 

We have already written down the general equations for the flow of an in- 
compressible fluid when there may be vorticity. They are 

I. v • v = 0, 

II. 12 = V X v, 

m. ~ + v x (a x v) = o. 

ot 

The physical content of these equations has been described in words by Helmholtz 
in terms of three theorems. First, imagine that in the fluid we were to draw vortex 
lines rather than streamlines. By vortex lines we mean field lines that have the 
direction of 12 and have a density in any region proportional to the magnitude of 
12. From II the divergence of 12 is always zero (remember — Section 3-7 — that the 
divergence of a curl is always zero). So vortex lines are like lines of B — they never 
start or stop, and will tend to go in closed loops. Now Helmholtz described III 
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in words by the following statement: the vortex lines move with the fluid. This 
means that if you were to mark the fluid particles along some vortex lines — by 
coloring them with ink, for example — then as the fluid moves and carries those 
particles along, they will always mark the new positions of the vortex lines. In 
whatever way the atoms of the liquid move, the vortex lines move with them 
That is one way to describe the laws. 

It also suggests a method for solving any problems. Given the initial flow 
pattern — say v everywhere — then you can calculate £i. From the v you can also 
tell where the vortex lines are going to be a little later — they move with the speed 
v. With the new Q you can use 1 and II to find the new v. (That’s just like the 
problem of finding B, given the currents.) If we are given the flow pattern at one 
instant we can in principle calculate it for all subsequent times. We have the general 
solution for nonviscous flow. 

We would like to show how Helmholtz’s statement — and, therefore, III — can 
be at least partly understood. It is really just the law of conservation of angular 
momentum applied to the fluid. Suppose we imagine a small cylinder of the liquid 
whose axis is parallel to the vortex lines, as in Fig. 40-13(a). At some time later, 
this same piece of fluid will be somewhere else. Generally it will occupy a cylinder 
with a different diameter and be in a different place. It may also have a different 
orientation, say as in Fig. 40-1 3(b). If the diameter has changed, however, the 
length will have increased to keep the volume constant (since we are assuming an 
incompressible fluid). Also, since the vortex lines are stuck with the material, 
their density will go up as the cross-sectional area goes down. The product of the 
vorticity £2 and area A of the cylinder will remain constant, so according to 
Helmholtz, we should have 

0*2^2 = SMi- (40.21) 

Now notice that with zero viscosity all the forces on the surface of the cy- 
lindrical volume (or any volume, for that matter) are perpendicular to the surface 
The pressure forces can cause the volume to be moved from place to place, or 
can cause it to change shape; but with no tangential forces the magnitude of the 
angular momentum of the material inside cannot change. The angular momentum 
of the liquid in the little cylinder is its moment of inertia / times the angular 
velocity of the liquid, which is proportional to the vorticity £2. For a cylinder, the 
moment of inertia is proportional to mr 2 . So from the conservation of angular 
momentum, we would conclude that 



Fig. 40-13. (a) A group of vortex 

lines at t; (b) the same lines at a later 
time t f . 


(A/i7^i)£2i — (M 2 R%)& 2 . 


But the mass is the same, Mi ~ M 2 , and the areas are proportional to R 2 , so 
we get again just Eq. (40.21). Helmholtz’s statement — which is equivalent to 
III — is just a consequence of the fact that in the absence of viscosity the angular 
momentum of an element of the fluid cannot change. 



Fig. 40-14. Making a travelling vor- 
tex ring. 


There is a nice demonstration of a moving vortex which is made with the 
simple apparatus of Fig 40 - 14 . It is a “drum” two feet in diameter and two feet 
long made by stretching a thick rubber sheet over the open end of a cylindrical 
“box.” The “bottom” — the drum is tipped on its side — is solid except for a 3-inch 
diameter hole. If you give a sharp blow on the rubber diaphragm with your hand, 
a vortex ring is projected out of the hole. Although the vortex is invisible, you can 
tell it’s there because it will blow out a candle 10 to 20 feet away. By the delay in 
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Fig. 40-15. A moving vortex ring 
(a smoke ring), (a) The vortex lines, (b) A 
cross section of the ring. 


the effect, you can tell that “something” is travelling at a finite speed. You can 
see better what is going on if you first blow some smoke into the box. Then you 
see the vortex as a beautiful round “smoke ring.” 

The smoke ring is a torus-shaped bundle of vortex lines, as shown m Fig 
40-1 5(a). Since £2 = V X u, these vortex lines represent also a cn culalion of v 
as shown in part (b) of the figure. We can understand the forward motion of the 
ring in the following way: The circulating velocity around the bottom of the ring 
extends up to the top of the ring, having there a forward motion. Since the lines 
of il move with the fluid, they also move ahead with the velocity v. (Of course, 
the circulation of v around the top part of the ring is responsible for the forward 
motion of the vortex lines at the bottom. 

We must now mention a serious difficulty We have already noted that Eq. 
(40 9) says that, if £2 is initially zero, it will always be zero. This result is a great 
failure of the theory of “dry” water, because it means that once £2 is zero it is 
always zero — it is impossible to produce any vorticity under any circumstance. 
Yet, in our simple demonstration with the drum, we can generate a vortex ring 
starting with air which was initially at rest. (Certainly, v - 0, £2 = 0 everywhere 
in the box before we hit it.) Also, we all know that we can start some vorticity in a 
lake with a paddle. Clearly, we must go to a theory of “wet” water to get a complete 
understanding of the behavior of a fluid. 

Another feature of the dry water theory which is incorrect is the supposition 
we make regarding the flow at the boundary between it and the surface of a solid. 
When we discussed the flow past a cylinder — as in Fig. 40-11, for example — we 
permitted the fluid to slide along the surface of the solid. In our theory, the 
velocity at a solid surface could have any value depending on how it got started, 
and we did not consider any “friction” between the fluid and the solid. It is an 
experimental fact, however, that the velocity of a real fluid always goes to zero at 
the surface of a solid object. Therefore, our solution for the cylinder, with or 
without circulation, is wrong — us is our result regarding the generation of vorticity. 
We will tell you about the more correct theories in the next chapter. 
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41 


The Flow of Wet Water 


41-1 Viscosity 

In the last chapter we discussed the behavior of water, disregarding the 
phenomenon of viscosity. Now we would like to discuss the phenomena of the 
flow of fluids, including the effects of viscosity. We want to look at the real behavior 
of fluids. We will describe qualitatively the actual behavior of the fluids under 
various different circumstances so that you will get some feel for the subject. Al- 
though you will see some complicated equations and hear about some complicated 
things, it is not our purpose that you should learn all these things. This is, in a 
sense, a “cultural” chapter which will give you some idea of the way the world is. 
There is only one item which is worth learning, and that is the simple definition of 
viscosity which we will come to in a moment. The rest is only for your entertain- 
ment. 

In the last chapter we found that the laws of motion of a fluid are contained 
in the equation 

~ + (v V)v = - - V<f> + (41.1) 

dr p p 

In our “dry” water approximation we left out the last term, so we were neglecting 
all viscous effects. Also, we sometimes made an additional approximation by 
considering the fluid as incompressible; then we had the additional equation 

V ■ v = 0. 

This last approximation is often quite good — particularly when flow speeds are 
much slower than the speed of sound. But in real fluids it is almost never true that 
we can neglect the internal friction that we call viscosity; most of the interesting 
things that happen come from it in one way or another. For example, we saw that 
in “dry” water the circulation never changes — if there is none to start out with, 
there will never be any. Yet, circulation in fluids is an everyday occurrence. We 
must fix up our theory. 

We begin with an important experimental fact. When we worked out the 
flow of “dry” water around or past a cylinder — the so-called “potential flow” — we 
had no reason not to permit the water to have a velocity tangent to the surface; 
only the normal component had to be zero. We took no account of the possibility 
that there might be a shear force between the liquid and the solid. It turns out — 
although it is not at all self-evident — that in all circumstances where it has been 
experimentally checked, the velocity of a fluid is exactly zero at the surface of a 
solid. You have noticed, no doubt, that the blade of a fan will collect a thin layer of 
dust — and that it is still there after the fan has been churning up the air. You 
can see the same effect even on the great fan of a wind tunnel. Why isn't the dust 
blown off by the air? In spite of the fact that the fan blade is moving at high speed 
through the air, the speed of the air relative to the fan blade goes to zero right at 
the surface. So the very smallest dust particles are not disturbed.* We must 
modify the theory to agree with the experimental fact that m all ordinary fluids, 
the molecules next to a solid surface have zero velocity (relative to the surface).f 


* You can blow large dust particles from a table top, but not the very finest ones. The 
large ones stick up into the breeze. 

| You can imagine circumstances when it is not true: glass is theoretically a “liquid,” 
but it can certainly be made to slide along a steel surface. So our assertion must break 
down somewhere. 
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Fig. A 1-1. Viscous drag between two 
parallel plates. 



v = 0 



V 


We originally characterized a liquid by the fact that if you put a shearing 
stress on it — no matter how small — it would give way. It flows. In static situations, 
there are no shear stresses. But before equilibrium is reached — as long as you still 
push on it — there can be shear forces. Viscosity describes these shear forces which 
exist in a moving fluid. To get a measure of the shear forces during the motion 
of a fluid, we consider the following kind of experiment. Suppose that we have two 
solid plane surfaces with water between them, as in Fig. 41-1, and we keep one 
stationary while moving the other parallel to it at the slow speed r {) . If you measure 
the force required to keep the upper plate moving, you find that it is proportional 
to the area of the plates and to iu\fd. where cl is the distance between the plates. So 
the shear stress F/A is proportional to v^/d: 

F _ o t) 

A v ~d' 


Fig. 41—2. The shear stress in a 
viscous fluid. 


The constant of proportionality ^ is called the coefficient oj viscosity. 

If we have a more complicated situation, we can always consider a little, flat, 
rectangular cell in the water with its faces parallel to the flow, as in Fig. 41-2. The 
shear force across this cell is given by 


A F AVjr dr x 

A A 77 Ay 77 dy 


( 41 . 2 ) 


Now, di'r/dy is the rate of change of the shear strain we defined in Chapter 38, so 
for a liquid, the shear stress is proportional to the rate of change of the shear strain. 
In the general case we write 


$xy ~~ V 


d fjl _L dl ' x \ 
dx dy) 


(41.3) 



Fig. 41-3. The flow in a fluid be- 
tween two concentric cylinders rotating 
at different angular velocities. 


If there is a uniform rotation of the fluid, dvjdy is the negative of dv y /dx and S xu 
is zero — as it should be since there are no stresses in a uniformly rotating fluid. 
(We did a similar thing in defining e xy in Chapter 39.) There are, of course, the 
corresponding expressions for S yz and S zx . 

As an example of the application of these ideas, we consider the motion of a 
fluid between two coaxial cylinders. Let the inner one have the radius a and the 
peripheral velocity v a , and let the outer one have radius b and velocity See 
Fig. 41-3. We might ask, what is the velocity distribution between the cylinders? 
To answer this question, we begin by finding a formula for the viscous shear in 
the fluid at a distance r from the axis From the symmetry of the problem, we can 
assume that the flow is always tangential and that its magnitude depends only on 
r; v = v(r). If we watch a speck in the water at the radius r, its coordinates as a 
function of time are 

x ~ r cos cot, y = r sin cot, 

where cu — v/r. Then the x- and y-components of velocity are 

v x -- — raj sin co/ — —coy and v y = rco cos cot = cox. (41.4) 
From Eq. (41.3), we have 



d d . . 


dco 

dco 

V 

[di {X “ ] ~ dy (yw \ 

= V 

x dx 

~ y Byl 


( 41 . 5 ) 
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For a point at y = 0, do/dy = 0, and jv: do/dx is the same as r do/ dr. So at that 
point 

(41.6) 

(It is reasonable that S should depend on do/ dr: when there is no change in o 
with r, the liquid is in uniform rotation and there are no stresses.) 

The stress we have calculated is the tangential shear which is the same all 
around the cylinder We can get the torque acting across a cylindrical surface at 
the radius r by multiplying the shear stress by the moment arm r and the area 
2irrl. We get 

t = 2irr 2 l(S X y) y=0 = lirrflr 3 * (41.7) 


Since the motion of the water is steady — there is no angular acceleration — the 
net torque on the cylindrical shell of water between r and r + dr must be zero; 
that is, the torque at r must be balanced by an equal and opposite torque at r + dr , 
so that r must be independent of r. In other words, r 3 do/dr is equal to some con- 
stant, say A, and 


do _ A 
dr r' 6 


(41.8) 


Integrating, we find that o varies with r as 


o 



(41.9) 


The constants A and B are to be determined to fit the conditions that o ~o a 
at r = a, and o = o h at r = b . We get that 


A = 


2a % 2 
b 2 — a- 


(ob a) a ), 


7 2 2 

b Ob — a o n 
b 2 — a 2 


(41.10) 


So we know oj as a function of r , and from it v ~ or. 

If we want the torque, we can get it from Eqs. (41.7) and (41.8): 


or 


r = 2ttt}IA 


Air via 1 b z r . 

T = b 2 a > ~ “al 


(41.11) 


It is proportional to the relative angular velocities of the two cylinders. One stand- 
ard apparatus for measuring the coefficients of viscosity is built this way. One 
cylinder — say the outer one- — is on pivots but is held stationary by a spring balance 
which measures the torque on it, while the inner one is rotated at a constant angular 
velocity. The coefficient of viscosity is then determined from Eq. (41.11). 

From its definition, you see that the units of v are newton-sec/m 2 . For water 
at 20°C, 

7} = 10 3 newton-sec/m 2 . 


It is usually more convenient to use the specific viscosity , which is v divided by 
the density p. The values for water and air are then comparable: 


water at 20°C, 
air at 20°C, 


yj/p = 10 *’ m 2 /sec, 
i ]/p = 15 X I0~°m 2 /sec. 


(41.12) 


Viscosities usually depend strongly on temperature. For instance, for water just 
above the freezing point, tj/p I s 1-8 times larger than it is at 20°C. 
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41-2 Viscous flow 


We now go to a general theory of viscous flow — at least in the most general 
form known to man We already understand that the shear stress components are 
proportional to the spatial derivatives of the various velocity components such 
as dr r /dy or dv {j /d x. However, in the general case of a compressible fluid there is 
another term m the stress which depends on other derivatives of the velocity. 
The general expression is 

s "-'(£; + S) + , ' mv ^ ,1U3) 


where x t is any one of the rectangular coordinates v, y, or z, and is any one of 
the rectangular coordinates of the velocity. (The symbol b 13 is the Kronecker 
delta which is 1 when / = / and 0 for / ^ y.) The additional term adds fV - v 
to all the diagonal elements S n of the stress tensor. Jf the liquid is incompressible 
V • v = 0, and this extra term doesn’t appear. So it has to do with internal forces 
during compression. So two constants are required to describe the liquid, just 
as we had two constants to describe a homogeneous elastic solid. The coefficient 
7] is the “ordinary” coefficient of viscosity which we have already encountered, 
it is also called the first coefficient of viscosity or the “shear viscosity coefficient,” 
and the new coefficient f is called the second coefficient of viscosity . 

Now we want to determine the viscous force per unit volume,/, IM> , so we can 
put it into Eq (41 1) to get the equation of motion for a real fluid. The force on a 
small cubical volume element of a fluid is the resultant of the forces on all the six 
faces. Taking them two at a time, we will get differences that depend on the 
derivatives of the stresses, and, therefore, on the second derivatives of the velocity. 
This is nice because it will get us back to a vector equation. The component of 
the viscous force per unit volume in the direction of the rectangular coordinate 
x t is 

ff \ f ^ 

KJviscJi 3 V 

^1 ° Xf 


- i£ 


d 

dx } 



d 

dx z 


(Vv • V) 


(41 14) 


Usually, the variation of the viscosity coefficients with position is not significant 
and can be neglected. Then, the viscous force per unit volume contains only second 
derivatives of the velocity. We saw in Chapter 39 that the most general form of 
second derivatives that can occur in a vector equation is the sum of a term in the 
Laplacian (V * Vv = V 2 u), and a term m the gradient of the divergence (v(V ■ u)). 
Equation (41.14) is just such a sum with the coefficients rj and (rj + f). We get 

/v. sc = v V 2 V + ( v + f) V(V • v). (41.15) 

In the incompressible case, V * u = 0, and the viscous force per unit volume is 
just t? V 2 v. That is all that many people use; however, if you should want to cal- 
culate the absorption of sound in a fluid, you would need the second term. 

We can now complete our general equation of motion for a real fluid. Sub- 
stituting Eq. (41 15) into Eq. (41.1), we get 

P |~ + ( v ■ V)vj = - \p - p + V V 2 V + (r, + 1?') V(V • V ) 

It’s complicated. But that’s the way nature is. 

If we introduce the vorticity £2 = V X v, as we did before, we can write our 
equation as 

{dv 1 o 1 9 

pj— + OXV+2 = - V/; - p Y<t> + V 

+ (17 + f) v(v - v), (41.16) 
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We are supposing again that the only body forces acting are conservative forces 
like gravity. To see what the new term means, let’s look at the incompressible 
fluid case. Then, if we take the curl of Eq. (41.16), we get 

~ + v X (Q X v) = | v 2 n. (41.17) 

This is like Eq. (40.9) except for the new term on the right-hand side. When the 
right-hand side was zero, we had the Helmholtz theorem that the vorticity stays 
with the fluid. Now, we have the rather complicated nonzero term on the right- 
hand side which, however, has straightforward physical consequences. If we 
disregard for the moment the term V X (£2 X v ), we have a diffusion equation. 
The new term means that the vorticity £2 diffuses through the fluid. If there is a 
large gradient in the vorticity, it will spread out into the neighboring fluid. 

This is the term that causes the smoke ring to get thicker as it goes along. 
Also, it shows up nicely if you send a “clean” vortex (a “smokeless” ring made by 
the apparatus described in the last chapter) through a cloud of smoke. When it 
comes out of the cloud, it will have picked up some smoke, and you will see a 
hollow shell of a smoke ring. Some of the £2 diffuses outward into the smoke, 
while still maintaining its forward motion with the vortex. 


41-3 The Reynolds number 

We will now describe the changes which are made in the character of fluid 
flow as a consequence of the new viscosity term. We will look at two problems 
in some detail. The first of these is the flow of a fluid past a cylinder — a flow which 
we tried to calculate in the previous chapter using the theory for nonviscous flow. 
It turns out that the viscous equations can be solved by man today only for a few 
special cases. So some of what we will tell you is based on experimental measure- 
ments — assuming that the experimental model satisfies Eq. (41.17). 

The mathematical problem is this: We would like the solution for the flow of 
an incompressible, viscous fluid past a long cylinder of diameter D . The flow should 
be given by Eq. (41.17) and by 

12 = V X v (41.18) 

with the conditions that the velocity at large distances is some constant velocity, 
say F (parallel to the x-axis), and at the surface of the cylinder is zero. That is, 

v x — v y = v z = 0 (41.19) 

for 


That specifies completely the mathematical problem. 

If you look at the equations, you see that there are four different parameters 
to the problem: ??, p, D , and F. You might think that we would have to give a 
whole series of cases for different F’s, different D’ s, and so on. However, that is 
not the case. All the different possible solutions correspond to different values of 
one parameter . This is the most important general thing we can say about viscous 
flow. To see why this is so, notice first that the viscosity and density appear only 
in the ratio rj/p — the specific viscosity. That reduces the number of independent 
parameters to three. Now suppose we measure all distances in the only length 
that appears in the problem, the diameter D of the cylinder; that is, we substitute 
for x, y, z, the new variables x', y', z f with 

x = x'D , y = y'D, z — z' D. 

Then D disappears from (41.19). In the same way, if we measure all velocities in 
terms of F— that is, we set r = r'F— we get rid of the F, and v f is just equal to 1 
at large distances. Since we have fixed our units of length and velocity, our unit 
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of time is now D/V; so we should set 


t = /' 


D 

V 


(41.20) 


With our new variables, the derivatives in Eq. (41.18) get changed from d/dx 
to (1 /D) d/dx\ and so on; so Eq (41.18) becomes 

a=vxi> = -^v'xi/ = -^ a'. (41.21) 


Our main equation (41.17) then reads 


~df 


+ V' x (a' x v') = --?=■ v 2 n'. 

pv D 


All the constants condense into one factor which we write, following tradition, as 

1/<R: 

(R = ~ VD. (41.22) 

V 


If we just remember that all of our equations are to be written with all quantities 
in the new units, we can omit all the primes. Our equations for the flow are then 


and 

with the conditions 

for 

and 

for 


f +TX(OX.)=iv«0 


a = v x v 
v = o 

x 2 y 2 — 1/4 

V x = 1, Vy = V Z = 0 
X 2 + y 2 + z 2 » 1 . 


(41.23) 


(41.24) 


What this all means physically is very interesting It means, for example, that 
if we solve the problem of the flow for one velocity V\ and a certain cylinder 
diameter D t , and then ask about the flow for a different diameter Do and a different 
fluid, the flow will be the same for the velocity V 2 which gives the same Reynolds 
number — that is, when 


(Rj = F , Di = (R a = £2 Vo D 2 . (41.25) 

Vl V2 

For any two situations which have the same Reynolds number, the flows will 
“look” the same — in terms of the appropriate scaled x\ y', z\ and f. This is an 
important proposition because it means that we can determine what the behavior 
of the flow of air past an airplane wing will be without having to build an airplane 
and try it. We can, instead, make a model and make measurements using a velocity 
that gives the same Reynolds number. This is the principle which allows us to 
apply the results of “wind-tunnel” measurements on small-scale airplanes, or 
“model-basin” results on scale model boats, to the full-scale objects. Remember, 
however, that we can only do this provided the compressibility of the fluid can be 
neglected. Otherwise, a new quantity enters — the speed of sound. And different 
situations will really correspond to each other only if the ratio of V to the sound 
speed is also the same This latter ratio is called the Mach number So, for veloci- 
ties near the speed of sound or above, the flows are the same in two situations 
if both the Mach number and the Reynolds number are the same for both 
situations. 
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Fig. 41—4. The drag coefficient C/> of a circular cylinder as a function of the Reynolds number. 


41-4 Flow past a circular cylinder 

Let’s go back to the problem of low-speed (nearly incompressible) flow over 
the cylinder. We will give a qualitative description of the flow of a real fluid. 
There are many things we might want to know about such a flow — for instance, 
what is the drag force on the cylinder? The drag force on a cylinder is plotted in 
Fig. 41-4 as a function of (R — which is proportional to the air speed V if everything 
else is held fixed. What is actually plotted is the so-called drag coefficient Cn, 
which is a dimensionless number equal to the force divided by \ pV 2 Dl , where 
D is the diameter, / is the length of the cylinder, and p is the density of the liquid: 


Cd %pV 2 Dl' 


The coefficient of drag varies in a rather complicated way, giving us a pre-hint 
that something rather interesting and complicated is happening in the flow. We will 
now describe the nature of flow for the different ranges of the Reynolds number. 
First, when the Reynolds number is very small, the flow is quite steady; that is, 
the velocity is constant at any place, and the flow goes around the cylinder. The 
actual distribution of the flow lines is, however, not like it is in potential flow. 
They are solutions of a somewhat different equation. When the velocity is very 
low or, what is equivalent, when the viscosity is very high so the stuff is like honey, 
then the inertial terms are negligible and the flow is described by the equation 


V 2 Q = 0. 



This equation was first solved by Stokes. He also solved the same problem for a P, g 41 -5. Viscous flow (low veloci- 

sphere. If you have a small sphere moving under such conditions of low Reynolds ties) around a circular cylinder. 

number, the force needed to drag it is equal to 67rrjaV, where a is the radius of the 

sphere and V is its velocity. This is a very useful formula because it tells the speed 

at which tiny grains of dirt (or other particles which can be approximated as 

spheres) move through a fluid under a given force— as, for instance, in a centrifuge, 

or in sedimentation, or diffusion In the low Reynolds number region — for fit less 

than 1 — the lines of v around a cylinder are as drawn in Fig 41-5. 

If we now increase the fluid speed to get a Reynolds number somewhat greater 
than 1, we find that the flow is different. There is a circulation behind the sphere, 
as shown m Fig. 4l-6(b). It is still an open question as to whether there is always 
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a circulation there even at the smallest Reynolds number or whether things sud- 
denly change at a certain Reynolds number. It used to be thought that the cir- 
culation grew continuously. But it is now thought that it appears suddenly, and 
it is certain that the circulation increases with <ft. In any case, there is a different 
character to the flow for (ft in the region from about 10 to 30. There is a pair of 
vortices behind the cylinder. 

The flow changes again by the time we get to a number of 40 or so. There is 
suddenly a complete change in the character of the motion. What happens is that 
one of the vortices behind the cylinder gets so long that it breaks off and travels 
downstream with the fluid. Then the fluid curls around behind the cylinder and 
makes a new vortex. The vortices peel off alternately on each side, so an instan- 
taneous view of the flow looks roughly as sketched in Fig. 41-6(c). The stream of 
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Fig. 41-7. Photograph by Ludwig 
Prandtl of the “vortex street” in the flow 
behind a cylinder. 


vortices is called a “Karman vortex street.” They always appear for (R > 40. 
We show a photograph of such a flow in Fig. 41-7. 

The difference between the two flows in Fig. 41— 6(c) and 41— 6(b) or 41-6(a) 
is almost a complete difference m regime In Fig. 41 -6(a) or (b), the velocity is 
constant, whereas in Fig 41-6(c), the velocity at any point varies with time There 
is no steady solution above (R = 40 — which we have marked on Fig. 41-4 by a 
dashed line. For these higher Reynolds numbers, the flow vanes with time but in a 
regular , cyclic fashion. 

We can get a physical idea of how these vortices are produced We know 
that the fluid velocity must be zero at the surface of the cylinder and that it also 
increases rapidly away from that surface. Vorticity is created by this large local 
variation in fluid velocity. Now when the main stream velocity is low enough, there 
is sufficient time for this vorticity to diffuse out of the thin region near the solid 
surface where it is produced and to grow into a large region of vorticity. This 
physical picture should help to prepare us for the next change in the nature of the 
flow as the mam stream velocity, or (R, is increased still more. 

As the velocity gets higher and higher, there is less and less time for the 
vorticity to diffuse into a larger region of fluid. By the time we reach a Reynolds 
number of several hundred, the vorticity begins to fill in a thin band, as shown in 
Fig. 41-6(d). In this layer the flow is chaotic and irregular. The region is called 
the boundary layer and this irregular flow region works its wa> farther and farther 
upstream as (R is increased. In the turbulent region, the velocities are very irregular 
and “noisy”; also the flow is no longer two-dimensional but twists and turns in 
all three dimensions. There is still a regular alternating motion superimposed on 
the turbulent one. 

As the Reynolds number is increased further, the turbulent region works its 
way forward until it reaches the point where the flow lines leave the cylinder — for 
flows somewhat above 61 = 10\ The flow is as shown in Fig. 41-6(e), and we 
have what is called a “turbulent boundary layer.” Also, there is a drastic change 
in the drag force; it drops by a large factor, as shown in Fig. 41-4. In this speed 
region, the drag force actually decreases with increasing speed. There seems to 
be little evidence of periodicity. 

What happens for still larger Reynolds numbers? As we increase the speed 
further, the wake increases in size again and the drag increases. The latest experi- 
ments — which go up to 61 = 10 7 or so — indicate that a new periodicity appears 
in the wake, either because the whole wake is oscillating back and forth in a gross 
motion or because some new kind of vortex is occurring together with an irregular 
noisy motion. The details are as yet not entirely clear, and are still being studied 
experimentally. 

41-5 The limit of zero viscosity 

We would like to point out that none of the flows we have described are 
anything like the potential flow solution we found in the preceding chapter. This 
is, at first sight, quite surprising. After all, 61 is proportional to 1/tj. So tj going to 
zero is equivalent to (R going to infinity. And if we take the limit of large (R in 
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(C) (d) 


Fig. 41-8. Liquid flow patterns be- 
tween two transparent rotating cylinders. 


Eq. (41.23), we get rid of the right-hand side and get just the equations of the last 
chapter. Yet, you would find it hard to believe that the highly turbulent flow at 
(ft = 10 7 was approaching the smooth flow computed from the equations of “dry” 
water. How can it be that as we approach (ft = go, the flow described by Eq. 
(41.23) gives a completely different solution from the one we obtained taking 
rj ~ 0 to start out with? The answer is very interesting. Note that the right-hand 
term of Eq. (41.23) has 1 /(ft times a second derivative. It is a higher derivative than 
any other derivative in the equation. What happens is that although the coefficient 
l/(ft is small, there are very rapid variations of Cl in the space near the surface. 
These rapid variations compensate for the small coefficient, and the product 
does not go to zero with increasing (ft. The solutions do not approach the limiting 
case as the coefficient of V 2 Cl goes to zero. 

You may be wondering, “What is the fine-grain turbulence and how does it 
maintain itself? How can the vorticity which is made somewhere at the edge of 
the cylinder generate so much noise in the background?” The answer is again 
interesting. Vorticity has a tendency to amplify itself. If we forget for a moment 
about the diffusion of vorticity which causes a loss, the laws of flow say (as we have 
seen) that the vortex lines are carried along with the fluid, at the velocity v. We 
can imagine a certain number of lines of Cl which are being distorted and twisted 
by the complicated flow pattern of v. This pulls the lines closer together and mixes 
them all up. Lines that were simple before will get knotted and pulled close 
together. They will be longer and tighter together. The strength of the vorticity 
will increase and its irregularities — the pluses and minuses — will, in general, 
increase. So the magnitude of vorticity in three dimensions increases as we twist 
the fluid about. 

You might well ask, “When is the potential flow a satisfactory theory at all?” 
In the first place, it is satisfactory outside the turbulent region where the vorticity 
has not entered appreciably by diffusion. By making special streamlined bodies, 
we can keep the turbulent region as small as possible; the flow around airplane 
wings — which are carefully designed — is almost entirely true potential flow. 


41-6 Couette flow 

It is possible to demonstrate that the complex and shifting character of the 
flow past a cylinder is not special but that the great variety of flow possibilities 
occurs generally. We have worked out in Section 1 a solution for the viscous 
flow between two cylinders, and we can compare the results with what actually 
happens. If we take two concentric cylinders with an oil in the space between them 
and put a fine aluminum powder as a suspension in the oil, the flow is easy to see. 
Now if we turn the outer cylinder slowly, nothing unexpected happens; see Fig. 
41 -8(a). Alternatively, if we turn the inner cylinder slowly, nothing very striking 
occurs. However, if we turn the inner cylinder at a higher rate, we get a surprise. 
The fluid breaks into horizontal bands, as indicated in Fig. 41— 8(b). When the 
outer cylinder rotates at a similar rate with the inner one at rest, no such effect 
occurs. How can it be that there is a difference between rotating the inner or the 
out cylinder? After all, the flow pattern we derived in Section 1 depended only 
on o) b — co a . We can get the answer by looking at the cross sections shown in 
Fig. 41-9. When the inner layers of the fluid are moving more rapidly than the 
outer ones, they tend to move outward — the centrifugal force is larger than the 
pressure holding them in place. A whole layer cannot move out uniformly because 
the outer layers are in the way. It must break into cells and circulate, as shown in 
Fig. 41— 9(b). It is like the convection currents in a room which has hot air at the 
bottom. When the inner cylinder is at rest and the outer cylinder has a high velocity, 
the centrifugal forces build up a pressure gradient which keeps everything in 
equilibrium — see Fig. 41-9(c) (as in a room with hot air at the top). 

Now let’s speed up the inner cylinder. At first, the number of bands increases. 
Then suddenly you see the bands become wavy, as in Fig. 41-8(c), and the waves 
travel around the cylinder. The speed of these waves is easily measured. For high 
rotation speeds they approach 1/3 the speed of the inner cylinder. And no one 
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Fig. 41-9. Why the flow breaks up into bands. 


knows why! There’s a challenge. A simple number like 1/3, and no explanation 
In fact, the whole mechanism of the wave formation is not very well understood, 
yet it is steady laminar flow. 

If we now start rotating the outer cylinder also — but in the opposite direction — 
the flow pattern starts to break up. We get wavy regions alternating with apparently 
quiet regions, as sketched in Fig. 41-8(d), making a spiral pattern. In these “quiet” 
regions, however, we can see that the flow is really quite irregular; it is, in fact 
completely turbulent. The wavy regions also begin to show irregular turbulent 
flow If the cylinders are rotated still more rapidly, the whole flow becomes 
chaotically turbulent. 

In this simple experiment we see many interesting regimes of flow which are 
quite different, and yet which are all contained in our simple equation for various 
values of the one parameter (R. With our rotating cylinders, we can see many of 
the effects which occur in the flow past a cylinder: first, there is a steady flow, second, 
a flow sets in which varies in time but in a regular, smooth way; finally, the flow 
becomes completely irregular. You have all seen the same effects in the column 
of smoke rising from a cigarette in quiet air. There is a smooth steady column 
followed by a series of twistings as the stream of smoke begins to break up, ending 
finally in an irregular churning cloud of smoke 

The mam lesson to be learned from all of this is that a tremendous vanetv 
of behavior is hidden in the simple set of equations in (41.23). All the solutions 
are for the same equations, only with different values of <Tf We have no reason 
to think that there are any terms missing from these equations. The only difficulty 
is that we do not have the mathematical power today to analyze them except for 
very small Reynolds numbers — that is, in the completely viscous case. That we 
have written an equation does not remove from the flow of fluids its charm or 
mystery or its surprise. 

If such variety is possible in a simple equation with only one parameter, how 
much more is possible with more complex equations! Perhaps the fundamental 
equation that describes the swirling nebulae and the condensing, revolving, and 
exploding stars and galaxies is just a simple equation for the hydrodynamic 
behavior of nearly pure hydrogen gas. Often, people in some unjustified fear of 
physics say you can’t write an equation for life. Well, perhaps we can. As a matter 
of fact, we very possibly already have the equation to a sufficient approximation 
when we write the equation of quantum mechanics: 


We have just seen that the complexities of things can so easily and dramatically 
escape the simplicity of the equations which describe them. Unaware of the scope 
of simple equations, man has often concluded that nothing short of God, not mere 
equations, is required to explain the complexities of the world. 
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We have written the equations of water flow. From experiment, we find a set 
of concepts and approximations to use to discuss the solution — vortex streets, 
turbulent wakes, boundary layers. When we have similar equations in a less 
familiar situation, and one for which we cannot yet experiment, we try to solve 
the equations in a primitive, halting, and confused way to try to determine what 
new qualitative features may come out, or what new qualitative forms are a con- 
sequence of the equations. Our equations for the sun, for example, as a ball of 
hydrogen gas, describe a sun without sunspots, without the rice-grain structure of 
the surface, without prominences, without coronas. Yet, all of these are really 
m the equations; we just haven’t found the way to get them out. 

There are those who are going to be disappointed when no life is found on 
other planets. Not I — I want to be reminded and delighted and surprised once 
again, through interplanetary exploration, with the infinite variety and novelty of 
phenomena that can be generated from such simple principles. The test of science 
is its ability to predict. Had you never visited the earth, could you predict the 
thunderstorms, the volcanos, the ocean waves, the auroras, and the colorful sunset? 
A salutary lesson it will be when we learn of all that goes on on each of those 
dead planets — those eight or ten balls, each agglomerated from the same dust cloud 
and each obeying exactly the same laws of physics. 

The next great era of awakening of human intellect may well produce a method 
of understanding the qualitative content of equations. Today we cannot. Today 
we cannot see that the water flow equations contain such things as the barber pole 
structure of turbulence that one sees between rotating cylinders. Today we cannot 
see whether Schrodinger’s equation contains frogs, musical composers, or morality 
— or whether it does not. We cannot say whether something beyond it like God 
is needed, or not. And so we can all hold strong opinions either way. 
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Conservation, of angular momentum, 
1-4-7, 1-18-6 ff, 1-20-5 
of charge, 1-4-7, II— 1 3—1 f 
of energy, 1-3-2, 1-4-1 ff, II— 27— 1 f 
of linear momentum, 1-4-7, 

I— 10— 1 ff 
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Contraction hypothesis, 1-15-3 
Copernicus, 1-7- 1 
Coriolis force, 1-19-8 t 
Cornea, 1-35-1 
Cosmic rays, 1 1-9-2 
Couette flow, 11-41-10 ff 
Coulomb's law, 1-28-2, 11-4-2 ff, 
11-5-6 

Coupling, coefficient ol, 11-17-14 
C ovalent bond, 11-30-2 
Cross product, 11-2-8, 11-31-8 
Cross section for scattering, 1-32-7 
Crystal, 11-30-1 ff 

geometry of, 11-30-1 f 
Crystal diffraction, 1-38-4 f 
Crystal lattice, 1 1-30-3 f 
Cubic cell, II— 30-7 
Curie law, 1 1-1 1-5 
Curie temperature, 11-36-13 
Curie-Weiss law, 1 1— 1 1-9 
Curl operator, 1 1-2-8, 11-3-1 
C urrent, Amperian, 11-36-2 
atomic, II— 1 3-5 t 
eddy, 11—16—6 
electric, 11-13-1 f 
induced, 11-1 6- 1 ff 
Current density, 11-13-1 
Cutoff frequency, 11-22-14 

D'Alemberuan, I f— 25— 8 
Debye length, 1 1-7-9 
Dedekmd, R , 1-22-4 
Degrees of freedom, 1-25—2, 1-39-12 
Demagnetization, adiabatic, 11-35-9 f 
Density, 1-1-4 
Derivative, 1-8-5 ff 
partial, 1-14-9 
Diamagnetism, 11-34-1 ff 
Dicke, R H , 1-7-1 1 
Dielectric, II— 10— 1 ff, 11-11 — 1 ff 
Dielectric constant, 1 1- 10-1 f 
Differential calculus, 1-8-4, 1 1—2— 1 ff 
Diffraction, 1-30-1 ff 
by screen, 1-31-10 f 
Diffraction grating, 1-29-5, T-30-3 ff 
Diffusion, 1-43-1 ff 
ot neutrons, II-12-6 ff 
Dipole, 1 1-2 1-5 ff 
electric, 1 1-6-2 ff 
magnetic, II- 14-7 f 
Dipole moment, 1-12-6, II-6-7 
Dipole potential, 1 1-6-4 ff 
Dipole radiator, 1-28-5 f, 1-29-3 ff 
Dirac, P , 1-52-10, 1 1-2-1. 1 1-28-7 
Dirac equation, 1-20-6 
Dislocation, 1 1-30-8, 1 1-30-9 
Dispersion, 1-3 1-6 ff 
Distance, 1-5-5 ff 

Distance measurement, color brightness, 
1-5-6 

triangulation, 1-5-6 
Divergence, 1 1-25-7 
Divergence operator, 1 1-2-7, 1 1-3-1 
Domain, II— 37-6 
Doppler effect, 1-17-8, 1-23-9, 

1-34-7 f, 1-38-6 
Dot product, 1 1-2-4, II-25-3 
Double stars, 1-7-6 
Drag coefficient, 1 1-4 1-7 
“Dry’ 7 water, II— 40— I ff 
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Dynamics, 1-7-2 f, 1-9-1 ff 
relativistic, 1-1 5-9 f 

Eddy current, 1 1- 16-6 

Efficiency of ideal engine, 1-44-7 f 

Einstein, A., 1-2-6, 1-7-11, 1-12-12, 

1-15-1,1-16-1, 1-41-8, 1-42-8, 

I- 42-9 

Elastic collision, 1-10-7 
Elastic constants, II— 39— 6, 11-39—10 f 
Elastic energy, 1-4-2, 1-4—6 
Elastic materials, 11-39-1 ff 
Elastica, 11—38—12 
Elasticity, II-38-1 ff 
Elasticity tensor, II-39-4 ff 
Electret, 1 1- 1 1-8 

Electric charge density, II— 2-8, II— 4-3 
Electric current, II— 1 3—1 f 
in the atmosphere, 11-9-2 f 
Electric current density, 1 1 — 2 — 8 
Electric dipole, II— 6-2 ff 
Electric field, 1-2-4, 1-12-7 f, II-1-2, 

II- I-3, II— 6— I ff, 1 1-7-1 ff 
relativity of, II— 1 3—6 ff 

Electric flux, 1 1— 1 —4 
Electric potential, 1 1-4-4 
Electric susceptibility, II— 10— 4 
Electrical energy, 1—4-2, II- 15-3 ff 
Electrical forces, II— 1—1 ff, II— 13— 1 
Electrodynamics, II— 1-3 

relativistic notation, II— 25— 1 ff 
Electromagnet, 1 1—3 6—9 ff 
Electromagnetic energy, 1-29-2 
Electromagnetic field, 1-2-2, 1-2-5, 

1-10-9 

Electromagnetic mass, II— 28— 3 f 
Electromagnetic radiation, 1-26-1, 

I- 28-1 ff 

Electromagnetic waves, 1 1-2 1-1 f 
cosmic rays, 1-2-5 
gamma rays, 1-2-5 
infrared, 1-2-5, 1-23-8, I-26-l 
light, 1-2-5 

ultraviolet, 1-2-5, 1-26-1 
x-rays, 1-2-5, 1-26-1 
Electromagnetism, II— 1—1 ff 
laws of, II— 1—5 ff 
Electromotive force, II-16-2 
Electron, 1-2-4, 1-37-1, 1-37-4 ff 
charge on, I- 1 2-7 
radius of, classical, 1-32-4 
Electron cloud, 1-6—1 1 
Electron microscope, II — 29—3 f 
Electron-ray tube, 1-12-9 
Electron volt (unit), 1—34—4 
Electronic polarization, II— 1 1-1 ff 
Electrostatic energy, II— 8—1 ff 
of charges, 1 1-8-1 f 
of ionic crystal, 1 1—8—4 ff 
in nuclei, 1 1-8-6 ff 
of a point charge, II— 8—1 2 
Electrostatic equations, II- 10-6 f 
Electrostatic field, 11—5—1 ff, II— 7-1 f 
energy in, II— 8-9 ff 
of a grid, 1 1-7-10 f 
Electrostatic lens, II-29-2 f 
Electrostatic potential, equations of, 

I I - 6-1 

Electrostatics, 11-4—1 ff, 1 1—5 — 1 
Ellipse, 1-7-1 


Emissivity, 1 1-6-1 4 
Energy, 1 1-22-1 1 f 
chemical, 1-4—2 
of a condenser, II— 8-2 ff 
conservation of, 1-3-2, 1-4-1 ff, 

II — 27 — 1 f 

elastic, 1—4-2, 1-4-6 
electrical, 1-4-2, 1 1- 15-3 ff 
electromagnetic, 1-29-2 
electrostatic, II-8-1 ff 
in electrostatic field, 1 1—8—9 ff 
gravitational, 1-4—2 ff 
heat, 1-4-2, 1-4-6, 1-10-7, 1-10-8 
kinetic, 1-1-7, 1-4-2, 1-4-5 f, 

I — 39 — 4 

magnetic, 11-17-12 ff 
mass, 1—4—2, 1-4-7 
mechanical, II — 1 5—3 ff 
nuclear, 1—4—2 

potential, 1-4-4, 1-13-1 ff, 1-14-1 ff 
radiant, 1-4-2 
relativistic, I— 16— I ff 
Energy density, II— 27-2 
Energy flux, 1 1—27—2 
Energy levels, 1-38-7 f 
Energy theorem, 1-50-7 f 
Enthalpy, 1-45-5 
Entropy, 1—44—10 ff, 1-46—7 ff 
Eotvos, L., 1—7—1 1 
Equilibrium, 1—1—6 
Equipotential surfaces, 1 1—4— 1 1 f 
Equivalent circuits, 11—22—10 f 
Euclid, 1-5-6 

Euclidean geometry, 1-12-3 
Euler force, 11-38-1 1 
Evaporation, 1-1-5 f 

of a liquid, 1-40—3 f, 1-42-1 ff 
Exchange force, 1 1-37-2 
Excited state, 1 1-8-7 
Expansion, adiabatic, 1-44-5 
isothermal, 1-44—5 
Exponential atmosphere, 1-40-1 f 
Eye, compound, 1—36-6 ff 
human, 1-35-1 f, 1-36-3 ff 

Farad (unit), 1-25-7, 1 1-6- 13 

Faraday, M , 1 1— 1 0-1 

Faraday's law of induction, 1 1-17-2 

Fermat, P , 1-26-3 

Fermi (unit), 1—5—10 

Fermi, E , 1-5-10 

Ferrite, 11-37-12 

Ferroelectricity, II — 1 1—8 ff 

Ferromagnetic insulators, 11-37-12 

Ferromagnetism, 1 1-34-1 f, 1 1-36-1 ff, 

II- 37-1 ff 

Feynman, R., 1 1—28—8 

Fields, 1-2-2, 1-2-4, 1-2-5, 1-10-9, 

I- 12-7 ff, 1-13-8 f, 1-14-7 ff 
in a cavity, 1 1-5-8 f 

of a charged conductor, 1 1-6-8 
of a conductor, 1 1-5-7 f 
electric, 1-2-4, 1-12-7 f, II— 1-2, 

IT- 1-3, 1 1-6-1 ff, 1 1-7-1 ff 
electrostatic, 1 1-5-1 ff, 1 1-7-1 f 
magnetic, II- 1-2, II-1-3, II— 1 3—1 , 

II- 14— 1 ff 

magnetizing, 1 1-3 6-7 
scalar, 11-2-2 ff 
superposition of, 1-1 2-9 



two-dimensional, 1 1—7—2 ff 
vector, 11-1-4 f, 11-2-1 ff 
Field energy, 11-27-1 ff 

of a point charge, 11-28-1 f 
Field index, 11-29-5 
Field-ion microscope, II— 6— 1 4 
Field lines, II-4-1 1 
Field momentum, 1 1—27—9 ff 
of a moving charge, 11-28-2 f 
Field strength, II-1-4 
Filter, 11-22-14 ff 
Flow, fluid, II— 1 2-8 ff 
irrotational, 1 1-40-5 
viscous, 11-41-4 f 
Fluid flow, 11-12-8 ff 
Flux, 1 1-4-7 ff 
electric, II-1-4 
of a vector field, 1 1-3-2 ff 
Flux rule, II— 17— 1 ff 
Focal length, 1-27-1 ff 
Focus, 1-26-5 

Force, centrifugal, 1-7-5, 1-12-1 1 
components ol, 1-9-3 
conservative, 1-14-3 ff 
Coriolis, 1-19-8 f 

electrical, 1-2-3 ff, II-l-l ff, 1-13-1 
electromotive, 11-16-2 
gravitational, 1-2-3 
Lorentz, 1 1- 13-1, 11-15-14 
magnetic, II-1-2, 11-13-1 
molecular, 1-1-3, 1-12-6 f 
moment of, 1-18-5 
nonconservative, 1-14-6 f 
nuclear, 1-12-12 
pseudo, 1-12-10 ff 
Fourier, J , 1-50-2 f 
Fourier analysis, 1-50-2 ff 
Fourier theorem, 1 1-7-1 1 
Fourier transform, 1-25-4 
Four-vectors, 1-15-8 f, 1-17-5 ff, 
11-25-1 ff 
Fovea, 1-35-1 
Frank, I , 1-51-2 
Franklin, 1 1-5-6 
Frequency, angular, 1-21-3, 1-29-2 
of oscillation, 1-2-5 
plasma, II-7-6, II— 32— I 2 
Fresnel’s reflection formulas, 1-33-8 
Friction, 1-10-5, 1-12-3 ff 
coefficient of, 1-12-4 

Galileo, 1-5-1, 1-7-2, 1-9-1, 1-52-3 

Galilean relativity, 1-10-3 

Galilean transformation, 1-12-11 

Galvanometer, 11-1-8, IT- 16-1 

Garnet, 11-37-12 

Gauss (unit), 1-34-4 

Gauss, K , 11-16-2 

Gauss' law, 1 1-4-9 t, 11-5-1 ff 

Gauss" theorem, 1 1-3-5 

Gaussian surface. 1 1 — 1 0— 1 

Geiger, 1 1-5-3 

Gell-Mann, M , 1-2-9 

Generator, alternating-current, 

11-17-6 ff 

electric, 1 1- 16-1 ff, 1 1-22-5 ff 
van de Graaff, 1 1-5-9, 1 1-8-7 
Geometrical optics, 1-26-1, 1-27-1 f 
Gerlach, 11-35-3 

Gradient operator, 1 1—2—4, 11-3-1 


Gravitation, 1-2-3, 1-7-1 ff, 1-12-2 
Gravitational acceleration, 1-9-4 
Gravitational coefficient, 1-7-9 
Gravitational energy, 1-4-2 ff 
Gravitational field, 1-12-8 ff, 1-13-8 f 
Gravity, 1-13-3 ff 

acceleration of, 1-9-4 
Green’s function, 1-25-4 
Ground state, II-8-7 
Gyroscope, 1-20-5 ff 

Hamilton’s first principal function, 

II— 1 9-8 

Harmonic motion, 1-21-4, I-23-l ff 
Harmonic oscillator, 1-10-1, 1-21-1 ff 
forced, 1-21-5 f, 1-23-3 ff 
Harmonics, 1-50-1 ff 
Heat, 1-1-3, 1-13-3 
Heat conduction, II— 3— 6 ff 
Heat diffusion equation, II— 3— 8 
Heat energy, 1-4-2, 1-4-6, 1-10-7, 
1 - 10-8 

Heat engines, 1-44-1 ff 
Heat flow, II-2-8 f, II-12-2 ff 
Heisenberg, W„ 1-6-10, 1-37-1, 

I- 37-9, 1-37-11, 1-37-12, 1-38-9 
Helmholtz, H., 1-35-7, 11-40-10 
Henry (unit), 1-25-7 

Hess, II— 9-2 
Hexagonal cell, 1 1-3 0-7 
High-voltage breakdown, II— 6—1 3 f 
Hooke’s law, 1-12-6, 11-38-1 f 
Huygens, C., 1-15-2, 1-26-2 
Hydrodynamics, 11-40-2 ff 
Hydrostatics, 11-40-1 ff 
Hypocycloid, 1-34-3 
Hysteresis curve. 1 1—37—5 ff 
Hysteresis loop, 11-36-8 

Ideal gas law, 1-39-10 ff 
Illumination, 11-12-10 ff 
Image charge, 1 1-6-9 
Impedance, 1-25-8 f, 11-22- 1 ff 
complex, 1-23-7 
Incidence, angle of, 1-26-3 
Inclined plane, 1-4-4 
Index of refraction, 1-31-1 ff 
Induced currents, 1 1— 1 6— l ff 
Inductance, 1-23-6, 11-16-4 f, 

II- 17-12 ff, 1 1-22-2 f 
mutual, II— 1 7-9 ff, 11-22-16 
self-, 1 1— 1 6-4 7 II— 17— 1 1 f 

Induction, laws of, 11-17-1 ff 
Inductor, T-23-6 
Inertia, 1-2-3, 1-7-11 

moment of, 1-18-7, 1-19-5 ff 
principle of, 1-9-1 
Infeld. 1 1-28-7 

Infrared radiation, 1-23-8, 1-26-1 
Integral, 1-8-7 f 
Integral calculus, II— 3—1 ff 
Insulator, II-1-2, II— 10— 1 
Interference, 1-28-6, 1-29-1 ff 
Interfering waves, 1-37-4 
Interferometer, 1-15-5 
Internal reflection, 11-33-12 
Ion, 1-1-6 
Ionic bond, II-30-2 
Ionic conductivity, 1-43-6 f 
Ionic polarizability, 1 1— 1 1 —8 


Ionization energy, 1-42-5 
Ionosphere, 1 1—7—5, II— 9— 3 
Irrotational flow, 1 1-40-5 
Isotherm, II— 2— 3 
Isothermal atmosphere, 1-40-2 
Isothermal compression, 1-44-5 
Isothermal expansion, 1-44-5 
Isothermal surfaces, II— 2— 3 
Isotopes, 1-3-4 ff 

Jeans, J , 1-40-9, 1-4 1 -6 f, II-2-6 
Johnson noise, 1-41-2, 1-41-8 
Joule (unit), 1-13-3 
Joule heating, 1-24-2 

Karman vortex street, II— 4 1 —9 
Kepler, J, 1-7-1 

Kepler’s laws, 1-7-1 f, 1-9-1, 1-18-6 
Kerr cell, 1-33-5 
Kilocalorie (unit), 1 1—8—5 
Kinetic energy, 1-1-7, 1-4-2, 1-4-5 f, 
1-39-4 

rotational, 1-19-7 ff 
Kinetic theory, 1-42-1 ff 
of gases, 1-39-1 ff 
Kirchhoff’s laws, 1-25-9, II— 22— 7 ff 
Kronecker delta, 1 1-3 1-6 

Lamb, 1 1-5-6 

Lame elastic constants, 1 1—39—6 
Lande g-factor, 1 1-34-4 
Laplace, P , 1-47-7 
Laplace equation, 1 1-6-1, 1 1—7— 1 
Laplacian operator, 1 1-2-10 
Larmor frequency, 1 1-34-7 
Larmor’s theorem, 1 1-34-6 f 
Laser, 1-32-6, 1-42-10 
Laughton, 11—5—6 

Laws, of electromagnetism, 11-1-5 ff 
of induction, 1 1— 1 7—1 ff 
Least action, principle of, 1 1-19-1 ff 
Least time, principle of, 1-26-3 ff, 
1-26-8 

Leibnitz, G. W , 1-8-4 
Lens formula, 1-27-6 
Lenz's rule, 1 1- 16-4, 1 1—34—2 
Leverrier, U , 1-7-5 
Lienard-Wiechert potentials, 11-21-11 
Light, II-21-1 f 

momentum of. 1-34-10 f 
polarized, 1-32-9 
scattering of, 1-32-5 ff 
speed of, 1-15-1, 1 1- 1 8-8 f 
Light waves, 1-48-1 
Lightning, 1 1—9— 1 0 f 
Line of charge, 1 1-5-3 f 
Line integral, 11-3-1 
Linear momentum, conservation of, 
1-4-7, I— 10— 1 fT 
Linear systems, 1-25-1 ff 
Lodestone, II— 1 — 10 
Logarithms, 1-22-4 
Lorentz, H A , 1-15-3 
Lorentz condition, 1 1—25—9 
Lorentz contraction, 1-15-7 
Lorentz force, 1 1- 13-1, 11-15-14 
Lorentz formula, 11-21-12 f 
Lorentz gauge, 1 1- 1 8-1 1 
Lorentz transformation, 1-15-3, 

1-17-1, 1-34-8,-52-2, II— 25-1 
of fields, 1 1-26-1 ff 
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McCullough, II-I-9 
Mach number, 1 1-4 1-6 
Magnetic dipole, 1 1- 1 4-7 t 
Magnetic dipole moment, 11-14-8 
Magnetic energy, 11-17-12 ff 
Magnetic field 1-12-9 1, II-1-2, 

II- 1-3, II-13-I, II-14-I ff 
relativity of, II— 13-6 ff 
of steady currents, 1 1- 1 3-3 f 
Magnetic force, II-1-2, II— 13— 1 
on a current, 11-13-2 f 
Magnetic induction, I-l 2-10 
Magnetic lens, 1 1-29-3 
Magnetic materials, II-37-1 ff 
Magnetic moments, 1 1-34-3 t 
Magnetic resonance, 1 1 —35— 1 ff 
Magnetic susceptibility, 1 1 —35—7 
Magnetism, 1-2-4, 11-34-1 ff 
Magnetization currents, 1 1-36-1 ff 
Magnetizing fields 1 1-36-7 
Magnetostatics, 11-4-1, 1 1- 13-1 ff 
Magnetostriction, 1 1-37-6 
Magnification, 1-27-5 
Marsden, 11-5-3 
Maser, 1-42-10 
Mass, 1-9-1, 1-15-1 

center of, 1-18-1 f, 1-19-1 f] 
electromagnetic, 1 1-28-3 f 
relativistic, 1-16-6 ff 
Mass energy, 1-4—2, 1-4-7 
Mass-energy equivalence, 1-15-10 f 
Maxwell, J C , 1-6-1, 1-6-9, 1-28-1, 

I- 40-8, 1-41-7, 1-46-5, II— 1—8, 

II- l-l 1, 1 1-5-6, II-I8-1 ff 
Maxwell’s equations, I— I 5—2, 1-25-3, 

I- 47-7, 1 1-2-1 , 1 1-2-8, 1 1-4-1 , 

I I- 6-1 , II — 1 8-1 ff, 1 1-32-3 ff 
currents and charges, II— 21-1 ff 
free space, 1 1-20- 1 ff 

Mayer, J R , 1-3-2 

Mean free path. 1-43-3 f 

Mean square distance, 1-6-5, 1-4 1 -9 

Mechanical energy, 11-15— 3 ff 

Mendeleev, 1-2-9 

Metastable atom, 1—42-10 

Meter (unit), 1-5-10 

Mev (unit), 1-2-9 

Michelson-Morley experiment, 

1-15-3 ff 

Miller, W C , 1-35-2 
Minkowski, 1-17-8 
Minkowski space, IF-31-12 
Modes, 1-49-1 ff 
Mossbauer, R , 1-23-9 
Mole (unit), 1—39— 1 0 
Molecular attraction, I- 1 -3, I- 1 2-6 f 
Molecular crystal, 1 1-30-2 
Molecular diffusion, 1-43-7 ff 
Molecular dipole, 1 1— 1 1-1 
Molecular motion, 1-41-1 
Molecule, 1-1-3 
Moment, dipole, 1-12-6 
of force, 1-18-5 
of inertia, 1-18-7, 1-19-5 ff 
Momentum, 1-9-1 t, 1-38-2 ff 

angular, 1-7-7, 1-18-5 ff, 1-20-1, 

1-20-5 

of light, 1-34-10 f 
linear, 1-4-7, 1-10-1 ff 
relativistic, 1-10-8 f, 1-16-1 ff 

Indlx 4 


Momentum spectrometer, 11-29-1 
Momentum spectrum, 11-29-2 
Monatomic gas, 1-39-5 
Monoclimc cell, 1 L— 30— 7 
Motion, 1-5-1, 1-8- 1 ff 
of charge, 11-29-1 ff 
circular, 1-21-4 
constrained, 1-14-3 
harmonic, 1-21-4, 1-23-1 ff 
parabolic, 1-8-10 

planetary, 1-7- L ff, 1-9-6 f, 1-13-5 
Motors, electric, II-16-1 ff 
Moving charge, field momentum of, 

1 1-28-2 f 
Music, [-50—1 

Mutual inductance, II— 17-9 ff, 

11 - 22- 1 6 


Nernst heat theorem, 1-44-1 l 
Neuman, J von, IT— 12-9 
Neutrons, 1-2-4 

diffusion of, II— 12-6 ff 
Neutron diffusion equation, 1 1 — 1 2—7 
Newton, I, 1-8-4, 1-15-1, 1-37-1, 

II— 4-1 0 

Newton*meter (unit), 1-13-3 
Newton's laws, 1-2-6, 1-7-3 ff, 1-7-11, 
1-9-1 ff, I-10-L ff, 1-1 1-7 f, 

I- 12-1, 1-39-2, 1-41-1, 1-46-1, 

I I- 7-5 

Nishijima, 1-2-9 
Nodes, 1-49-2 
Noise, 1-50-1 

Nonpolar molecule, II-l 1-1 
Nuclear cross section, 1-5-9 
Nuclear energy, 1-4-2 
Nuclear forces, 1-12-12 
Nuclear g-factor, 1 1-34-4 
Nuclear interactions, II— 8— 7 
Nuclear magnetic resonance, 

11-35-10 ff 

Nucleus, 1-2-4, 1-2-8 ff 
Numerical analysis, T-9-6 
Nutation, 1—20—7 
Nye, J F , II-30-9 


Oersted (unit), 11-36— 6 
Ohm (unit), 1-25-7 
Ohm’s law, 1-25-7, 1-43-7 
Operator, curl, II-2-8, II— 3—1 
divergence, II-2-7, II— 3—1 
gradient, II-2-4, II— 3-1 
Laplacian, 1 1—2—1 0 
vector, 1 1-2-6 
Optic axis, 1-33-3 
Optic nerve, 1-35-2 
Optics, 1-26-1 ff 

geometrical, 1—26— 1 , 1-27-1 ff 
Orbital motion, II— 34— 3 
Orientation polarization, II- 1 1-3 ff 
Oriented magnetic moment, 11-35-4 
Orthorhombic cell, II— 30— 7 
Oscillation, amplitude, of, 1—21-3 
damped, 1-24-3 f 
frequency of, 1-2-5 
period of, 1-2 1 -3 
periodic, 1-9-4 
phase of, 1—2 L —3 


Oscillator, 1-5-2 

harmonic, 1-10-1, 1-21-1, 1—2 1 -5 f, 

I- 23-3 ff 

Pappus, theorem ol 1-19-4 
Parabolic antenna, 1-30-6 f 
Parabolic motion, 1-8-10 
Parallel-axis theorem, 1-19-6 
Parallel-plate capacitor, 1-14-9, 

I I- 6- 1 1 ff, 1 1-8-3 

Paramagnetism, 1 1—34— 1 ff, 11-35-1 ff 
Paraxial rays, 1-27-2 
Partial derivative, I- 1 4-9 
Particles, ‘'strange", 1 1-8-7 
Permalloy, 11-37- 1 1 
Permeability, 11-36—9 
Pascal’s triangle, 1-6-4 
Passive circuit element, 1 1-22-5 
Pendulum, 1-49-6 f 
Pendulum clock, 1—5—2 
Period of oscillation, 1-21-3 
Periodic time, 1-5-1 f 
Perpetual motion, 1-46-2 
Phase of oscillation, 1-21-3 
Phase shift, 1-21-3 
Phase velocity, 1-48-6 
Photon, 1-2-7, 1-26-1, 1-37-8 
Physiochemistry of color vision, 
1-35-9 f 
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Feynman's Preface 


These are the lectures in physics that I gave last year and the year before to the 
freshman and sophomore classes at Caltech. The lectures are, of course, not 
verbatim they have been edited, sometimes extensively and sometimes less so. 
The lectures form only part of the complete course. The whole group of 180 
students gathered in a big lecture room twice a week to hear these lectures &nd 
then they broke up into small groups of 15 to 20 students in recitation sections 
under the guidance of a teaching assistant. In addition, there was a laboratory 
session once a week. 

The special problem we tried to get at with these lectures was to maintain the 
interest of the very enthusiastic and rather smart students coming out of the high 
schools and into Caltech. They have heard a lot about how interesting and excit- 
ing physics is — the theory of relativity, quantum mechanics, and other modern 
ideas. By the end ot two years of our previous course, many would be very dis- 
couraged because there were really very few grand, new, modern ideas presented 
to them. They were made to study inclined planes, electrostatics, and so forth, 
and after two years it was quite stultifying. The problem was whether or not we 
could make a course which would save the more advanced and excited student by 
maintaining his enthusiasm. 

The lectures here are not in any way meant to be a survey course, but are very 
serious. I thought to address them to the most intelligent in the class and to make 
sure, if possible, that even the most intelligent student was unable to completely 
encompass everything that was in the lectures— by putting in suggestions of appli- 
cations of the ideas and concepts in various directions outside the main line of 
attack. For this reason, though, 1 tried very hard to make all the statements as 
accurate as possible, to point out in every case where the equations and ideas fitted 
into the body of physics, and how — when they learned more — things would be 
modified. I also felt that for such students it is important to indicate what it is 
that they should— if they are sufficiently clever— be able to understand by deduc- 
tion from what has been said before, and what is being put in as something new. 
When new ideas came in, I would try either to deduce them if they were deducible, 
or to explain that it was a new idea which hadn’t any basis in terms of things they 
had already learned and which was not supposed to be provable— but was just 
added in. 

At the start of these lectures, I assumed that the students knew something when 
they came out of high school — such things as geometrical optics, simple chemistry 
ideas, and so on. I also didn’t see that there was any reason to make the lectures 
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Foreword 


A great triumph of twentieth-century physics, the theory of quantum mechanics, 
is now nearly 40 years old, yet we have generally been giving our students their 
introductory course in physics (for many students, their last) with hardly more 
than a casual allusion to this central part of our knowledge of the physical world. 
We should do better by them. These lectures are an attempt to present them with 
the basic and essential ideas of the quantum mechanics in a way that would, 
hopefully, be comprehensible. The approach you will find here is novel, particu- 
larly at the level of a sophomore course, and was considered very much an experi- 
ment. After seeing how easily some of the students take to it, however, I believe 
that the experiment was a success. There is, of course, room for improvement, 
and it will come with more experience in the classroom. What you will find here 
is a record of that first experiment. 

In the two-year sequence of the Feynman Lectures on Physics which were given 
from September 1961 through May 1963 for the introductory physics course at 
Caltech, the concepts of quantum physics were brought in whenever they were 
necessary for an understanding of the phenomena being described. In addition, 
the last twelve lectures of the second year were given over to a more coherent 
introduction to some of the concepts of quantum mechanics. It became clear as 
the lectures drew to a close, however, that not enough time had been left for the 
quantum mechanics. As the material was prepared, it was continually discovered 
that other important and interesting topics could be treated with the elementary 
tools that had been developed. There was also a fear that the too brief treatment 
of the Schrodinger wave function which had been included in the twelfth lecture 
would not provide a sufficient bridge to the more conventional treatments of many 
books the students might hope to read. It was therefore decided to extend the 
series with seven additional lectures; they were given to the sophomore class in 
May of 1964. These lectures rounded out and extended somewhat the material 
developed in the earlier lectures. 

In this volume we have put together the lectures from both years with some 
adjustment of the sequence. In addition, two lectures originally given to the fresh- 
man class as an introduction to quantum physics have been lifted bodily from 
Volume I (where they were Chapters 37 and 38) and placed as the first two chapters 
here to make this volume a self-contained unit, relatively independent of the 
first two. A few ideas about the quantization of angular momentum (including a 
discussion of the Stern-Gerlach experiment) had been introduced in Chapters 34 
and 35 of Volume II, and familiarity with them is assumed; for the convenience 
of those who will not have that volume at hand, those two chapters are reproduced 
here as an Appendix. 

This set of lectures tries to elucidate from the beginning those features of the 
quantum mechanics which are most basic and most general. The first lectures 
tackle head on the ideas of a probability amplitude, the interference of amplitudes, 
the abstract notion of a state, and the superposition and resolution of states — and 
the Dirac notation is used from the start. In each instance the ideas are introduced 
together with a detailed discussion of some specific examples — to try to make the 
physical ideas as real as possible. The time dependence of states including states 
of definite energy comes next, and the ideas are applied at once to the study of 
two-state systems. A detailed discussion of the ammonia maser provides the frame- 
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work for the introduction to radiation absorption and induced transitions. The 
lectures then go on to consider more complex systems, leading to a discussion ol 
the propagation of electrons in a crystal, and to a rather complete treatment of the 
quantum mechanics of angular momentum. Our introduction to quantum me- 
chanics ends in Chapter 20 with a discussion of the Schrodinger wave function, 
its differential equation, and the solution for the hydrogen atom. 

The last chapter of this volume is not intended to be a part of the “course. 

It is a “seminar” on superconductivity and was given in the spirit of some of the 
entertainment lectures of the first two volumes, with the intent of opening to the 
students a broader view of the relation of what they were learning to the general 
culture of physics. Feynman’s “epilogue” serves as the period to the three- 
volume series. 

As explained in the Foreword to Volume I, these lectures were but one aspect 
of a program for the development of a new introductory course carried out at the 
California Institute of Technology under the supervision of the Physics Course 
Revision Committee (Robert Leighton, Victor Neher, and Matthew Sands). The 
program was made possible by a grant from the Ford Foundation. Many people 
helped with the technical details of the preparation of this volume: Marylou 
Clayton, Julie Curcio, James Hartle, Tom Harvey, Martin Israel, Patricia Preuss, 
Fanny Warren, and Barbara Zimmerman. Professors Gerry Neugebauer and 
Charles Wilts contributed greatly to the accuracy and clarity of the material by 
reviewing carefully much of the manuscript. 

But the story of quantum mechanics you will' find here is Richard Feynman’s. 
Our labors will have been well spent if we have been able to bring to others even 
some of the intellectual excitement we experienced as we saw the ideas unfold in 
his real-life Lectures on Physics. 

December , 1964 Matthew Sands 
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Quantum Behavior 


1-1 Atomic mechanics 


“Quantum mechanics” is the description of the behavior of matter and light 
in all its details and, in particular, of the happenings on an atomic scale. Things 
on a very small scale behave like nothing that you have any direct experience 
about. They do not behave like waves, they do not behave like particles, they do 
not behave like clouds, or billiard balls, or weights on springs, or like anything 
that you have ever seen. 

Newton thought that light was made up of particles, but then it was discovered 
that it behaves like a wave. Later, however (in the beginning of the twentieth 
century), it was found that light did indeed sometimes behave like a particle. 
Historically, the electron, for example, was thought to behave like a particle, and 
then it was found that in many respects it behaved like a wave. So it really behaves 
like neither. Now we have given up. We say: “It is like neither .” 

There is one lucky break, however — electrons behave just like light. The 
quantum behavior of atomic objects (electrons, protons, neutrons, photons, and 
so on) is the same for all, they are all “particle waves,” or whatever you want to 
call them. So what we learn about the properties of electrons (which we shall use 
for our examples) will apply also to all “particles,” including photons of light. 

The gradual accumulation of information about atomic and small-scale be- 
havior during the first quarter of this century, which gave some indications about 
how small things do behave, produced an increasing confusion which was finally 
resolved in 1926 and 1927 by Schrodinger, Heisenberg, and Born. They finally 
obtained a consistent description of the behavior of matter on a small scale. We 
take up the main features of that description in this chapter. 

Because atomic behavior is so unlike ordinary experience, it is very difficult 
to get used to, and it appears peculiar and mysterious to everyone — both to the 
novice and to the experienced physicist. Even the experts do not understand it 
the way they would like to, and it is perfectly reasonable that they should not, 
because all of direct, human experience and of human intuition applies to large 
objects. We know how large objects will act, but things on a small scale just do 
not act that way. So we have to learn about them in a sort of abstract or imagi- 
native fashion and not by connection with our direct experience. 

In this chapter we shall tackle immediately the basic element of the mysterious 
behavior in its most strange form. We choose to examine a phenomenon which is 
impossible, absolutely impossible, to explain in any classical way, and which has 
in it the heart of quantum mechanics. In reality, it contains the only mystery. 
We cannot make the mystery go away by “explaining” how it works. We will just 
tell you how it works. In telling you how it works we will have told you about the 
basic peculiarities of all quantum mechanics. 


1-1 Atomic mechanics 

1-2 An experiment with bullets 

1-3 An experiment with waves 

1-4 An experiment with electrons 

1-5 The interference of electron 
waves 

1-6 Watching the electrons 

1-7 First principles of quantum 
mechanics 

1-8 The uncertainty principle 

Note : This chapter is almost exactly 
the same as Chapter 37 of Volume I. 


1-2 An experiment with bullets 

To try to understand the quantum behavior of electrons, we shall compare 
and contrast their behavior, in a particular experimental setup, with the more 
familiar behavior of particles like bullets, and with the behavior of waves like 
water waves. We consider first the behavior of bullets in the experimental setup 
shown diagrammatically in Fig. 1-1. We have a machine gun that shoots a stream 
of bullets. It is not a very good gun, in that it sprays the bullets (randomly) over a 
fairly large angular spread, as indicated in the figure. In front of the gun we have 
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a wall (made of armor plate) that has in it two holes just about big enough to let a 
bullet through. Beyond the wall is a backstop (say a thick wall of wood) which will 
“absorb” the bullets when they hit it. In front of the wall we have an object which 
we shall call a “detector” of bullets. It might be a box containing sand. Any bullet 
that enters the detector will be stopped and accumulated. When we wish, we can 
empty the box and count the number of bullets that have been caught. The 
detector can be moved back and forth (in what we will call the x-direction). With 
this apparatus, we can find out experimentally the answer to the question: “What 
is the probability that a bullet which passes through the holes in the wall will 
arrive at the backstop at the distance x from the center?” First, you should 
realize that we should talk about probability, because we cannot say definitely 
where any particular bullet will go. A bullet which happens to hit one of the holes 
may bounce off the edges of the hole, and may end up anywhere at all. By “prob- 
ability” we mean the chance that the bullet will arrive at the detector, which we can 
measure by counting the number which arrive at the detector in a certain time and 
then taking the ratio of this number to the total number that hit the backstop during 
that time. Or, if we assume that the gun always shoots at the same rate during the 
measurements, the probability we want is just proportional to the number that 
reach the detector in some standard time interval. 


Fig. 1-1. Interference experiment 
with bullets. 
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(a) 


(b) (c) 


For our present purposes we would like to imagine a somewhat idealized 
experiment in which the bullets are not real bullets, but are indestructible bullets — 
they cannot break in half. In our experiment we find that bullets always arrive in 
lumps, and when we find something in the detector, it is always one whole bullet. 
If the rate at which the machine gun fires is made very low, we find that at any given 
moment either nothing arrives, or one and only one — exactly one — bullet arrives 
at the backstop. Also, the size of the lump certainly does not depend on the rate 
of firing of the gun. We shall say: ” Bullets always arrive in identical lumps.” What 
we measure with our detector is the probability of arrival of a lump. And we meas- 
ure the probability as a function of x. The result of such measurements with this 
apparatus (we have not yet done the experiment, so we are really imagining the 
result) are plotted in the graph drawn in part (c) of Fig. 1-1 . In the graph we plot 
the probability to the right and x vertically, so that the x-scale fits the diagram of 
the apparatus. We call the probability F 12 because the bullets may have come 
either through hole 1 or through hole 2. You will not be surprised that P\ 2 is 
large iiiear the middle of the graph but gets small if x is very large. You may 
wonder, however, why P 12 has its maximum value at x = 0. We can understand 
this fact if we do our experiment again after covering up hole 2, and once more 
while covering up hole 1. When hole 2 is covered, bullets can pass only through 
hole 1, and we get the curve marked P y in part (b) of the figure. As you would 
expect, the maximum of Pi occurs at the value of x which is on a straight line with 
the gun and hole L When hole 1 is closed, we get the symmetric curve P 2 drawn 
in the figure. P 2 is the probability distribution for bullets that pass through hole 
2. Comparing parts (b) and (c) of Fig. 1-1, we find the important result that 
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P 12 - Pi + P 2 - 


( 1 . 1 ) 



The probabilities just add together. The effect with both holes open is the sum of 
the effects with each hole open alone. We shall call this result an observation of 
"no interference for a reason that you will see later. So much for bullets. They 
come in lumps, and their probability of arrival shows no interference. 


1-3 An experiment with waves 

Now we wish to consider an experiment with water waves. The apparatus is 
shown diagrammatically in Fig. 1~2. We have a shallow trough of water. A small 
object labeled the “wave source” is jiggled up and down by a motor and makes 
circular waves. To the right of the source we have again a wall with two holes, 
and beyond that is a second wall, which, to keep things simple, is an “absorber,” 
so that there is no reflection of the waves that arrive there. This can be done by 
building a gradual sand “beach.” In front of the beach we place a detector which 
can be moved back and forth in the x-direction, as before. The detector is now a 
device which measures the “intensity” of the wave motion. You can imagine a 
gadget which measures the height of the wave motion, but whose scale is calibrated 
in proportion to the square of the actual height, so that the reading is proportional 
to the intensity of the wave. Our detector reads, then, in proportion to the energy 
being carried by the wave — or rather, the rate at which energy is carried to the 
detector. 



(a) (b) (c) 


Fig. 1-2. Interference experime 
with water waves. 


With our wave apparatus, the first thing to notice is that the intensity can 
have any size. If the source just moves a very small amount, then there is just a 
little bit of wave motion at the detector. When there is more motion at the source, 
there is more intensity at the detector. The intensity of the wave can have any 
value at all. We would not say that there was any “lumpiness” in the wave intensity. 

Now let us measure the wave intensity for various values of x (keeping the 
wave source operating always in the same way). We get the interesting-looking 
curve marked / 12 in part (c) of the figure. 

We have already worked out how such patterns can come about when we 
studied the interference of electric waves in Volume I. In this case we would 
observe that the original wave is diffracted at the holes, and new circular waves 
spread out from each hole. If we cover one hole at a time and measure the intensity 
distribution at the absorber we find the rather simple intensity curves shown in part 
(b) of the figure. Ii is the intensity of the wave from hole 1 (which we find by 
measuring when hole 2 is blocked off) and I 2 is the intensity of the wave from hole 
2 (seen when hole 1 is blocked). 

The intensity / x 2 observed when both holes are open is certainly not the sum 
of 1 1 and 1 2 . We say that there is “interference” of the two waves. At some places 
(where the curve / 12 has its maxima) the waves are “in phase” and the wave 
peaks add together to give a large amplitude and, therefore, a large intensity. We 
say that the two waves are “interfering constructively” at such places. There will 
be such constructive interference wherever the distance from the detector to one 
hole is a whole number of wavelengths larger (or shorter) than the distance from 
the detector to the other hole. 
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At those places where the two waves arrive at the detector with a phase differ- 
ence of 7 r (where they are “out of phase”) the resulting wave motion at the detector 
will be the difference of the two amplitudes. The waves “interfere destructively,” 
and we get a low value for the wave intensity. We expect such low values wherever 
the distance between hole 1 and the detector is different from the distance between 
hole 2 and the detector by an odd number of half-wavelengths. The low values of 
1 1 2 in Fig. 1-2 correspond to the places where the two waves interfere destructively. 

You will remember that the quantitative relationship between / T , / 2 , and / 12 
can be expressed in the following way: The instantaneous height of the water wave 
at the detector for the wave from hole 1 can be written as (the real part of)/?^ 1 ^, 
where the “amplitude” h\ is, in general, a complex number. The intensity is 
proportional to the mean squared height or, when we use the complex numbers, 
to the absolute value squared \h j | 2 . Similarly, for hole 2 the height is h 2 e ltat and the 
intensity is proportional to \h 2 f. When both holes are open, the wave heights 
add to give the height^ + h 2 )e lbit and the intensity | h x + h 2 | “. Omitting the 
constant of proportionality for our present purposes, the proper relations for 
interfering waves are 

I\ — |/?1 1 2 , /o = | /? 2 1 ^ i = \h\ + h 2 \~. (1*2) 

You will notice that the result is quite different from that obtained with bullets 
(Eq. 1 - 1 ). If we expand \h x + h 2 1 2 we see that 

\hi + h 2 1 2 - |/?i | 2 + \h 2 \ 2 + l\hi\\h 2 \ cos 6 , (1.3) 

where 5 is the phase difference between h 1 and h 2 . In terms of the intensities, we 
could write 

^12 — *fj + ^2 + 2\/7i/ 2 cos 5. (1*4) 

The last term in (1,4) is the “interference term.” So much for water waves. The 
intensity can have any value, and it shows interference. 


1-4 An experiment with electrons 

Now we imagine a similar experiment with electrons. It is shown diagram- 
matically in Fig. 1-3. We make an electron gun which consists of a tungsten wire 
heated by an electric current and surrounded by a metal box with a hole in it. If 
the wire is at a negative voltage with respect to the box, electrons emitted by the 
wire will be accelerated toward the walls and some will pass through the hole. 
All the electrons which come out of the gun will have (nearly) the same energy. 
In front of the gun is again a wall (just a thin metal plate) with two holes in it. 
Beyond the wall is another plate which will serve as a “backstop.” In front of the 
backstop we place a movable detector. The detector might be a geiger counter or, 
perhaps better, an electron multiplier, which is connected to a loudspeaker. 

We should say right away that you should not try to set up this experiment 
(as you could have done with the two we have already described). This experiment 


Fig. 1-3. Interference experiment 
with electrons. 
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has never been done in just this way. The trouble is that the apparatus would have 
to be made on an impossibly small scale to show the effects we are interested in. 
We are doing a “thought experiment/’ which we have chosen because it is easy to 
think about. We know the results that would be obtained because there are many 
experiments that have been done, in which the scale and the proportions have 
been chosen to show the effects we shall describe. 

The first thing we notice with our electron experiment is that we hear sharp 
“clicks” from the detector (that is, from the loudspeaker). And all “clicks” are 
the same. There are no “half-clicks.” 

We would also notice that the “clicks” come very erratically. Something like: 

click click-click . . . click click .... click-click click 

etc., just as you have, no doubt, heard a geiger counter operating. If we count 
the clicks which arrive in a sufficiently long time— say for many minutes— and 
then count again for another equal period, we find that the two numbers are very 
nearly the same. So we can speak of the average rate at which the clicks are heard 
(so-and-so-many clicks per minute on the average). 

As we move the detector around, the rate at which the clicks appear is faster 
or slower, but the size (loudness) of each click is always the same. If we lower the 
temperature of the wire in the gun, the rate of clicking slows down, but still each 
click sounds the same. We would notice also that if we put two separate detectors 
at the backstop, one or the other would click, but never both at once. (Except that 
once in a while, if there were two clicks very close together in time, our ear might 
not sense the separation.) We conclude, therefore, that whatever arrives at the 
backstop arrives in “lumps.” All the “lumps” are the same size: only whole 
“lumps” arrive, and they arrive one at a time at the backstop. We shall say: 
“Electrons always arrive in identical lumps.” 

Just as for our experiment with bullets, we can now proceed to find experi- 
mentally the answer to the question: “What is the relative probability that an 
electron dump’ will arrive at the backstop at various distances * from the center?” 
As before, we obtain the relative probability by observing the rate of clicks, holding 
the operation of the gun constant. The probability that lumps will arrive at a 
particular x is proportional to the average rate of clicks at that x. 

The result t>f our experiment is the interesting curve marked P 1 2 in part (c) 
of Fig. 1-3. Yes! That is the way electrons go. 


1-5 The interference of electron waves 

Now let us try to analyze the curve of Fig. 1-3 to see whether we can under- 
stand the behavior of the electrons. The first thing we would say is that since they 
come in lumps, each lump, which we may as well call an electron, has come either 
through hole 1 or through hole 2. Let us write this in the form of a “Proposition”: 

Proposition A: Each electron either goes through hole 1 or it goes through 
hole 2. 

Assuming Propositon A, all electrons that arrive at the backstop can be di- 
vided into two classes: (1) those that come through hole 1, and (2) those that come 
through hole 2. So our observed curve must be the sum of the effects of the elec- 
trons which come through hole 1 and the electrons which come through hole 2. 
Let us check this idea by experiment. First, we will make a measurement for those 
electrons that come through hole 1. We block off hole 2 and make our counts of 
the clicks from the detector. From the clicking rate, we get P x . The result of the 
measurement is shown by the curve marked P x in part (b) of Fig. 1-3. The result 
seems quite reasonable. In a similar way, we measure P 2 , the probability distribu- 
tion for the electrons that come through hole 2. The result of this measurement 
is also drawn in the figure. 

The result P 12 obtained with both holes open is clearly not the sum of P i and 
P 2 , the probabilities for each hole alone. In analogy with our water-wave experi- 
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ment, we say: “There is interference.” 


For electrons: P 12 ^ P 1 + /> 2 . (15) 

How can such an interference come about? Perhaps we should say: “Well, 
that means, presumably, that it is not true that the lumps go either through hole 
1 or hole 2, because if they did, the probabilities should add. Perhaps they go in a 
more complicated way. They split in half and . . ” But no! They cannot, they 
always arrive in lumps . . . “Well, perhaps some of them go through 1, and then 
they go around through 2, and then around a few more times, or by some other 
complicated path . then by closing hole 2, we changed the chance that an elec- 
tron that started out through hole 1 would finally get to the backstop ” But 
notice! Theie are some points at which very few electrons arrive when both holes 
are open, but which receive many electrons if we close one hole, so closing one 
hole increased the number from the other. Notice, however, that at the center 
of the pattern, P 1 2 is more than twice as large as/ 5 ! + P 2 . It is as though closing 
one hole decreased the number of electrons which come through the other hole. 
It seems hard to explain both effects by proposing that the electrons travel in 
complicated paths. 

It is all quite mysterious. And the more you look at it the more mysterious 
it seems. Many ideas have been concocted to try to explain the curve for P v > in 
terms of individual electrons going around in complicated ways through the holes. 
None of them has succeeded. None of them can get the right curve for /V in 
terms of P x and P 2 . 

Yet, surprisingly enough, the mathematics for relating P } and P 2 jo P ]0 is 
extremely simple. For P VI is just like the curve I 12 of Fig. 1-2, and that was 
simple. What is going on at the backstop can be described by two complex numbers 
that we can call 0j and 0 2 (they are functions of x, of course). The absolute square 
of gives the effect with only hole 1 open. That is, P^ =■ j^j 2 The effect with 
only hole 2 open is given by 0 2 in the same way. That is, P 2 = |0 2 | 2 And the 
combined effect of the two holes is just P l2 = \<p 1 + 0 2 | 2 The mathematics 
is the same as that we had for the water waves! (It is hard to see how one could 
get such a simple result from a complicated game of electrons going back and forth 
through the plate on some strange trajectory.) 

We conclude the following: The electrons arrive in lumps, like particles, and 
the probability of arrival of these lumps is distributed like the distribution of 
intensity of a wave. It is in this sense that an electron behaves “sometimes like a 
particle and sometimes like a wave.” 

Incidentally, when we were dealing with classical waves we defined the in- 
tensity as the mean over time of the square of the wave amplitude, and we used 
complex numbers as a mathematical trick to simplify the analysis. But in quantum 
mechanics it turns out that the amplitudes must be represented by complex num- 
bers. The real paits alone will not do. That is a technical point, for the moment, 
because the formulas look just the same. 

Since the probability of arrival through both holes is given so simply, although 
it is not equal to (P t + P 2 ), that is really all there is to say. But there are a large 
number of subtleties involved in the fact that nature does work this way. We 
would like to illustrate some of these subtleties for you now. First, since the num- 
ber that arrives at a particular point is not equal to the number that arrives through 
1 plus the number that arrives through 2, as we would have concluded from 
Proposition A, undoubtedly we should conclude that Proposition A is false. It is 
not true that the electrons go cither through hole 1 or hole 2. But that conclusion 
can be tested by another experiment. 


1-6 Watching the electrons 

We shall now try the following experiment. To our electron apparatus we 
add a very strong light source, placed behind the wall anti between the two holes, 
as shown in Fig. 1-4. We know that electric charges scatter light. So when an 
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Fig. 1-4. A different electron 
periment. 


electron passes, however it does pass, on its way to the detector, it will scatter some 
light to our eye, and we can see where the electron goes. If, for instance, an electron 
were to take the path via hole 2 that is sketched in Fig. 1-4, we should see a flash 
of light coming from the vicinity of the place marked A in the figure. If an electron 
passes through hole 1, we would expect to see a flash from the vicinity of the upper 
hole. If it should happen that we get light from both places at the same time, 
because the electron divides in half ... Let us just do the experiment! 

Here is what we see: every time that we hear a “click” from our electron de- 
tector (at the backstop), we also see a flash of light either near hole l or near hole 
2, but never both at once! And we observe the same result no matter where we put 
the detector. From this observation we conclude that when we look at the electrons 
we find that the electrons go either through one hole or the other. Experimentally, 
Proposition A is necessarily true. 

What, then, is wrong with our argument against Proposition A? Why isn't 
Pj 2 just equal to Pi + P 2 ? Back to experiment! Let us keep track of the electrons 
and find out what they are doing. For each position (^-location) of the detector 
we will count the electrons that arrive and also keep track of which hole they went 
through, by watching for the flashes. We can keep track of things this way: 
whenever we hear a “click” we will put a count in Column 1 if we see the flash near 
hole 1 , and if we see the flash near hole 2, we will record a count in Column 2. 
Every electron which arrives is recorded in one of two classes: those which come 
through 1 and those which come through 2. From the number recorded in Column 
1 we get the probability P[ that an electron will arrive at the detector via hole 1 ; 
and from the number recorded in Column 2 we get P 2 , the probability that an 
electron will arrive at the detector via hole 2. If we now repeat such a measurement 
for many values of x, we get the curves for P[ and P 2 shown in part (b) of Fig. 1-4. 

Well, that is not too surprising! We get for P[ something quite similar to 
what we got before for Pi by blocking off hole 2; and P 2 is similar to what we got 
by blocking hole 1. So there is not any complicated business like going through 
both holes. When we watch them, the electrons come through just as we would 
expect them to come through. Whether the holes are closed or open, those which 
we see come through hole 1 are distributed in the same way whether hole 2 is open 
or closed. 

But wait! What do we have now for the total probability, the probability that 
an electron will arrive at the detector by any route? We already have that informa- 
tion. We just pretend that we never looked at the light flashes, and we lump to- 
gether the detector clicks which we have separated into the two columns. We 
must just add the numbers. For the probability that an electron will arrive at the 
backstop by passing through either hole, we do find P[ 2 = Pi + ^ 2 - That is, 
although we succeeded in watching which hole our electrons come through, we 
no longer get the old interference curve Pi 2 , but a new one, P[ 2 , showing no 
interference! If we turn out the light Pi 2 is restored. 

We must conclude that when we look at the electrons the distribution of them 
on the screen is different than when we do not look. Perhaps it is turning on our 
light source that disturbs things? It must be that the electrons are very delicate, 
and the light, when it scatters off the electrons, gives them a jolt that changes their 
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motion. We know that the electric field of the light acting on a charge will exert 
a force on it. So perhaps we should expect the motion to be changed. Anyway, 
the light exerts a big influence on the electrons. By trying to “watch” the electrons 
we have changed their motions. That is, the jolt given to the electron when the 
photon is scattered by it is such as to change the electron’s motion enough so that 
if it might have gone to where P 12 was at a maximum it will instead land where 
Pi 2 was a minimum; that is why we no longer see the wavy interference effects. 

You may be thinking: “Don’t use such a bright source! Turn the brightness 
down! The light waves will then be weaker and will not disturb the electrons so 
much. Surely, by making the light dimmer and dimmer, eventually the wave 
will be weak enough that it will have a negligible effect.” O.K. Let’s try it. The 
first thing we observe is that the flashes of light scattered from the electrons as 
they pass by does not get weaker. It is always the same-sized flash. The only thing 
that happens as the light is made dimmer is that sometimes we hear a “click” 
from the detector but see no flash at all. The electron has gone by without being 
“seen.” What we are observing is that light also acts Ifte electrons, we knew that 
it was “wavy,” but now we find that it is also “lumpy.” It always arrives — or is 
scattered — in lumps that we call “photons.” As we turn down the intensity of 
the light source we do not change the size of the photons, only the rate at which 
they are emitted. That explains why, when our source is dim, some electrons get 
by without being seen. There did not happen to be a photon around at the time 
the electron went through. 

This is all a little discouraging. If it is true that whenever we “see” the electron 
we see the same-sized flash, then those electrons we see are always the disturbed 
ones. Let us try the experiment with a dim light anyway. Now whenever we hear 
a click in the detector we will keep a count in three columns: in Column (1) those 
electrons seen by hole 1, in Column (2) those electrons seen by hole 2, and in 
Column (3) those electrons not seen at all. When we work up our data (computing 
the probabilities) we find these results: Those “seen by hole 1” have a distribution 
like P[ ; those “seen by hole 2” have a distribution like Pl 2 (so that those “seen by 
either hole 1 or 2” have a distribution like and those “not seen at all” have a 
“wavy” distribution just like P 12 of Fig- 1-3! If the electrons are not seen , we 
have interference ! 

That is understandable. When we do not see the electron, no photon disturbs 
it, and when we do see it, a photon has disturbed it. There is always the same 
amount of disturbance because the light photons all produce the same-sized effects 
and the effect of the photons being scattered is enough to smear out any inter- 
ference effect. 

Is there not some way we can see the electrons without disturbing them? 
We learned in an earlier chapter that the momentum carried by a “photon” 
is inversely proportional to its wavelength (p ~ h/ A). Certainly the jolt given 
to the electron when the photon is scattered toward our eye depends on the 
momentum that photon carries. Aha! If we want to disturb the electrons only 
slightly we should not have lowered the intensity of the light, we should have 
lowered its frequency (the same as increasing its wavelength). Let us use light of 
a redder color. We could even use infrared light, or radiowaves (like radar), and 
“see” where the electron went with the help of some equipment that can “see” 
light of these longer wavelengths. If we use “gentler” light perhaps we can avoid 
disturbing the electrons so much. 

Let us try the experiment with longer waves. We shall keep repeating our ex- 
periment, each time with light of a longer wavelength. At first, nothing seems to 
change. The results are the same. Then a terrible thing happens. You remember 
that when we discussed the microscope we pointed out that, due to the wave nature 
of the light, there is a limitation on how close two spots can be and still be seen 
as two separate spots. This distance is of the order of the wavelength of light. So 
now, when we make the wavelength longer than the distance between our holes, 
we see a big fuzzy flash when the light is scattered by the electrons. We can no 
longer tell which hole the electron went through! We just know it went somewhere! 
And it is just with light of this color that we find that the jolts given to the electron 
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are small enough so that P [ 2 begins to look like P 12 — that we begin to get some 
interference effect. And it is only for wavelengths much longer than the separation 
of the two holes (when we have no chance at all of telling where the electron went) 
that the disturbance due to the light gets sufficiently small that we again get the 
curve Pi 2 shown in Fig. 1-3. 

In our experiment we find that it is impossible to arrange the light in such a 
way that one can tell which hole the electron went through, and at the same time 
not disturb the pattern. It was suggested by Heisenberg that the then new laws of 
nature could only be consistent if there were some basic limitation on our experi- 
mental capabilities not previously recognized. He proposed, as a general principle, 
his uncertainty principle , which we can state in terms of our experiment as follows: 
“It is impossible to design an apparatus to determine which hole the electron passes 
through, that will not at the same time disturb the electrons enough to destroy the 
interference pattern.” If an apparatus is capable of determining which hole the 
electron goes through, it cannot be so delicate that it does not disturb the pattern in 
an essential way. No one has ever found (or even thought of) a way around the 
uncertainty principle. So we must assume that it describes a basic characteristic 
of nature. 

The complete theory of quantum mechanics which we now use to describe 
atoms and, in fact, all matter, depends on the correctness of the uncertainty prin- 
ciple. Since quantum mechanics is such a successful theory, our belief in the 
uncertainty principle is reinforced. But if a way to “beat” the uncertainty principle 
were ever discovered, quantum mechanics would give inconsistent results and 
would have to be discarded as a valid theory of nature. 

“Well,” you say, “what about Proposition A? Is it true, or is it not true, 
that the electron either goes through hole 1 or it goes through hole 2?” The only 
answer that can be given is that we have found from experiment that there is a 
certain special way that we have to think in order that we do not get into incon- 
sistencies. What we must say (to avoid making wrong predictions) is the following. 
If one looks at the holes or, more accurately, if one has a piece of apparatus which 
is capable of determining whether the electrons go through hole 1 or hole 2, then 
one can say that it goes either through hole 1 or hole 2. But , when one does not 
try to tell which way the electron goes, when there is nothing in the experiment to 
disturb the electrons, then one may not say that an electron goes either through 
hole 1 or hole 2. If one does say that, and starts to make any deductions from the 
statement, he will make errors in the analysis. This is the logical tightrope on 
which we must walk if we wish to describe nature successfully. 


If the motion of all matter — as well as electrons — must be described in terms 
of waves, what about the bullets in our first experiment? Why didn’t we see an 
interference pattern there? It turns out that for the bullets the wavelengths were so 
tiny that the interference patterns became very fine. So fine, in fact, that with any 
detector of finite size one could not distinguish the separate maxima and minima. 
What we saw was only a kind of average, which is the classical curve. In Fig. 1-5 
we have tried to indicate schematically what happens with large-scale objects. 
Part (a) of the figure shows the probability distribution one might predict for 
bullets, using quantum mechanics. The rapid wiggles are supposed to represent 
the interference pattern one gets for waves of very short wavelength. Any physical 
detector, however, straddles several wiggles of the probability curve, so that the 
measurements show the smooth curve drawn in part (b) of the figure. 

1-7 First principles of quantum mechanics 

% 

We will now write a summary of the main conclusions of our experiments. 
We will, however, put the results in a form which makes them true for a general 
class of such experiments. We can write our summary more simply if we first 
define an “ideal experiment” as one in which there are no uncertain external 
influences, i.e., no jiggling or other things going on that we cannot take into ac- 
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(a) (b) 


Fig. 1-5. Interference pattern with 
bullets: (a) actual (schematic), (b) ob- 
served. 



count. We would be quite precise if we said: “An ideal experiment is one in which 
all of the initial and final conditions of the experiment are completely specified." 
What we will call “an event" is, in general, just a specific set of initial and final 
conditions. (For example: “an electron leaves the gun, arrives at the detector, and 
nothing else happens.") Now for our summary. 

Summary 

(1) The probability of an event in an ideal experiment is given by the square of 
the absolute value of a complex number 0 which is called the probability 
amplitude: 

P = probability, 

<t> = probability amplitude, (1 6) 

P- \<t>\ 2 

(2) When an event can occur in several alternative ways, the probability ampli- 
tude for the event is the sum of the probability amplitudes for each way 
considered separately. There is interference: 

4> = <f> l + 02s 

P — 1 0! + 02| 2 (1.7) 

(3) If an experiment is performed which is capable of determining whether one or 
another alternative is actually taken, the probability of the event is the sum 
of the probabilities for each alternative. The interference is lost: 

P - Pi + P 2 . (1.8) 

One might still like to ask: How does it work? What is the machinery behind 
the law?" No one has found any machinery behind the law. No one can “explain" 
any more than we have just “explained." No one will give you any deeper repre- 
sentation of the situation. We have no ideas about a more basic mechanism from 
which these results can be deduced. 

We would like to emphasize a very important difference between classical and 
quantum mechanics. We have been talking about the probability that an electron 
will arrive in a given circumstance. We have implied that in our experimental 
arrangement (or even in the best possible one) it would be impossible to predict 
exactly what would happen. We can only predict the odds! This would mean, if 
it weie true, that physics has given up on the problem of trying to predict exactly 
what will happen in a definite circumstance. Yes! physics has given up. We do 
not know how to predict what would happen in a given circumstance , and we believe 
now that it is impossible that the only thing that can be predicted is the prob- 
ability of different events. It must be recognized that this is a retrenchment in our 
earlier ideal of understanding nature. It may be a backward step, but no one 
has seen a way to avoid it. 

We make now a few remarks on a suggestion that has sometimes been made 
to try to avoid the description we have given: “Perhaps the electron has some kind 
of internal works— some inner variables— that we do not yet know about. Perhaps 
that is why we cannot predict what will happen. If we could look more closely at 
the electron, we could be able to tell where it would end up." So far as we know, 
that is impossible. We would still be in difficulty. Suppose we were to assume that 
inside the electron there is some kind of machinery that determines where it is 
going to end up. That machine must also determine which hole it is going to go 
through on its way. But we must not forget that what is inside the electron should 
not be dependent on what we do, and in particular upon whether we open or close 
one of the holes. So if an electron, before it starts, has already made up its mind 
(a) which hole it is going to use, and (b) where it is going to land, we should find 
P 1 for those electrons that have chosen hole 1, P 2 for those that have chosen hole 
2, and necessarily the sum P 1 + P 2 for those that arrive through the two holes. 
There seems to be no way around this. But we have verified experimentally that 
that is not the case. And no one has figured a way out of this puzzle. So at the 
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present time we must limit ourselves to computing probabilities. We say “at the 
present time,” but we suspect very strongly that it is something that will be with 
us forever — that it is impossible to beat that puzzle — that this is the way nature 
really is. 


1-8 The uncertainty principle 

This is the way Heisenberg stated the uncertainty principle originally: If you 
make the measurement on any object, and you can determine the x-component of 
its momentum with an uncertainty A/?, you cannot, at the same time, know its 
x-position more accurately than Ax = h/Ap, where h is a definite fixed number 
given by nature. It is called “Planck’s constant,” and is approximately 6.63 X 
10“ 34 joule-seconds. The uncertainties in the position and momentum of a 
particle at any instant must have their product greater than Planck’s constant. 
This is a special case of the uncertainty principle that was stated above more 
generally. The more general statement was that one cannot design equipment in 
any way to determine which of two alternatives is taken, without, at the same 
time, destroying the pattern of interference. 

Let us show for one particular case that the kind of relation given by Heisen- 
berg must be true in order to keep from getting into trouble. We imagine a modifi- 
cation of the experiment of Fig. 1-3, in which the wall with the holes consists of a 
plate mounted on rollers so that it can move freely up and down (in the x-direction), 
as shown in Fig. 1-6. By watching the motion of the plate carefully we can try to 
tell which hole an electron goes through. Imagine what happens when the detector 
is placed at x = 0. We would expect that an electron which passes through hole 1 
must be deflected downward by the plate to reach the detector. Since the vertical 
component of the electron momentum is changed, the plate must recoil with an 
equal momentum in the opposite direction. The plate will get an upward kick. 
If the electron goes through the lower hole, the plate should feel a downward kick. 
It is clear that for every position of the detector, the momentum received by the 
plate will have a different value for a traversal via hole 1 than for a traversal via 
hole 2. So! Without disturbing the electrons at all , but just by watching the plate , 
we can tell which path the electron used. 

Now in order to do this it is necessary to know what the momentum of the 
screen is, before the electron goes through. So when we measure the momentum 
after the electron goes by, we can figure out how much the plate’s momentum has 
changed. But remember, according to the uncertainty principle we cannot at the 
same time know the position of the plate with an arbitrary accuracy. But if we do 
not know exactly where the plate is, we cannot say precisely where the two holes are. 
They will be in a different place for every electron that goes through. This means 
that the center of our interference pattern will have a different location for each 
electron. The wiggles of the interference pattern will be smeared out. We shall show 
quantitatively in the next chapter that if we determine the momentum of the plate 
sufficiently accurately to determine from the recoil measurement which hole was 
used, then the uncertainty in the x-position of the plate will, according to the un- 
certainty principle, be enough to shift the pattern observed at the detector up and 
down in the x-direction about the distance from a maximum to its nearest minimum. 
Such a random shift is just enough to smear out the pattern so that no interference 
is observed. 

The uncertainty principle “protects” quantum mechanics. Heisenberg recog- 
nized that if it were possible to measure the momentum and the position simultane- 
ously with a greater accuracy, the quantum mechanics would collapse. So he 
proposed that it must be impossible. Then people sat down and tried to figure out 
ways of doing it, and nobody could figure out a way to measure the position and 
the momentum of anything — a screen, an electron, a billiard ball, anything — with 
any greater accJfacy. Quantum mechanics maintains its perilous but still correct 
existence. 



WALL BACKSTOP 


Fig. 1-6. An experiment in which 
the recoil of the wall is measured. 
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2 


The Relation of Wave and 
Particle Viewpoints 


2-1 Probability wave amplitudes 


In this chapter we shall discuss the relationship of the wave and particle 
viewpoints. We already know, from the last chapter, that neither the wave view- 
point nor the particle viewpoint is correct. We would always like to present things 
accurately, or at least precisely enough that they will not have to be changed when 
we learn more — it may be extended, but it will not be changed! But when we try 
to talk about the wave picture or the particle picture, both are approximate, and 
both will change. Therefore what we learn in this chapter will not be accurate in a 
certain sense; we will deal with some half-intuitive arguments which will be made 
more precise later. But certain things will be changed a little bit when we interpret 
them correctly in quantum mechanics. We are doing this so that you can have 
some qualitative feeling for some quantum phenomena before we get into the 
mathematical details of quantum mechanics. Furthermore, all our experiences 
are with waves and with particles, and so it is rather handy to use the wave and 
particle ideas to get some understanding of what happens in given circumstances 
before we know the complete mathematics of the quantum-mechanical amplitudes. 
We shall try to indicate the weakest places as we go along, but most of it is very 
nearly correct — it is just a matter of interpretation. 

First of all, we know that the new way of representing the world in quantum 
mechanics — the new framework — is to give an amplitude for every event that can 
occur, and if the event involves the reception of one particle, then we can give the 
amplitude to find that one particle at different places and at different times. The 
probability of finding the particle is then proportional to the absolute square of 
the amplitude. In general, the amplitude to find a particle in different places at 
different times varies with position and time. 

In some special case it can be that the amplitude varies sinusoidally in space 
and time like e ^ at ~ kmr ^ where r is the vector position from some origin. (Do not 
forget that these amplitudes are complex numbers, not real numbers.) Such an 
amplitude varies according to a definite frequency co and wave number k . Then 
it turns out that this corresponds to a classical limiting situation where we would 
have believed that we have a particle whose energy E was known and is related to 
the frequency by 

E = (2.1) 


2-1 Probability wave amplitudes 

2-2 Measurement of position and 
momentum 

2-3 Crystal diffraction 

2-4 The size of an atom 

2-5 Energy levels 

2-6 Philosophical implications 

Note: This chapter is almost exactly 
the same as Chapter 38 of Volume 1. 


and whose momentum p is also known and is related to the wave number by 

p = hk . (2.2) 

(The symbol h represents the number h divided by 2ir; h = h/lir.) 

This means that the idea of a particle is limited. The idea of a particle — its 
location, its momentum, etc. — which we use so much, is in certain ways unsatis- 
factory. For instance, if an amplitude to find a particle at different places is given 
by e l (< at ~ k,r \ whose absolute square is a constant, that would mean that the prob- 
ability of finding a particle is the same at all points. That means we do not know 
where it is — it can be anywhere — there is a great uncertainty in its location. 

On the other hand, if the position of a particle is more or less well known and 
we can predict it fairly accurately, then the probability of finding it in different 
places must be confined to a certain region, whose length we call Ax. Outside this 
region, the probability is zero. Now this probability is the absolute square of an 
amplitude, and if the absolute square is zero, the amplitude is also zero, so that 
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AX 


Fig, 2-1. A wave packet of length 
Ax. 


we have a wave train whose length is Ax (Fig. 2-1), and the wavelength (the 
distance between nodes of the waves in the train) of that wave train is what corre- 
sponds to the particle momentum. m 

Here we encounter a strange thing about waves; a very simple thing which has 
nothing to do with quantum mechanics strictly. It is something that anybody 
who works with waves, even if he knows no quantum mechanics, knows: namely, 
we cannot define a unique wavelength for a short wave train. Such a wave train does 
not have a definite wavelength; there is an indefiniteness in the wave number that 
is related to the finite length of the train, and thus there is an indefiniteness in 
the momentum. 


2-2 Measurement of position and momentum 



Fig. 2-2. Diffraction of particles 
passing through a slit. 


Let u,s consider two examples of this idea — to see the reason that there is an 
uncertainty in the position and/or the momentum, if quantum mechanics is right. 
We have also seen before that if there were not such a thing — if it were possible to 
measure the position and the momentum of anything simultaneously — we would 
have a paradox; it is fortunate that we do not have such a paradox, and the fact 
that such an uncertainty comes naturally from the wave picture shows that every- 
thing is mutually consistent. 

Here is one example which shows the relationship between the position and 
the momentum in a circumstance that is easy to understand. Suppose we have a 
single slit, and particles are coming from very far away with a certain energy — so 
that they are all coming essentially horizontally (Fig. 2-2). We are going to 
concentrate on the vertical components of momentum. All of these particles have 
a certain horizontal momentum /> 0 , say, in a classical sense. So, in the classical 
sense, the vertical momentum p y , before the particle goes through the hole, is 
definitely known. The particle is moving neither up nor down, because it came from 
a source that is far away — and so the vertical momentum is of course zero. But 
now let us suppose that it goes through a hole whose width is B. Then after it has 
come out through the hole, we know the position vertically — the jr-position — with 
considerable accuracy — namely ±Z?.f That is, the uncertainty in position, Ay, is 
of order B , Now we might also want to say, since we known the momentum is 
absolutely horizontal, that Ap y is zero; but that is wrong. We once knew the mo- 
mentum was horizontal, but we do not know it any more. Before the particles 
passed through the hole, we did not know their vertical positions. Now that we 
have found the vertical position by having the particle come through the hole, we 
have lost our information on the vertical momentum! Why? According to the 
wave theory, there is a spreading out, or diffraction, of the waves after they go 
through the slit, just as for light. Therefore there is a certain probability that 
particles coming out of the slit are not coming* exactly straight. The pattern is 
spread out by the diffraction effect, and the angle of spread, which we can define 
as the angle of the first minimum, is a measure of the uncertainty in the final angle. 

How does the pattern become spread? To say it is spread means that there is 
some chance for the particle to be moving up or down, that is, to have a component 
of momentum up or down. We say chance and particle because we can detect this 
diffraction pattern with a particle counter, and when the counter receives the 
particle, say at C in Fig. 2-2, it receives the entire particle, so that, in a classical 
sense, the particle has a vertical momentum, in order to get from the slit up to C. 

To get a rough idea of the spread of the momentum, the vertical momentum 
p y has a spread which is equal to /? O A0, where p {) is the horizontal momentum. 
And how big is A 6 in the spread-out pattern? We know that the first minimum 
occurs at an angle A 6 such that the waves from one edge of the slit have to travel 
one wavelength farther than the waves from the other side — we worked that out 
before (Chapter 30 of Vol. I). Therefore A 6 is X/ B, and so A p v in this experiment 
is pfk/B. Note that if we make B smaller and make a more accurate measurement 


f More precisely, the error in our knowledge of v is ±B; 2. But we are now only in- 
terested in the general idea, so we won’t worry about factors of 2. 


2-2 



of the position of the particle, the diffraction pattern gets wider. So the narrower 
we make the slit, the wider the pattern gets, and the more is the likelihood that we 
would find that the particle has sidewise momentum. Thus the uncertainty in the 
vertical momentum is inversely proportional to the uncertainty of y. In fact, we 
see that the product of the two is equal to p 0 \. But X is the wavelength and p 0 is 
the momentum, and in accordance with quantum mechanics, the wavelength times 
the momentum is Planck’s constant h. So we obtain the rule that the uncertainties 
in the vertical momentum and in the vertical position have a product of the order h ; 


Ay Ap v ~ h. 


(2.3) 


We cannot prepare a system in which we know the vertical position of a particle 
and can predict how it will move vertically with greater certainty than given by 
(2.3). That is, the uncertainty in the vertical momentum must exceed h/Ay, where 
Ay is the uncertainty in our knowledge of the position. 

Sometimes people say quantum mechanics is all wrong. When the particle 
arrived from the left, its vertical momentum was zero. And now that it has gone 
through the slit, its position is known. Both position and momentum seem to 
be known with arbitrary accuracy. It is quite true that we can receive a particle, 
and on reception determine what its position is and what its momentum would 
have had to have been to have gotten there. That is true, but that is not what the 
uncertainty relation (2.3) refers to. Equation (2.3) refers to the predictability 
of a situation, not remarks about the past. It does no good to say “I knew what 
the momentum was before it went through the slit, and now I know the position,” 
because now the momentum knowledge is lost. The fact that it went through the 
slit no longer permits us to predict the vertical momentum. We are talking about 
a predictive theory, not just measurements after the fact. So we must talk about 
what we can predict. 

Now let us take the thing the other way around. Let us take another example 
of the same phenomenon, a little more quantitatively. In the previous example 
we measured the momentum by a classical method. Namely, we considered the 
direction and the velocity and the angles, etc., so we got the momentum by classical 
analysis. But since momentum is related to wave number, there exists in nature 
still another way to measure the momentum of a particle — photon or otherwise — 
which has no classical analog, because it uses Eq. (2.2). We measure the wave- 
lengths of the waves. Let us try to measure momentum in this way. 

Suppose we have a grating with a large number of lines (Fig. 2-3), and send 
a beam of particles at the grating. We have often discussed this problem: if the 
particles have a definite momentum, then we get a very sharp pattern in a certain 
direction, because of the interference. And we have also talked about how accu- 
rately we can determine that momentum, that is to say, what the resolving power 
of such a grating is. Rather than derive it again, we refer to Chapter 30 of Volume 
I, where we found that the relative uncertainty in the wavelength that can be 
measured with a given grating is l/TVm, where TV is the number of lines on the grat- 
ing and m is the order of the diffraction pattern. That is, 



Fig. 2-3. 
turn by using 


Determination of momen 
a diffraction grating. 


AX/X = l/Nm. 

Now formula (2.4) can be rewritten as 

AX/X 2 = 1 /Nm\ = l/L, 


(2.4) 

(2.5) 


where L is the distance shown in Fig. 2-3. This distance is the difference between 
the total distance that the particle or wave or whatever it is has to travel if it is 
reflected from the bottom of the grating, and the distance that it has to travel if 
it is reflected from the top of the grating. That is, the waves which form the diffrac- 
tion pattern are waves which come from different parts of the grating. The first 
ones that arrive come from the bottom end of the grating, from the beginning of 
the wave train, and the rest of them come from later parts of the wave train, coming 
from different parts of the grating, until the last one finally arrives, and that involves 
a point in the wave train a distance L behind the first point. So in order that we 
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Fig. 2-4. Scattering 
crystal planes. 


shall have a sharp line in our spectrum corresponding to a definite momentum, 
with an uncertainty given by (2.4), we have to have a wave train of at least length 
L . If the wave train is too short, we are not using the entire grating. The waves 
which form the spectrum are being reflected from only a very short sector of the 
grating if the wave train is too short, and the grating will not work right — we will 
find a big angular spread. In order to get a narrower one, we need to use the whole 
grating, so that at least at some moment the whole wave train is scattering simul- 
taneously from all parts of the grating. Thus the wave train must be of length L 
in order to have an uncertainty in the wavelength less than that given by (2.5). 
Incidentally, 

AX/X 2 - A(l/X) - Ak/lir. (2.6) 

Therefore 

A k = 2 tt/L , (2.7) 

where L is the length of the wave train. 

This means that if we have a wave train whose length is less than L, the un- 
certainty in the wave number must exceed 2ir/L. Or the uncertainty in a wave 
number times the length of the wave train — we will call that for a moment Ax — 
exceeds 27 t. We call it Ax because that is the uncertainty in the location of the 
particle. If the wave train exists only in a finite length, then that is where we could 
find the particle, within an uncertainty Ax. Now this property of waves, that the 
length of the wave train times the uncertainty of the wave number associated with 
it is at least 27 t, is a property that is known to everyone who studies them. It has 
nothing to do with quantum mechanics. It is simply that if we have a finite train, 
we cannot count the waves in it very precisely. 

Let us try another way to see the reason for that. Suppose that we have a 
finite train of length L\ then because of the way it has to decrease at the ends, as 
in Fig. 2-1 , the number of waves in the length L is uncertain by something like ± 1 . 
But the number of waves in L is kL/2ir . Thus k is uncertain, and we again get the 
result (2.7), a property merely of waves. The same thing works whether the waves 
are in space and k is the number of radians per centimeter and L is the length of 
the train, or the waves are in time and co is the number of oscillations per second 
and T is the “length” in time that the wave train comes in. That is, if we have 
a wave train lasting only for a certain finite time T, then the uncertainty in the fre- 
quency is given by 

A co - 2 t t/T. (2.8) 

We have tried to emphasize that these are properties of waves alone, and they are 
well known, for example, in the theory of sound. 

The point is that in quantum mechanics we interpret the wave number as 
being a measure of the momentum of a particle, with the rule that p = hk , so 
that relation (2.7) tells us that A p ~ h/ Ax. This, then, is a limitation of the classi- 
cal idea of momentum. (Naturally, it has to be limited in some ways if we are 
going to represent particles by waves!) It 'is nice that we have found a rule that 
gives us some idea of when there is a failure of classical ideas. 




»f waves by 


2-3 Crystal diffraction 

Next let us consider the reflection of particle waves from a crystal. A crystal 
is a thick thing which has a whole lot of similar atoms — we will include some com- 
plications later— in a nice array. The question is how to set the array so that we 
get a strong reflected maximum in a given direction for a given beam of, say, light 
(x-rays), electrons, neutrons, or anything else. In order to obtain a strong reflection, 
the scattering from all of the atoms must be in phase. There cannot be equal num- 
bers in phase and out of phase, or the waves will cancel out. The way to arrange 
things is to find the regions of constant phase, as we have already explained; 
they are planes which make equal angles with the initial and final directions 
(Fig. 2-4). 

If we consider two parallel planes, as in Fig. 2-4, the waves scattered from the 
two planes will be in phase, provided the difference in distance traveled by a wave 
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front is an integral number of wavelengths. This difference can be seen to be 
2d sin 6 , where d is the perpendicular distance between the planes. Thus the 
condition for coherent reflection is 

2^/sin 6 = n\ (n = 1,2,.. .). (2.9) 

If, for example, the crystal is such that the atoms happen to lie on planes obey- 
ing condition (2.9) with n = 1, then there will be a strong reflection. If, on the 
other hand, there are other atoms of the same nature (equal in density) halfway 
between, then the intermediate planes will also scatter equally strongly and will 
interfere with the others and produce no effect. So d in (2.9) must refer to ad- 
jacent planes; we cannot take a plane five layers farther back and use this formula! 

As a matter of interest, actual crystals are not usually as simple as a single 
kind of atom repeated in a certain way. Instead, if we make a two-dimensional 
analog, they are much like wallpaper, in which there is some kind of figure which 
repeats all over the wallpaper. By “figure” we mean, in the case of atoms, some 
arrangement — calcium and a carbon and three oxygens, etc., for calcium carbonate, 
and so on — which may involve a relatively large number of atoms. But whatever 
it is, the figure is repeated in a pattern. This basic figure is called a unit cell. 

The basic pattern of repetition defines what we call the lattice type ; the lattice 
type can be immediately determined by looking at the reflections and seeing what 
their symmetry is. In other words, where we find any reflections at all determines 
the lattice type, but in order to determine what is in each of the elements of the 
lattice one must take into account the intensity of the scattering at the various 
directions. Which directions scatter depends on the type of lattice, but how strongly 
each scatters is determined by what is inside each unit cell, and in that way the 
structure of crystals is worked out. 

Two photographs of x-ray diffraction patterns are shown in Figs. 2-5 and 
2-6 ; they illustrate scattering from rock salt and myoglobin, respectively. 

Incidentally, an interesting thing happens if the spacings of the nearest planes 
are less than X/2. In this case (2.9) has no solution for n . Thus if X is bigger 
than twice the distance between adjacent planes, then there is no side diffraction 
pattern, and the light — or whatever it is — will go right through the material with- 
out bouncing off or getting lost. So in the case of light, where X is much bigger 
than the spacing, of course it does go through and there is no pattern of reflection 
from the planes of the crystal. 

This fact also has an interesting consequence in the case of piles which make 
neutrons (these are obviously particles, for anybody’s money!). If we take these 
neutrons and let them into a long block of graphite, the neutrons diffuse and 
work their way along (Fig. 2-7). They diffuse because they are bounced by the 
atoms, but strictly, in the wave theory, they are bounced by the atoms because 
of diffraction from the crystal planes. It turns out that if we take a very long piece 
of graphite, the neutrons that come out the far end are all of long wavelength! 
In fact, if one plots the intensity as a function of wavelength, we get nothing except 
for wavelengths longer than a certain minimum (Fig. 2-8). In other words, we 
can get very slow neutrons that way. Only the slowest neutrons come through; 
they are not diffracted or scattered by the crystal planes of the graphite, but keep 
going right through like light through glass, and are not scattered out the sides. 
There are many other demonstrations of the reality of neutron waves and waves 
of other particles. 


2-4 The size of an atom 

We now consider another application of the uncertainty relation, Eq. (2.3). 
It must not be taken too seriously; the idea is right but the analysis is not very 
accurate. The idea has to do with the determination of the size of atoms, and the 
fact that, classically, the electrons would radiate light and spiral in until they settle 
down right on top of the nucleus. But that cannot be right quantum-mechanically 
because then we would know where each electron was and how fast it was moving. 
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Fig. 2-5. The pattern produced by 
the diffraction of a beam of x-rays in a 
crystal of sodium chloride. 



Fig. 2-6. The x-ray diffraction pat- 
tern of myoglobin. 
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Fig. 2-7. Diffusion of pile neutrons 
through graphite block. 



Fig. 2-8. Intensity of neutrons out of 
graphite rod as function of wavelength. 



spectral frequencies was noted before quantum mechanics was discovered, and it is 
called the Ritz combination principle . This is again a mystery from the point of 
view of classical mechanics. Let us not belabor the point that classical mechanics 
is a failure in the atomic domain; we seem to have demonstrated that pretty well. 

We have already talked about quantum mechanics as being represented by 
amplitudes which behave like waves, with certain frequencies and wave numbers. 
Let us observe how it comes about from the point of view of amplitudes that the 
atom has definite energy states. This is something we cannot understand from what 
has been said so far, but we are all familiar with the fact that confined waves have 
definite frequencies. For instance, if sound is confined to an organ pipe, or any- 
thing like that, then there is more than one way that the sound can vibrate, but 
for each such way there is a definite frequency. Thus an object in which the waves 
are confined has certain resonance frequencies. It is therefore a property of waves 
in a confined space — a subject which we will discuss in detail with formulas later 
on — that they exist only at definite frequencies. And since the general relation 
exists between frequencies of the amplitude and energy, we are not surprised to 
find definite energies associated with electrons bound in atoms. 


2-6 Philosophical implications 

Let us consider briefly some philosophical implications of quantum mechanics. 
As always, there are two aspects of the problem: one is the philosophical implica- 
tion for physics, and the other is the extrapolation of philosophical matters to 
other fields. When philosophical ideas associated with science are dragged into 
another field, they are usually completely distorted. Therefore we shall confine 
our remarks as much as possible to physics itself. 

First of all, the most interesting aspect is the idea of the uncertainty principle; 
making an observation affects the phenomenon. It has always been known that 
making observations affects a phenomenon, but the point is that the effect cannot 
be disregarded or minimized or decreased arbitrarily by rearranging the apparatus. 
When we look for a certain phenomenon we cannot help but disturb it in a certain 
minimum way, and the disturbance is necessary for the consistency of the viewpoint. 
The observer was sometimes important in prequantum physics, but only in a 
trivial sense. The problem has been raised: if a tree falls in a forest and there 
is nobody there to hear it, does it make a noise? A real tree falling in a real forest 
makes a sound, of course, even if nobody is there. Even if no one is present to hear 
it, there are other traces left. The sound will shake some leaves, and if we were 
careful enough we might find somewhere that some thorn had rubbed against a 
leaf and made a tiny scratch that could not be explained unless we assumed the 
leaf were vibrating. So in a certain sense we would have to admit that there is a 
sound made. We might ask: was there a sensation of sound? No, sensations have 
to do, presumably, with consciousness. And whether ants are conscious and 
whether there were ants in the forest, or whether the tree was conscious, we do not 
know. Let us leave the problem in that form. 

Another thing that people have emphasized since quantum mechanics was 
developed is the idea that we should not speak about those things which we cannot 
measure. (Actually relativity theory also said this.) Unless a thing can be defined 
by measurement, it has no place in a theory. And since an accurate value of the 
momentum of a localized particle cannot be defined by measurement it therefore 
has no place in the theory. The idea that this is what was the matter with classical 
theory is a false position. It is a careless analysis of the situation. Just because we 
cannot measure position and momentum precisely does not a priori mean that we 
cannot talk about them. It only means that we need not talk about them. The 
situation in the sciences is this: A concept or an idea which cannot be measured 
or cannot be referred directly to experiment may or may not be useful. It need 
not exist in a theory. In other words, suppose we compare the classical theory of 
the world with the quantum theory of the world, and suppose that it is true ex- 
perimentally that we can measure position and momentum only imprecisely. The 
question is whether the ideas of the exact position of a particle and the exact 
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momentum of a particle are valid or not. The classical theory admits the ideas; 
the quantum theory does not. This does not in itself mean that classical physics 
is wrong. When the new quantum mechanics was discovered, the classical people — 
which included everybody except Heisenberg, Schrodinger, and Born — said: 
“Look, your theory is not any good because you cannot answer certain questions 
like: what is the exact position of a particle?, which hole does it go through?, 
and some others.” Heisenberg’s answer was: “I do not need to answer such ques- 
tions because you cannot ask such a question experimentally.” It is that we do 
not have to. Consider two theories (a) and (b); (a) contains an idea that cannot be 
checked directly but which is used in the analysis, and the other, (b), does not 
contain the idea. If they disagree in their predictions, one could not claim that 
(b) is false because it cannot explain this idea that is in (a), because that idea is 
one of the things that cannot be checked directly. It is always good to know which 
ideas cannot be checked directly, but it is not necessary to remove them all. It is 
not true that we can pursue science completely by using only those concepts which 
are directly subject to experiment. 

In quantum mechanics itself there is a probability amplitude, there is a 
potential, and there are many constructs that we cannot measure directly. The basis 
of a science is its ability to predict . To predict means to tell what will happen in an 
experiment that has never been done. How can we do that? By assuming that we 
know what is there, independent of the experiment. We must extrapolate the 
experiments to a region where they have not been done. We must take our con- 
cepts and extend them to places where they have not yet been checked. If we do 
not do that, we have no prediction. So it was perfectly sensible for the classical 
physicists to go happily along and suppose that the position — which obviously 
means something for a baseball — meant something also for an electron. It was 
not stupidity. It was a sensible procedure. Today we say that the law of relativity 
is supposed to be true at all energies, but someday somebody may come along and 
say how stupid we were. We do not know where we are “stupid” until we “stick 
our neck out,” and so the whole idea is to put our neck out. And the only way to 
find out that we are wrong is to find out what our predictions are. It is absolutely 
necessary to make constructs. 

We have already made a few remarks about the indeterminacy of quantum 
mechanics. That is, that we are unable now to predict what will happen in physics 
in a given physical circumstance which is arranged as carefully as possible. If 
we have an atom that is in an excited state and so is going to emit a photon, we 
cannot say when it will emit the photon. It has a certain amplitude to emit the 
photon at any time, and we can predict only a probability for emission; we cannot 
predict the future exactly. This has given rise to all kinds of nonsense and questions 
on the meaning of freedom of will, and of the idea that the world is uncertain. 

Of course we must emphasize that classical physics is also indeterminate, in a 
sense. It is usually thought that this indeterminacy, that we cannot predict the 
future, is an important quantum-mechanical thing, and this is said to explain the 
behavior of the mind, feelings of free will, etc. But if the world were classical — if 
the laws of mechanics were classical — it is not quite obvious that the mind would 
not feel more or less the same. It is true classically that if we knew the position and 
the velocity of every particle in the world, or in a box of gas, we could predict ex- 
actly what would happen. And therefore the classical world is deterministic. 
Suppose, however, that we have a finite accuracy and do not know exactly where 
just one atom is, say to one part in a billion. Then as it goes along it hits another 
atom, and because we did not know the position better than to one part in a billion, 
we find an even larger error in the position after the collision. And that is amplified, 
of course, in the next collision, so that if we start with only a tiny error it rapidly 
magnifies to a very great uncertainty. To give an example: if water falls over a dam, 
it splashes. If we stand nearby, every now and then a drop will land on our nose. 
This appears to be completely random, yet such a behavior would be predicted 
by purely classical laws. The exact position of all the drops depends upon the 
precise wigglings of the water before it goes over the dam. How? The tiniest 
irregularities are magnified in falling, so that we get complete randomness. Ob- 
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viously, we cannot really predict the position of the drops unless we know the 
motion of the water absolutely exactly. 

Speaking more precisely, given an arbitrary accuracy, no matter how precise, 
one can find a time long enough that we cannot make predictions valid for that 
long a time. Now the point is that this length of time is not very large. It is not 
that the time is millions of years if the accuracy is one part in a billion. The time 
goes, in fact, only logarithmically with the error, and it turns out that in only a 
very, very tiny time we lose all our information. If the accuracy is taken to be one 
part in billions and billions and billions — no matter how many billions we wish, 
provided we do stop somewhere — then we can find a time less than the time it 
took to state the accuracy — after which we can no longer predict what is going 
to happen! It is therefore not fair to say that from the apparent freedom and 
indeterminacy of the human mind, we should have realized that classical “deter- 
ministic” physics could not ever hope to understand it, and to welcome quantum 
mechanics as a release from a “completely mechanistic” universe. For already in 
classical mechanics there was indeterminability from a practical point of view. 
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3 


Probability Amplitudes 


3-1 The laws for combining amplitudes 

When Schrodinger first discovered the correct laws of quantum mechanics, 3-1 
he wrote an equation which described the amplitude to find a particle in various 
places. This equation was very similar to the equations that were already known 3-2 
to classical physicists— equations that they had used in describing the motion of 3-3 

air in a sound wave, the transmission of light, and so on. So most of the time at 3-4 

the beginning of quantum mechanics was spent in solving this equation. But at the 
same time an understanding was being developed, particularly by Born and Dirac, 
of the basically new physical ideas behind quantum mechanics. As quantum 
mechanics developed further, it turned out that there were a large number of things 
which were not directly encompassed in the Schrodinger equation — such as the 
spin of the electron, and various relativistic phenomena. Traditionally, all courses 
in quantum mechanics have begun in the same way, retracing the path followed in 
the historical development of the subject. One first learns a great deal about clas- 
sical mechanics so that he will be able to understand how to solve the Schrodinger 
equation. Then he spends a long time working out various solutions. Only after 
a detailed study of this equation does he get to the “advanced” subject of the 
electron’s spin. 

We had also originally considered that the right way to conclude these lectures 
on physics was to show how to solve the equations of classical physics in compli- 
cated situations — such as the description of sound waves in enclosed regions, modes 
of electromagnetic radiation in cylindrical cavities, and so on. That was the original 
plan for this course. However, we have decided to abandon that plan and to give 
instead an introduction to the quantum mechanics. We have come to the con- 
clusion that what are usually called the advanced parts of quantum mechanics are, 
in fact, quite simple. The mathematics that is involved is particularly simple, 
involving simple algebraic operations and no differential equations or at most 
only very simple ones. The only problem is that we must jump the gap of no 
longer being able to describe the behavior in detail, of particles in space. So this 
is what we are going to try to do; to tell you about what conventionally would be 
called the “advanced” parts of quantum mechanics. But they are, we assure you, 
by all odds the simplest parts — in a deep sense of the word — as well as the most 
basic parts. This is frankly a pedagogical experiment; it has never been done 
before, as far as we know. 

In this subject we have, of course, the difficulty that the quantum mechanical 
behavior of things is quite strange. Nobody has an everyday experience to lean 
on to get a rough, intuitive idea of what will happen. So there are two ways of 
presenting the subject; We could either describe what can happen in a rather 
rough physical way, telling you more or less what happens without giving the 
precise laws of everything; or we could, on the other hand, give the precise laws 
in their abstract form. But, then because of the abstractions, you wouldn’t know 
what they were all about, physically. The latter method is unsatisfactory because 
it is completely abstract, and the first way leaves an uncomfortable feeling because 
one doesn’t know exactly what is true and what is false. We are not sure how to 
overcome this difficulty. You will notice, in fact, that Chapters 1 and 2 showed 
this problem. The first chapter was relatively precise; but the second chapter was 
a rough description of the characteristics of different phenomena. Here, we will 
try to find a happy medium between the two extremes. 


The laws for combining 
amplitudes 

The two-slit interference pattern 
Scattering from a crystal 
Identical particles 
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Fig 3-1. Interference experiment with electrons. 


We will begin in this chapter by dealing with some general quantum me- 
chanical ideas. Some of the statements will be quite precise, others only partially 
precise. It will be hard to tell you as we go along which is which, but by the time 
you have finished the rest of the book, you will understand in looking back which 
parts hold up and which parts were only explained roughly. The chapters which 
follow this one will not be so imprecise. In fact, one of the reasons we have tried 
carefully to be precise in the succeeding chapters is so that we can show you one of 
the most beautiful things about quantum mechanics — how much can be deduced 
from so little. 

We begin by discussing again the superposition of probability amplitudes. 
As an example we will refer to the experiment described in Chapter 1 , and shown 
again here in Fig. 3-1. There is a source s of particles, say electrons; then there 
is a wall with two slits in it; after the wall, there is a detector located at some 
position x. We ask for the probability that a particle will be found at x. Our first 
general principle in quantum mechanics is that the probability that a particle will 
arrive at x, when let out at the source s , can be represented quantitatively by the 
absolute square of a complex number called a probability amplitude — in this case, 
the “amplitude that a particle from 5 will arrive at x.” We will use such amplitudes 
so frequently that we will use a shorthand notation — invented by Dirac and 
generally used in quantum mechanics — to represent this idea. We write the proba- 
bility amplitude this way: 

{Particle arrives at x ] particle leaves s). (3.1) 

In other words, the two brackets { ) are a sign equivalent to “the amplitude that”; 
the expression at the right of the vertical line always gives the starting condition, 
and the one at the left, the final condition. Sometimes it will also be convenient to 
abbreviate still more and describe the initial and final conditions by single letters. 
For example, we may on occasion write the amplitude (3.1) as 

{x | 5 ). (3.2) 

We want to emphasize that such an amplitude is, of course, just a single number — 
a complex number. 

We have already seen in the discussion of Chapter 1 that when there are two 
ways for the particle to reach the detector, the resulting probability is not the 
sum of the two probabilities, but must be written as the absolute square of the 
sum of two amplitudes. We had that the probability that an electron arrives at the 
detector when both paths are open is 

Pl2 = |<£l + 02| 2 * 0*3) 
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Fig. 3-2. A more complicated inter- 
ference experiment. 
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We wish now to put this result in terms of our new notation. First, however, we 
want to state our second general principle of quantum mechanics: When a particle 
can reach a given state by two possible routes, the total amplitude for the process 
is the sum of the amplitudes for the two routes considered separately. In our new 
notation we write that 


C*" 1 s)both holes open (pt | ^)through 1 {-V | -^through 2* 0*4) 

Incidentally, we are going to suppose that the holes 1 and 2 are small enough that 
when we say an electron goes through the hole, we don’t have to discuss which part 
of the hole. We could, of course, split each hole into pieces with a certain amplitude 
that the electron goes to the top of the hole and the bottom of the hole and so on. 
We will suppose that the hole is small enough so that we don’t have to worry about 
this detail. That is part of the roughness involved; the matter can be made more 
precise, but we don’t want to do so at this stage. 

Now we want to write out in more detail what we can say about the amplitude 
for the process in which the electron reaches the detector at x by way of hole 1 . 
We can do that by using our third general principle : When a particle goes by some 
particular route the amplitude for that route can be written as the product of the 
amplitude to go part way with the amplitude to go the rest of the way. For the 
setup of Fig. 3-1 the amplitude to go from s to x by way of hole 1 is equal to the 
amplitude to go from s to 1, multiplied by the amplitude to go from 1 to x. 

(x | s)via 1 = (x | 1)(1 I s). (3.5) 

Again this result is not completely precise. We should also include a factor for the 
amplitude that the electron will get through the hole at 1 ; but in the present case 
it is a simple hole, and we will take this factor to be unity. 

You will note that Eq. (3.5) appears to be written in reverse order. It is to 
be read from right to left: The electron goes from 5 to 1 and then from 1 to x. 
In summary, if events occur in succession — that is, if you can analyze one of the 
routes of the particle by saying it does this, then it does this, then it does that — the 
resultant amplitude for that route is calculated by multiplying in succession the 
amplitude for each of the successive events. Using this law we can rewrite Eq. 
(3.4) as 

(x I s)both = {x I 1)(1 I 5) + {x I 2)(2 j s). 

Now we wish to show that just using these principles we can calculate a much 
more complicated problem like the one shown in Fig. 3-2. Here we have two 
walls, one with two holes, 1 and 2, and another which has three holes, a , b , and c. 
Behind the second wall there is a detector at x, and we want to know the amplitude 
for a particle to arrive there. Well, one way you can find this is by calculating the 
superposition, or interference, of the waves that go through; but you can also do 
it by saying that there are six possible routes and superposing an amplitude for 
each. The electron can go through hole 1, then through' hole a , and then to x; or 
it could go through hole 1, then through hole b y and then to x; and so on. Accord- 
ing to our second principle, the amplitudes for alternative routes add, so we should 
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be able to write the amplitude from s to x as a sum of six separate amplitudes. 
On the other hand, using the third principle, each of these separate amplitudes 
can be written as a product of three amplitudes. For example, one of them is the 
amplitude for s to 1, times the amplitude for 1 to a , times the amplitude for a to x. 
Using our shorthand notation, we can write the complete amplitude to go from 
s to x as 

<*!*>= (* I a )(a | 1)(1 | j ) + (x | b)(b | 1)(1 | s ) + ••• + <* I c){c | 2)(2 | s). 
We can save writing by using the summation notation 

(x I s) = I “)<“ I *><* I S). (3.6) 

1=1,2 

a=a, b, c 

In order to make any calculations using these methods, it is, naturally, neces- 
sary to know the amplitude to get from one place to another. We will give a rough 
idea of a typical amplitude. It leaves out certain things like the polarization of 
light or the spin of the electron, but aside from such features it is quite accurate. 
We give it so that you can solve problems involving various combinations of slits. 
Suppose a particle with a definite energy is going in empty space from a location 
#*! to a location r 2 - In other words, it is a free particle with no forces on it. Except 
for a numerical factor in front, the amplitude to go from to is 

JP tr 12/ft 

<'2 i n> - — * (3.7) 

r 12 

where r 12 = — »T, an d p is the momentum which is related to the energy E 

by the relativistic equation 

p 2 c 2 = E 2 - (m 0 c 2 ) 2 , 
or the nonrelativistic equation 


= Kinetic energy. 

Equation (3.7) says in effect that the particle has wavelike properties, the amplitude 
propagating as a wave with a wave number equal to the momentum divided by h. 

In the most general case, the amplitude and the corresponding probability 
will also involve the time. For most of these initial discussions we will suppose 
that the source always emits the particles with a given energy so we will not need to 
worry about the time. But we could, in the general case, be interested in some 
other questions. Suppose that a particle is liberated at a certain place P at a certain 
time, and you would like to know the amplitude for it to arrive at some location, 
say r, at some later time. This could be represented symbolically as the amplitude 
(r, t ~ t\ \ P, t = 0). Clearly, this will depend upon both r and t. You will get 
different results if you put the detector in different places and measure at different 
times. This function of r and t , in general, satisfies a differential equation which is 
a wave equation. For example, in a nonrelativistic case it is the Schrodinger equa- 
tion. One has then a wave equation analogous to the equation for electromagnetic 
waves or waves of sound in a gas. However, it must be emphasized that the wave 
function that satisfies the equation is not like a real wave in space; one cannot 
picture any kind of reality to this wave as one does for a sound wave. 

Although one may be tempted to think in terms of “particle waves” when 
dealing with one particle, it is not a good idea, for if there are, say, two particles, 
the amplitude to find one at r x and the other at r 2 is not a simple wave in three- 
dimensional space, but depends on the six space variables r x and r 2 . If we are, 
for example, dealing with two (or more) particles, we will need the following 
additional principle: Provided that the two particles do not interact, the amplitude 
that one particle will do one thing and the other one something else is the product 
of the two amplitudes that the two particles would do the two things separately. 
For example, if (a \ s\) is the amplitude for particle 1 to go from to a, and (b \ s 2 ) 
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is the amplitude for particle 2 to go from s 2 to b , the amplitude that both things 
will happen together is 

(a | si)<6 | s 2 >. 


There is one more point to emphasize. Suppose that we didn’t know where 
the particles in Fig. 3-2 come from before arriving at holes 1 and 2 of the first 
wall. We can still make a prediction of what will happen beyond the wall (for 
example, the amplitude to arrive at x) provided that we are given two numbers: 
the amplitude to have arrived at 1 and the amplitude to have arrived at 2. In other 
words, because of the fact that the amplitude for successive events multiplies, as 
shown in Eq. (3.6), all you need to know to continue the analysis is two numbers — 
in this particular case (1 | s) and (2 | s). These two complex numbers are enough 
to predict all the future. That is what really makes quantum mechanics easy. It 
turns out that in later chapters we are going to do just such a thing when we specify 
a starting condition in terms of two (or a few) numbers. Of course, these numbers 
depend upon where the source is located and possibly other details about the 
apparatus, but given the two numbers, we do not need to know any more about 
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Fig. 3-3. An experiment to deter- 
mine which hole the electron goes through. 


3-2 The two-slit interference pattern 

Now we would like to consider a matter which was discussed in some detail 
in Chapter 1. This time we will do it with the full glory of the amplitude idea 
to show you how it works out. We take the same experiment shown in Fig. 
3-1, but now with the addition of a light source behind the two holes, as shown 
in Fig. 3-3. In Chapter 1, we discovered the following interesting result. If 
we looked behind slit 1 and saw a photon scattered from there, then the distribu- 
tion obtained for the electrons at x in coincidence with these photons was the same 
as though slit 2 were closed. The total distribution for electrons that had been 
“seen” at either slit 1 or slit 2 was the sum of the separate distributions and was 
completely different from the distribution with the light turned off. This was true 
at least if we used light of short enough wavelength. If the wavelength was made 
longer so we could not be sure at which hole the scattering had occurred, the 
distribution became more like the one with the light turned off. 

Let’s examine what is happening by using our new notation and the principles 
of combining amplitudes. To simplify the writing, we can again let stand for 
the amplitude that the electron will arrive at x by way of hole 1, that is, 

4>i = (x\ 1><1 | s). 

Similarly, we’ll let <f> 2 stand for the amplitude that the electron gets to the detector 
by way of hole 2 : 

( 2><2 | s). 


These are the amplitudes to go through the two holes and arrive at x if there is no 
light. Now if there is light, we ask ourselves the question: What is the amplitude 
for the process in which the electron starts at s and a photon is liberated by the 
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light source L, ending with the electron at x and a photon seen behind slit 1 ? 
Suppose that we observe the photon behind slit 1 by means of a detector D u as 
shown in Fig. 3-3, and use a similar detector D 2 to count photons scattered 
behind hole 2. There will be an amplitude for a photon to arrive at D x and an 
electron at x, and also an amplitude for a photon to arrive at Z) 2 and an electron 
at x. Let’s try to calculate them. 

Although we don’t have the correct mathematical formula for all the factors 
that go into this calculation, you will see the spirit of it in the following discussion. 
First, there is the amplitude (1 | s) that an electron goes from the source to hole 1. 
Then we can suppose that there is a certain amplitude that while the electron is at 
hole 1 it scatters a photon into the detector Let us represent this amplitude by 
a. Then there is the amplitude (x | 1) that the electron goes from slit 1 to the elec- 
tron detector at x. The amplitude that the electron goes from s to x via slit 1 and 
scatters a photon into D t is then 


(x | 1 ) a ( 1 ] s). 

Or, in our previous notation, it is just a(t> x . 

There is also some amplitude that an electron going through slit 2 will scatter 
a photon into counter D x . You say, “That’s impossible; how can it scatter into 
counter Z>i if it is only looking at hole 1 ?’’ If the wavelength is long enough, there 
are diffraction effects, and it is certainly possible. If the apparatus is built well and 
if we use photons of short wavelength, then the amplitude that a photon will be 
scattered into detector 1, from an electron at 2 is very small. But to keep the 
discussion general we want to take into account that there is always some such 
amplitude, which we will call b. Then the amplitude that an electron goes via 
slit 2 and scatters a photon into D x is 

(x | 2) b <2 \s) = b <$> 2 . 


The amplitude to find the electron at x and the photon in D x is the sum of 
two terms, one for each possible path for the electron. Each term is in turn made 
up of two factors; first, that the electron went through a hole, and second, that the 
photon is scattered by such an electron into detector ] ; we have 



Fig. 3-4. The probability of count- 
ing an electron at x in coincidence with a 
photon at D in the experiment of Fig. 
3 3: (a) for b -- 0; (b) for b — a; (c) 
for 0 < b < a. 


'electron at x 
v photon at D x 


electron from s\ , 

photon from U ~ a + l + 


(3.8) 


We can get a similar expression when the photon is found in the other detector 
D 2 . If we assume for simplicity that the system is symmetrical, then a is also the 
amplitude for a photon in D 2 when an electron passes through hole 2, and b is 
the amplitude for a photon in D 2 when the electron passes through hole 1. The 
corresponding total amplitude for a photon at D 2 and an electron at x is 


/electron at x electron from a\ __ , 

\photon at D > \ photon from U a ^ 2 


Now we are finished. We can easily calculate the probability for various 
situations. Suppose that we want to know with what probability we get a count 
in Z>i and an electron at x. That will be the absolute square of the amplitude 
given in Eq. (3.8), namely, just \atj>i -f- b<t> 2 1 2 . Let’s look more carefully at this 
expression. First of all, if b is zero — which is the way we would like to design the 
apparatus — then the answer is simply \<j>i\ 2 diminished in total amplitude by the 
factor |a| 2 . This is the probability distribution that you would get if there were 
only one hole — as shown in the graph of Fig. 3-4(a). On the other hand, if the 
wavelength is very long, the scattering behind hole 2 into D x may be just about 
the same as for hole 1, Although there may be some phases involved in a and b f 
we can ask about a simple case in which the two phases are equal. If a is practically 
equal to b, then the total probability becomes \<j>i + 4> 2 \ 2 multiplied by |«| 2 , 
since the common factor a can be taken out. This, however, is just the probability 
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light source L, ending with the electron at x and a photon seen behind slit 1 ? 
Suppose that we observe the photon behind slit 1 by means of a detector D u as 
shown in Fig. 3-3, and use a similar detector D 2 to count photons scattered 
behind hole 2. There will be an amplitude for a photon to arrive at D x and an 
electron at x, and also an amplitude for a photon to arrive at Z) 2 and an electron 
at x. Let’s try to calculate them. 

Although we don’t have the correct mathematical formula for all the factors 
that go into this calculation, you will see the spirit of it in the following discussion. 
First, there is the amplitude (1 | s) that an electron goes from the source to hole 1. 
Then we can suppose that there is a certain amplitude that while the electron is at 
hole 1 it scatters a photon into the detector Let us represent this amplitude by 
a. Then there is the amplitude (x | 1) that the electron goes from slit 1 to the elec- 
tron detector at x. The amplitude that the electron goes from s to x via slit 1 and 
scatters a photon into D t is then 


(x | 1 ) a ( 1 ] s). 

Or, in our previous notation, it is just a(t> x . 

There is also some amplitude that an electron going through slit 2 will scatter 
a photon into counter D x . You say, “That’s impossible; how can it scatter into 
counter Z>i if it is only looking at hole 1 ?’’ If the wavelength is long enough, there 
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if we use photons of short wavelength, then the amplitude that a photon will be 
scattered into detector 1, from an electron at 2 is very small. But to keep the 
discussion general we want to take into account that there is always some such 
amplitude, which we will call b. Then the amplitude that an electron goes via 
slit 2 and scatters a photon into D x is 

(x | 2) b <2 \s) = b <$> 2 . 


The amplitude to find the electron at x and the photon in D x is the sum of 
two terms, one for each possible path for the electron. Each term is in turn made 
up of two factors; first, that the electron went through a hole, and second, that the 
photon is scattered by such an electron into detector ] ; we have 



Fig. 3-4. The probability of count- 
ing an electron at x in coincidence with a 
photon at D in the experiment of Fig. 
3 3: (a) for b -- 0; (b) for b — a; (c) 
for 0 < b < a. 


'electron at x 
v photon at D x 


electron from s\ , 

photon from U ~ a + l + 


(3.8) 


We can get a similar expression when the photon is found in the other detector 
D 2 . If we assume for simplicity that the system is symmetrical, then a is also the 
amplitude for a photon in D 2 when an electron passes through hole 2, and b is 
the amplitude for a photon in D 2 when the electron passes through hole 1. The 
corresponding total amplitude for a photon at D 2 and an electron at x is 


/electron at x electron from a\ __ , 

\photon at D > \ photon from U a ^ 2 


Now we are finished. We can easily calculate the probability for various 
situations. Suppose that we want to know with what probability we get a count 
in Z>i and an electron at x. That will be the absolute square of the amplitude 
given in Eq. (3.8), namely, just \atj>i -f- b<t> 2 1 2 . Let’s look more carefully at this 
expression. First of all, if b is zero — which is the way we would like to design the 
apparatus — then the answer is simply \<j>i\ 2 diminished in total amplitude by the 
factor |a| 2 . This is the probability distribution that you would get if there were 
only one hole — as shown in the graph of Fig. 3-4(a). On the other hand, if the 
wavelength is very long, the scattering behind hole 2 into D x may be just about 
the same as for hole 1, Although there may be some phases involved in a and b f 
we can ask about a simple case in which the two phases are equal. If a is practically 
equal to b, then the total probability becomes \<j>i + 4> 2 \ 2 multiplied by |«| 2 , 
since the common factor a can be taken out. This, however, is just the probability 
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distribution we would have gotten without the photons at all. Therefore, in the 
case that the wavelength is very long — and the photon detection ineffective — you 
return to the original distribution curve which shows interference effects, as shown 
in Fig. 3-4(b). In the case that the detection is partially effective, there is an inter- 
ference between a lot of <£i and a little of <j> 2 , and you will get an intermediate 
distribution such as is sketched in Fig. 3-4(c). Needless to say, if we look for 
coincidence counts of photons at D 2 and electrons at x 9 we will get the same kinds 
of results. If you remember the discussion in Chapter 1, you will see that these 
results give a quantitative description of what was described there. 

Now we would like to emphasize an important point so that you will avoid 
a common error. Suppose that you only want the amplitude that the electron ar- 
rives at x, regardless of whether the photon was counted at D x or D 2 . Should you 
add the amplitudes given in Eqs. (3.8) and (3.9)? No! You must never add 
amplitudes for different and distinct final states. Once the photon is accepted by 
one of the photon counters, we can always determine which alternative occurred 
if we want, without any further disturbance to the system. Each alternative has a 
probability completely independent of the other. To repeat, do not add amplitudes 
for different final conditions, where by “final” we mean at that moment the 
probability is desired — that is, when the experiment is “finished.” You do add the 
amplitudes for the different indistinguishable alternatives inside the experiment, 
before the complete process is finished. At the end of the process you may say that 
you “don’t want to look at the photon.” That’s your business, but you still do not 
add the amplitudes. Nature does not know what you are looking at, and she 
behaves the way she is going to behave whether you bother to take down the data 
or not. So here we must not add the amplitudes. We first square the amplitudes 
for all possible different final events and then sum. The correct result for an 
electron at x and a photon at either or D 2 is 


/e at x 

e from s \ 

2 

/e at x 

e from s \ 2 

\ph at D ! 

ph from L/ 

+ 

\ph at D 2 

ph from L/ 


— \a<t>x + b<j> 2 \ 2 + \a<t> 2 + b<t>i\ 2 . (3.10) 


3-3 Scattering from a crystal 


Our next example is a phenomenon in which we have to analyze the inter- 
ference of probability amplitudes somewhat carefully. We look at the process of 
the scattering of neutrons from a crystal. Suppose we have a crystal which has a 
lot of atoms with nuclei at their centers, arranged in a periodic array, and a neutron 
beam that comes from far away. We can label the various nuclei in the crystal by 
an index /, where / runs over the integers 1, 2, 3, ... N, with N equal to the total 
number of atoms. The problem is to calculate the probability of getting a neutron 
into a counter with the arrangement shown in Fig. 3-5. For any particular atom 
/, the amplitude that the neutron arrives at the counter C is the amplitude that the 
neutron gets from the source S to nucleus /, multiplied by the amplitude a that it 
gets scattered there, multiplied by the amplitude that it gets from i to the counter 
C. Let’s write that down : 


NEUTRON 

SOURCE 


crystal 



Fig. 3-5. Measuring the 
of neutrons by a crystal. 


scattering 


(neutron at C | neutron from S) via { = (C \ i) a ( i \ S ). (3.1 1) 

In writing this equation we have assumed that the scattering amplitude a is the 
same for all atoms. We have here a large number of apparently indistinguishable 
routes. They are indistinguishable because a low-energy neutron is scattered from 
a nucleus without knocking the atom out of its place in the crystal — no “record” 
is left of the scattering. According to the earlier discussion, the total amplitude 
for a neutron at C involves a sum of Eq. (3.1 1) over all the atoms; 

N 

(neutron at C | neutron from S) = ^ (C [ i) a ( i | S >. (3.12) 

i— 1 
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Fig. 3-6. The neutron counting rate 
as a function of angle: (a) for spin zero 
nuclei; (b) the probability of scattering 
with spin flip; (c) the observed counting 
rate with a spin one-half nucleus. 


Because we are adding amplitudes of scattering from atoms with different space 
positions, the amplitudes will have different phases giving the characteristic inter- 
ference pattern that we have already analyzed in the case of the scattering of light 
from a grating. 

The neutron intensity as a function of angle in such an experiment is indeed 
often found to show tremendous variations, with very sharp interference peaks 
and almost nothing in between— as shown in Fig. 3-6(a). However, for certain 
kinds of crystals it does not work this way, and there is — along with the interference 
peaks discussed above — a general background of scattering in all directions. We 
must try to understand the apparently mysterious reasons for this. Well, we have 
not considered one important property of the neutron. It has a spin of one-half, 
and so there are two conditions in which it can be: either spin “up” (say perpendicu- 
lar to the page in Fig. 3-5) or spin “down.” If the nuclei of the crystal have no 
spin, the neutron spin doesn't have any effect. But when the nuclei of the crystal 
also have a spin, say a spin of one-half, you will observe the background of smeared- 
out scattering described above. The explanation is as follows. 

If the neutron has one direction of spin and the atomic nucleus has the same 
spin, then no change of spin can occur in the scattering process. If the neutron and 
atomic nucleus have opposite spin, then scattering can occur by two processes, 
one in which the spins are unchanged and another in which the spin directions are 
exchanged. This rule for no net change of the sum of the spins is analogous to our 
classical law of conservation of angular momentum. We can begin to understand 
the phenomenon if we assume that all the scattering nuclei are set up with spins in 
one direction. A neutron with the same spin will scatter with the expected sharp 
interference distribution. What about one with opposite spin? If it scatters without 
spin flip, then nothing is changed from the above; but if the two spins flip over in 
the scattering, we could, in principle, find out which nucleus had done the scatter- 
ing, since it would be the only one with spin turned over. Well, if we can tell which 
atom did the scattering, what have the other atoms got to do with it? Nothing, of 
course. The scattering is exactly the same as that from a single atom. 

To include this effect, the mathematical formulation of Eq. (3.12) must be 
modified since we haven't described the states completely in that analysis. Let’s 
start with all neutrons from the source having spin up and all the nuclei of the 
crystal having spin down. First, we would like the amplitude that at the counter 
the spin of the neutron is up and all spins of the crystal are still down. This is 
not different from our previous discussion. We will let a be the amplitude to 
scatter with no flip or spin. The amplitude for scattering from the ith atom is, of 
course, 


(C up , crystal all down ( S up , crystal all down) = (C | /) a ( i \ S). 

Since all the atomic spins are still down, the various alternatives (different values 
of /) cannot be distinguished. There is clearly no way to tell which atom did the 
scattering. For this process, all the amplitudes interfere. 

We have another case, however, where the spin of the detected neutron is 
down although it started from S with spin up. In the crystal, one of the spins must 
be changed to the up direction — let’s say that of the kth atom. We will assume that 
there is the same scattering amplitude with spin flip for every atom, namely b. 
(In a real crystal there is the disagreeable possibility that the reversed spin moves 
to some other atom, but let’s take the case of a crystal for which this probability 
is very low.) The scattering amplitude is then 


(Cdown, nucleus k up | S up , crystal all down) = (C \ k) b (fc \ S ). (3.13) 


If we ask for the probability of finding the neutron spin down and the A:th nucleus 
spin up, it is equal to the absolute square of this amplitude, which is simply \b\ 2 
times | (C \ k){k \ S)| 2 . The second factor is almost independent of location in the 
crystal, and all phases have disappeared in taking the absolute square. The 
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probability of scattering from any nucleus in the crystal with spin flip is now 

\b\ 2 jr \{c\k){k\s)\\ 

k= 1 

which will show a smooth distribution as in Fig. 3-6(b). 

You may argue, “I don't care which atom is up.” Perhaps you don’t, but 
nature knows; and the probability is, in fact, what we gave above— there isn’t any 
interference. On the other hand, if we ask for the probability that the spin is up at 
the detector and all the atoms still have spin down, then we must take the absolute 
square of 

E (C\i)a(i\S). 

1=1 

Since the terms in this sum have phases, they do interfere, and we get a sharp 
interference pattern. If we do an experiment in which we don’t observe the spin 
of the detected neutron, then both kinds of events can occur; and the separate 
probabilities add. The total probability (or counting rate) as a function of angle 
then looks like the graph in Fig. 3-6(c). 

Let’s review the physics of this experiment. If you could, in principle , distin- 
guish the alternative//^// states (even though you do not bother to do so), the total, 
final probability is obtained by calculating the probability for each state (not the 
amplitude) and then adding them together. If you cannot distinguish the final 
states even in principle , then the probability amplitudes must be summed before 
taking the absolute square to find the actual probability. The thing you should 
notice particularly is that if you were to try to represent the neutron by a wave 
alone, you would get the same kind of distribution for the scattering of a down- 
spinning neutron as for an up-spinning neutron. You would have to say that the 
“wave” would come from all the different atoms and interfere just as for the up- 
spinning one with the same wavelength. But we know that is not the way it works. 
So as we stated earlier, we must be careful not to attribute too much reality to the 
waves in space. They are useful for certain problems but not for all. 

3-4 Identical particles 

The next experiment we will describe is one which shows one of the beautiful 
consequences of quantum mechanics. It again involves a physical situation in 
which a thing can happen in two indistinguishable ways, so that there is an inter- 
ference of amplitudes — as is always true in such circumstances. We are going to 
discuss the scattering, at relatively low energy, of nuclei on other nuclei. We 
start by thinking of a-particles (which, as you know, are helium nuclei) bombarding, 
say, oxygen. To make it easier for us to analyze the reaction, we will look at it in 
the center-of-mass system, in which the oxygen nucleus and the a-particle have 
their velocities in opposite directions before the collision and again in exactly 
opposite directions after the collision. See Fig. 3-7 (a). (The magnitudes of the 
velocities are, of course, different, since the masses are different.) We will also 
suppose that there is conservation of energy and that the collision energy is low 
enough that neither particle is broken up or left in an excited state. The reason that 
the two particles deflect each other is, of course, that each particle carries a positive 
charge and, classically speaking, there is an electrical repulsion as they go by. 
The scattering will happen at different angles with different probabilities, and we 
would like to discuss something about the angle dependence of such scatterings. 
(It is possible, of course, to calculate this thing classically, and it is one of the most 
remarkable accidents of quantum mechanics that the answer to this problem 
comes out the same as it does classically. This is a curious point because it happens 
for no other force except the inverse square law — so it is indeed an accident.) 

The probability of scattering in different directions can be measured by an 
experiment as shown in Fig. 3-7(a). The counter at position 1 could be designed 
to detect only a-particles; the counter at position 2 could be designed to detect 
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Fig. 3-7. The scattering of a-particles from oxygen nuclei, as seen in the center-of-mass system. 


only oxygen — just as a check. (In the laboratory system the detectors would not 
be opposite; but in the CM system they are.) Our experiment consists in measuring 
the probability of scattering in various directions. Let’s call /(0) the amplitude to 
scatter into the counters when they are at the angle 0 ; then |/(0)| 2 will be our 
experimentally determined probability. 

Now we could set up another experiment in which our counters would respond 
to either the a-particle or the oxygen nucleus. Then we have to work out what 
happens when we do not bother to distinguish which particles are counted. Of 
course, if we are to get an oxygen in the position 0, there must be an a-particle on 
the opposite side at the angle (7T — 0), as shown in Fig. 3-7(b). So if / (0) is the 
amplitude for a-scattering through the angle 0, then f{ir — 0) is the amplitude 
for oxygen scattering through the angle 0. f Thus, the probability for having 
some particle in the detector at position 1 is: 

Probability of some particle in D x = |/(0)| 2 + |/(7r — 0)| 2 . (3.14) 

Note that the two states are distinguishable in principle. Even though in this 
experiment we do not distinguish them, we could. According to the earlier dis- 
cussion, then, we must add the probabilities, not the amplitudes. 

The result given above is correct for a variety of target nuclei — for a-particles 
on oxygen, on carbon, on beryllium, on hydrogen. But it is wrong for a-particles on 
a-particles. For the one case in which both particles are exactly the same, the 
experimental data disagree with the prediction of (3.14). For example, the 
scattering probability at 90° is exactly twice what the above theory predicts and 
has nothing to do with the particles being “helium” nuclei. If the target is He^, 
but the projectiles are a-particles (He 4 ), then there is agreement. Only when the 
target is He 4 — so its nuclei are identical with the incoming a-particle — does the 
scattering vary in a peculiar way with angle. 

Perhaps you can already see the explanation. There are two ways to get an 
a-particle into the counter: by scattering the bombarding a-particle at an angle 0, 
or by scattering it at an angle of (it — 0). How can we tell whether the bombarding 
particle or the target particle entered the counter? The answer is that we cannot. 
In the case of a-particles with a-particles there are two alternatives that cannot be 
distinguished. Here, we must let the probability amplitudes interfere by addition, 

f In general, a scattering direction should, of course, be described by two angles, the 
polar angle (#>, as well as the azimuthal angle 0. We would then say that an oxygen nucleus 
at (0, </ >) means that the a-particle is at (7r — 0, </> + 7r). However, for Coulomb scattering 
(and for many other cases), the scattering amplitude is independent of 4>. Then the ampli- 
tude to get an oxygen at 0 is the same as the amplitude to get the a-particle at (7 r — 0). 
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Fig. 3-8. The scattering of electrons on electrons. If the incoming electrons have parallel spins, the 
processes (a) and (b) are indistinguishable. 


and the probability of finding an a-particle in the counter is the square of their sum : 

Probability of an a-particle at D x = \f{8) + f(jr — 6) | 2 . (3.15) 

This is quite a different result than that in Eq. (3.14). We can take an angle 
of 7 t/ 2 as an example, because it is easy to figure out. For 8 = x/2, we obviously 
have f(6) = f(ir — 8 ), so the probability in Eq. (3.15) becomes |/(7r/2) + 
/( tt/2)| 2 = 4|/(7t/2)| 2 . 

On the other hand, if they did not interfere, the result of Eq. (3.14) gives 
only 2|/(7t/2)| 2 . So there is twice as much scattering at 90° as we might have 
expected. Of course, at other angles the results will also be different. And so you 
have the unusual result that when particles are identical, a certain new thing hap- 
pens that doesn’t happen when particles can be distinguished. In the mathematical 
description you must add the amplitudes for alternative process in which the two 
particles simply exchange roles and there is an interference. 

An even more perplexing thing happens when we do the same kind of experi- 
ment by scattering electrons on electrons, or protons on protons. Neither of the 
above results is then correct! For these particles, we must invoke still a new rule, 
a most peculiar rule, which is the following: When you have a situation in which 
the identity of the electron which is arriving at a point is exchanged with another 
one, the new amplitude interferes with the old one with an opposite phase . It is 
interference all right, but with a minus sign. In the case of a-particles, when you 
exchange the a-particle entering the detector, the interfering amplitudes interfere 
with the positive sign. In the case of electrons , the interfering amplitudes for exchange 
interfere with a negative sign . Except for another detail to be discussed below, the 
proper equation for electrons in an experiment like the one shown in Fig. 3-8 is 

Probability of eat Di = | f(8) — /( ir — 6) | 2 . (3.16) 

The above statement must be qualified, because we have not considered the 
spin of the electron (a-particles have no spin). The electron spin may be considered 
to be either “up” or “down” with respect to the plane of the scattering. If the 
energy of the experiment is low enough, the magnetic forces due to the currents 
will be small and the spin will not be affected. We will assume that this is the case 
for the present analysis, so that there is no chance that the spins are changed during 
the collision. Whatever spin the electron has, it carries along with it. Now you 
see there are many possibilities. The bombarding and target particles can have 
both spins up, both down, or opposite spins. If both spins are up, as in Fig. 3-8 
(or if both spins are down), the same will be true of the recoil particles and the 
amplitude for the process is the difference of the amplitudes for the two possibilities 
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Fig. 3-9. The scattering of electrons with antiparallel spins. 


shown in Fig. 3-8(a) and (b). The probability of detecting an electron in D x is 
then given by Eq. (3.16). 

Suppose, however, the “bombarding” spin is up and the “target” spin is down. 
The electron entering counter 1 can have spin up or spin down, and by measuring 
this spin we can tell whether it came from the bombarding beam or from the target. 
The two possibilities are shown in Fig. 3-9(a) and (b); they are distinguishable in 
principle, and hence there will be no interference — merely an addition of the two 
probabilities. The same argument holds if both of the original spins are reversed — 
that is, if the left-hand spin is down and the right-hand spin is up. 

Now if we take our electrons at random — as from a tungsten filament in which 
the electrons are completely unpolarized — then the odds are fifty-fifty that any 
particular electron comes out with spin up or spin down. If we don't bother to 
measure the spin of the electrons at any point in the experiment, we have what we 
call an unpolarized experiment. The results for this experiment are best calculated 
by listing all of the various possibilities as we have done in Table 3-1. A separate 
probability is computed for each distinguishable alternative. The total probability 
is then the sum of all the separate probabilities. Note that for unpolarized beams 
the result for 6 = t/2 is one-half that of the classical result with independent 
particles. The behavior of identical particles has many interesting consequences; 
we will discuss them in greater detail in the next chapter. 

Table 3-1 

Scattering of unpolarized spin one-half particles 


Fraction 
of cases 

Spin of 
particle 1 

Spin of 
particle 2 

Spin at 
Di 

Spin at 

d 2 

Probability 

i 

4 

up 

up 

up 

up 

!N 

S' 

1 

1 

i 

4 

down 

down 

down 

down 

1/(0) - /(ir - 0)| 2 

t 

4 

up 

down | 

up 

down 

down 

up 

1/(0) 2 

I/Or - 0)1 2 

1 

4 

down 

up | 

up 

down 

down 

up 

I/Or - 0)| 2 

1/(0) I 2 

Total probability — \ 

m ~ /Or - 

0)1 2 + il/Wl 2 + i|/0r - 0)| 2 
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4 


Identical Particles 


4-1 Bose particles and Fermi particles 


In the last chapter we began to consider the special rules for the interference 
that occurs in processes with two identical particles. By identical particles we 
mean things like electrons which can in no way be distinguished one from another. 
If a process involves two particles that are identical, reversing which one arrives 
at a counter is an alternative which cannot be distinguished and — like all cases of 
alternatives which cannot be distinguished — interferes with the original, un- 
exchanged case. The amplitude for an event is then the sum of the two interfering 
amplitudes; but, interestingly enough, the interference is in some cases with the 
same phase and, in others, with the opposite phase. 

Suppose we have a collision of two particles a and b in which particle a scatters 
in the direction 1 and particle b scatters in the direction 2, as sketched in Fig. 
4-1 (a). Let’s call /(0) the amplitude for this process; then the probability P\ of 
observing such an event is proportional to |/(0)| 2 Of course, it could also happen 
that particle b scattered into counter 1 and particle a went into counter 2, as shown 
in Fig. 4-1 (b). Assuming that there are no special directions defined by spins 
or such, the probability P 2 for this process is just |/(x — 0)| 2 , because it is just 
equivalent to the first process with counter 1 moved over to the angle x — 0. 
You might also think that the amplitude for the second process is just /(x — 0). 
But that is not necessarily so, because there could be an arbitrary phase factor. 
That is, the amplitude could be 

e u f(ir - 0). 


4-1 Bose particles and Fermi 
particles 

4-2 States with two Bose particles 

4-3 States with n Bose particles 
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4-7 The exclusion principle 


Review: Blackbody radiation in: 

Chapter 41 , Yol. I, The Brown- 
ian Movement 

Chapter 42, Vol. I, Applica- 
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Such an amplitude still gives a probability P 2 equal to |/(x — 0)| 2 

Now let’s see what happens if a and b are identical particles. Then the two 
different processes shown in the two diagrams of Fig. 4-1 cannot be distinguished. 
There is an amplitude that either a or b goes into counter 1, while the other goes 
into counter 2. This amplitude is the sum of the amplitudes for the two processes 
shown in Fig. 4-1. If we call the first one /(0), then the second one is e l5 /(x — 0), 
where now the phase factor is very important because we are going to be adding 
two amplitudes. Suppose we have to multiply the amplitude by a certain phase 
factor when we exchange the roles of the two particles. If we exchange them 
again we should get the same factor again. But we are then back to the first process. 



4-1 




Fig. 4-2. The scattering of two a-particles. In (a) the two particles retain their 
identity; in (b) a neutron is exchanged during the collision. 


The phase factor taken twice must bring us back where we started — its square 
must be equal to I . There are only two possibilities: e a is equal to + 1 , or is equal 
to — 1. Either the exchanged case contributes with the same sign, or it contributes 
with the opposite sign. Both cases exist in nature, each for a different class of par- 
ticles. Particles which interfere with a positive sign are called Bose particles and 
those which interfere with a negative sign are called Fermi particles . The Bose 
particles are the photon, the mesons, and the graviton. The Fermi particles are 
the electron, the muon, the neutrinos, the nucleons, and the baryons. We have, 
then, that the amplitude for the scattering of identical particles is: 

Bose particles: 

(Amplitude direct) + (Amplitude exchanged). (4.1) 

Fermi particles: 

(Amplitude direct) — (Amplitude exchanged). (4.2) 

For particles with spin— like electrons — there is an additional complication. 
We must specify not only the location of the particles but the direction of their spins. 
It is only for identical particles with identical spin states that the amplitudes interfere 
when the particles are exchanged. If you think of the scattering of unpolarized 
beams — which are a mixture of different spin states — there is some extra arithmetic. 

Now an interesting problem arises when there are two or more particles bound 
tightly together. For example, an a-particle has four particles in it — two neutrons 
and two protons. When two a-particles scatter, there are several possibilities. 
It may be that during the scattering there is a certain amplitude that one of the 
neutrons will leap across from one a-particle to the other, while a neutron from the 
other a-particle leaps the other way so that the two alphas which come out of the 
scattering are not the original ones — there has been an exchange of a pair of 
neutrons. See Fig. 4-2. The amplitude for scattering with an exchange of a pair 
of neutrons will interfere with the amplitude for scattering with no such exchange, 
and the interference must be with a minus sign because there has been an exchange 
of one pair of Fermi particles. On the other hand, if the relative energy of the two 
a-particles is so low that they stay fairly far apart — say, due to the Coulomb 
repulsion — and there is never any appreciable probability of exchanging any of 
the internal particles, we can consider the a-particle as a simple object, and we do 
not need to worry about its internal details. In such circumstances, there are only 
two contributions to the scattering amplitude. Either there is no exchange, or all 
four of the nucleons are exchanged in the scattering. Since the protons and the 
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neutrons in the a-particle are all Fermi particles, an exchange of any pair reverses 
the sign of the scattering amplitude. So long as there are no internal changes in 
the a-particles, interchanging the two a-particles is the same as interchanging four 
pairs of Fermi particles. There is a change in sign for each pair, so the net result 
is that the amplitudes combine with a positive sign. The a-particle behaves like a 
Bose particle. 

So the rule is that composite objects, in circumstances in which the composite 
object can be considered as a single object, behave like Fermi particles or Bose 
particles, depending on whether they contain an odd number or an even number 
of Fermi particles. 

All the elementary Fermi particles we have mentioned— such as the electron, 
the proton, the neutron, and so on— have a spin j = 1 /2. If several such Fermi 
particles are put together to form a composite object, the resulting spin may be 
either integral or half-integral. For example, the common isotope of helium, 
He 4 , which has two neutrons and two protons, has a spin of zero, whereas Li 7 
which has three protons and four neutrons, has a spin of 3/2. We will learn later the 
rules for compounding angular momentum, and will just mention now that every 
composite object which has a half-integral spin imitates a Fermi particle , whereas 
every composite object with an integral spin imitates a Bose particle . 

This brings up an interesting question : Why is it that particles with half-integral 
spin are Fermi particles whose amplitudes add with the minus sign, whereas 
particles with integral spin are Bose particles whose amplitudes add with the posi- 
tive sign? We apologize for the fact that we cannot give you an elementary ex- 
planation. An explanation has been worked out by Pauli from complicated argu- 
ments of quantum field theory and relativity. He has shown that the two must 
necessarily go together, but we have not been able to find a way of reproducing his 
arguments on an elementary level. It appears to be one of the few places in physics 
where there is a rule which can be stated very simply, but for which no one 
has found a simple and easy explanation. The explanation is deep down in rela- 
tivistic quantum mechanics. This probably means that we do not have a complete 
understanding of the fundamental principle involved. For the moment, you will 
just have to take it as one of the rules of the world. 


4-2 States with two Bose particles 

Now we would like to discuss an interesting consequence of the addition rule 
for Bose particles. It has to do with their behavior when there are several particles 
present. We begin by considering a situation in which two Bose particles are scat- 
tered from two different scatterers. We won’t worry about the details of the scatter- 
ing mechanism. We are interested only in what happens to the scattered particles. 
Suppose we have the situation shown in Fig. 4-3. The particle a is scattered into 
the state 1. By a state we mean a given direction and energy, or some other given 
condition. The particle h is scattered into the state 2. We want to assume that the 
two states 1 and 2 are nearly the same. (What we really want to find out eventually 
is the amplitude that the two particles are scattered into identical directions, or 
states; but it is best if we think first about what happens if the states are almost 
the same and then work out what happens when they become identical.) 

Suppose that we had only particle a; then it would have a certain amplitude 
for scattering in direction 1, say (1 | a). And particle b alone would have the ampli- 
tude (2 | b) for landing in direction 2. If the two particles are not identical, the 
amplitude for the two scatterings to occur at the same time is just the product 

(1 | a){ 2 | b). 

The probability for such an event is then 



which is also equal to 


Id \a)(2 |6)| : 2 , 

l(l|a>| 2 K2|6>| 2 
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To save writing for the present arguments, we will sometimes set 

<1 I a) = a u <2 | b) = b 2 . 

Then the probability of the double scattering is 

|ail 2 N 2 . 

It could also happen that particle b is scattered into direction 1, while particle 
a goes into direction 2. The amplitude for this process is 

(2 | a)(l | b\ 

and the probability of such an event is 

|(2 | a)(l | b) | 2 = |a 2 l 2 |6i| 2 . 

Imagine now that we have a pair of tiny counters that pick up the two scattered 
particles. The probability P 2 that they will pick up two particles together is just 
the sum 

P 2 ~ |«l| 2 |^2| 2 + |«2t 2 |^l| 2 * (4.3) 

Now let’s suppose that the directions 1 and 2 are very close together. We 

expect that a should vary smoothly with direction, so a x and a 2 must approach 
each other as 1 and 2 get close together. If they are close enough, the amplitudes a x 
and a 2 will be equal. We can set a x — a 2 and call them both just a ; similarly, we 
set b i = b 2 = b. Then we get that 

P 2 - 2\a\ 2 \b\ 2 . (4.4) 

Now suppose, however, that a and b are identical Bose particles. Then the 
process of a going into 1 and b going into 2 cannot be distinguished from the ex- 
changed process in which a goes into 2 and b goes into 1 . In this case the amplitudes 
for the two different processes can interfere. The total amplitude to obtain a 
particle in each of the two counters is 

(1 | a><2 \b) + (2 \ a)( 1 ] b). (4.5) 

And the probability that we get a pair is the absolute square of this amplitude, 

P 2 = \d\b 2 + a 2 b 1 1 2 = 4\a\ 2 \b\ 2 . (4.6) 

We have the result that it is twice as likely to find two identical Bose particles 
scattered into the same state as you would calculate assuming the particles were 
different. 

Although we have been considering that the two particles are observed in 
separate counters, this is not essential — as we can see in the following way. Let’s 
imagine that both the directions 1 and 2 would bring the particles into a single 
small counter which is some distance away. We will let the direction 1 be defined 
by saying that it heads toward the element of area dS x of the counter. Direction 2 
heads toward the surface element dS 2 of the counter. (We imagine that the counter 
presents a surface at right angles to the line from the scatterings.) Now we cannot 
give a probability that a particle will go into a precise direction or to a particular 
point in space. Such a thing is impossible — the chance for any exact direction is 
zero. When we want to be so specific, we shall have to define our amplitudes so 
that they give the probability of arriving per unit area of a counter. Suppose that 
we had only particle a ; it would have a certain amplitude for scattering in direction 
1. Let’s define (1 | a) = a x to be the amplitude that a will scatter into a unit area 
of the counter in the direction 1. In other words, the scale of a x is chosen — we 
say it is “normalized” so that the probability that it will scatter into an element 
of area clS x is 

\(l\a)\ 2 dS l ^ ^dSt. (4.7) 
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If our counter has the total area AS, and we let dSi range over this area, the total 
probability that the particle a will be scattered into the counter is 

[ |fli| 2 </Si. (4.8) 

J A S 

As before, we want to assume that the counter is sufficiently small so that the 
amplitude doesn’t vary significantly over the surface of the counter; a x is then a 
constant amplitude which we can call a . Then the probability that particle a is 
scattered somewhere into the counter is 


Pa =\a\ 2 AS. (4.9) 

In the same way, we will have that the probability that particle b — when it is 
alone — scatters into some element of area, say dS 2 , is 

\b 2 \ 2 dS 2 . 

(We use dS 2 instead of dS i because we will later want a and b to go into different 
directions.) Again we set b 2 equal to the constant amplitude b; then the probability 
that particle b is counted in the detector is 

Pi =\b\ 2 AS. (4.10) 

Now when both particles are present, the probability that a is scattered into 
dS i and b is scattered into dS 2 is 

\a l b 2 \ 2 dS 1 dS 2 = \a\ 2 \b\ 2 dS 1 dS 2 . (4.11) 

If we want the probability that both a and b get into the counter, we integrate both 
dS i and dS 2 over AS and find that 

P 2 = \a\ 2 \b \ 2 (AS) 2 . (4.12) 

We notice, incidentally, that this is just equal to p a * p b , just as you would suppose 
assuming that the particles a and b act independently of each other. 

When the two particles are identical, however, there are two indistinguishable 
possibilities for each pair of surface elements dS x and dS 2 . Particle a going into 
dS 2 and particle b going into dSi is indistinguishable from a into dSi and b into 
dS 2 , so the amplitudes for these processes will interfere. (When we had two 
different particles above — although we did not in fact care which particle went 
where in the counter— we could, in principle , have found out; so there was no 
interference. For identical particles we cannot tell, even in principle.) We must 
write, then, that the probability that the two particles arrive at dS i and dS 2 is 

M 2 + a 2 b 1 \ 2 dS 1 dS 2 . (4.13) 

Now, however, when we integrate over the area of the counter, we must be careful. 
If we let dSi and dS 2 range over the whole area AS, we would count each part of 
the area twice since (4.13) contains everything that can happen with any pair of 
surface elements dS i and dS 2 f We can still do the integral that way, if we correct 
for the double counting by dividing the result by 2. We get then that P 2 for identical 
Bose particles is 

P 2 (Bosq) = ^{4|a| 2 \b \ 2 (AS) 2 } = 2\a\ 2 \b\ 2 (AS) 2 . (4.14) 

Again, this is just twice what we got in Eq. (4.12) for distinguishable particles. 

If we imagine for a moment that we knew that the b channel had already sent 
its particle into the particular direction, we can say that the probability that a 
second particle will go into the same direction is twice as great as we would have 


t In (4.11) interchanging dS\ and dS^ gives a different event, so both surface elements 
should range over the whole area of the counter. In (4.13) we are treating dS\ and dS 2 
as a pair and including everything that can happen. If the integrals include again what 
happens when dS i and dS 2 are reversed, everything is counted twice. 


4-5 



expected if we had calculated it as an independent event. It is a property of Bose 
particles that if there is already one particle in a condition of some kind, the 
probability of getting a second one in the same condition is twice as great as it 
would be if the first one were not already there. This fact is often stated in the 
following way: If there is already one Bose particle in a given state, the amplitude 
for putting an identical one on top of it is y/2 greater than if it weren’t there. 
(This is not a proper way of stating the result from the physical point of view we 
have taken, but if it is used consistently as a rule, it will, of course, give the correct 
result.) 


I 



Fig. 4-4. The scattering of n parti- 
cles into nearby final states. 


4-3 States with n Bose particles 

Let’s extend our result to a situation in which there are n particles present. 
We imagine the circumstance shown in Fig. 4-4. We have n particles a, b, c, . . . , 
which are scattered and end up in the directions 1, 2, 3, . . , n. All n directions 
are headed toward a small counter a long distance away. As in the last section, 
we choose to normalize all the amplitudes so that the probability that each particle 
acting alone would go into an element of surface dS of the counter is 

K 

First, let’s assume that the particles are all distinguishable; then the probability 
that n particles will be counted together in n different surface elements is 

M 2 c 3 . . | 2 c?S! dS 2 dS 3 . . (4.15) 

Again we take that the amplitudes don’t depend on where dS is located in the 
counter (assumed small) and call them simply a, b,c, .. . The probability (4.15) 

becomes 

[a! 2 |&| 2 |c| 2 . . . dSi dS 2 dS 3 . . (4.16) 

Integrating each dS over the surface AS of the counter, we have that P n (different), 
the probability of counting n different particles at once, is 


P r (different) = |o| 2 |6| 2 |c| 2 . . (AS)' 1 (4.17) 

This is just the product of the probabilities for each particle to enter the counter 
separately. They all act independently — the probability for one to enter does not 
depend on how many others are also entering. 

Now suppose that all the particles are identical Bose particles. For each set 
of directions 1, 2, 3, . . . there are many indistinguishable possibilities. If there were, 
for instance, just three particles, we would have the following possibilities: 


a — > 1 a —> 1 a — >2 

b-> 2 b^ 3 b^\ 

c —> 3 c —> 2 c — > 3 

a —> 2 a —> 3 a —> 3 

b-*3 b^> 1 b^2 

c — > 1 c — > 2 c — » 1 

There are six different combinations. With n particles, there are n\ different, but 
indistinguishable , possibilities for which we must add amplitudes. The probability 
that n particles will be counted in n surface elements is then 

\a\b2Cz-. +«i6 3 c 2 . +a 2 bic 3 .. 

+ a 2 b 3 ci . . + etc. + etc.| 2 dSi dS 2 dS 3 . . . dS n . (4.18) 

Once more we assume that all the directions are so close that we can set a\ = 
a 2 = = a = a n , and similarly for b, c, . ; the probability of (4. 18) becomes 

\n\abc . . .j 2 dS 1 dS 2 . . . dS n . 
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When we integrate each dS over the area AS of the counter, each possible 
product of surface elements is counted nl times; we correct for this by dividing 
by nl and get 

P n { Bose) = ~ \n\abc . . .\ 2 (AS)” 
or 

P n (Bose) = n\ \ abc . . .| 2 (A S) n . (4.20) 

Comparing this result with Eq. (4.17), we see that the probability of counting n 
Bose particles together is n! greater than we would calculate assuming that the 
particles were all distinguishable. We can summarize our result this way: 

P n (Bose) = nl P n (different). (4.21) 

Thus, the probability in the Bose case is larger by nl than you would calculate 
assuming that the particles acted independently. 

We can see better what this means if we ask the following question: What is 
the probability that a Bose particle will go into a particular state when there are 
already n others present? Let’s call the newly added particle w. If we have (n + 1) 
particles, including w 9 Eq. (4.20) becomes 

^n+i(Bose) = (n + 1)! \abc . . . w| 2 (A S)" +1 . (4.22) 

We can write this as 

^n+i(Bose) = {(n + l)|w| 2 AS}nl \ abc . . .| 2 AS" \ _ 

or V — 

P n+1 (Bose) = (n + \)\w\ 2 ASP n (Bo$e)._ (4.23) 

We can look at this result in the following way: The number |w| 2 AS is the 
probability for getting particle w into the detector if no other particles were present; 
P n (Bose) is the chance that there are already n other Bose particles present. So 
Eq. (4.23) says that when there are n other identical Bose particles present, the 
probability that one more particle will enter the same state is enhanced by the factor 
(n + 1). The probability of getting a boson, where there are already n, is (n + 1) 
times stronger than it would be if there were none before. The presence of the other 
particles increases the probability of getting one more. 

4-4 Emission and absorption of photons 

Throughout our discussion we have talked about a process like the scattering 
of a-particles. But that is not essential; we could have been speaking of the creation 
of particles, as for instance the emission of light. When the light is emitted, a 
photon is “created.” In such a case, we don’t need the incoming lines in Fig. 
4-4; we can consider merely that there are n atoms a, b, c t .. . emitting light, as in 
Fig. 4-5. So our result can also be stated: The probability that an atom will emit 
a photon into a particular final state is increased by the factor (n + 1) if there are 
already n photons in that state . 

People like to summarize this result by saying that the amplitude to emit a 
photon is increased by the factor \/n + 1 when there are already n photons 
present. It is, of course, another way of saying the same thing if it is understood to 
mean that this amplitude is just to be squared to get the probability. 

It is generally true in quantum mechanics that the amplitude to get from any 
condition <f> to any other condition X is the complex conjugate of the amplitude to 
get from X to <f>: 

<X | 0> = <* | *>*. (4-24) 

We will learn about this law a little later, but for the moment we will just assume 
it is true. We can use it to find out how photons are scattered or absorbed out of a 
given state. We have that the amplitude that a photon will be added to some state, 
say /, when there are already n photons present is, say, 

(n + 1 | n) — yfn + 1 a 9 (4.25) 
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where a = (/ 1 a) is the amplitude when there are no others present. Using Eq. 
(4.24), the amplitude to go the other way— from (n + 1) photons to n — is 


(n | n + 1) = y/n + 1 a*. (4.26) 

This isn’t the way people usually say it; they don’t like to think of going from 
(n + 1) to n, but prefer always to start with n photons present. Then they say 
that the amplitude to absorb a photon when there are n present— in other words, 
to go from n to (n — 1) — is 

(n - 1 \n) = vW. (4.27) 

which is, of course, just the same as Eq. (4.26). Then they have trouble trying to 
remember when to use y/n or \/n +T Here’s the way to remember: The factor 
is always the square root of the largest number of photons present, whether it is 
before or after the reaction. Equations (4.25) and (4.26) show that the law is 
really symmetric — it only appears unsymmetric if you write it as Eq. (4.27). 

There are many physical consequences of these new rules; we want to describe 
one of them having to do with the emission of light. Suppose we imagine a situation 
in which photons are contained in a box — you can imagine a box with mirrors for 
walls. Now say that in the box we have n photons, all of the same state — the same 
frequency, direction, and polarization— so they can’t be distinguished, and that 
also there is an atom in the box that can emit another photon into the same state. 
Then the probability that it will emit a photon is 

(H+l)|a| 2 , (4.28) 

and the probability that it will absorb a photon is 

n\a\ 2 , (4.29) 

where ]a| 2 is the probability it would emit if no photons were present. We have 
already discussed these rules in a somewhat different way in Chapter 42 of Vol. I. 
Equation (4.29) says that the probability that an atom will absorb a photon and 
make a transition to a higher energy state is proportional to the intensity of the 
light shining on it. But, as Einstein first pointed out, the rate at which an atom will 
make a transition downward has two parts. There is the probability that it will 
make a spontaneous transition |a| 2 , plus the probability of an induced transition 
n\a \ 2 , which is proportional to the intensity of the light— that is, to the number of 
photons present. Furthermore, as Einstein said, the coefficients of absorption and 
of induced emission are equal and are related to the probability of spontaneous 
emission. What we learn here is that if the light intensity is measured in terms of 
the number of photons present (instead of as the energy per unit area, and per sec), 
the coefficients of absorption of induced emission and of spontaneous emission are 
all equal. This is the content of the relation between the Einstein coefficients 
A and B of Chapter 42, Vol. I, Eq. (42.18). 


' 1 4-5 The blackbody spectrum 

Fig. 4-6. Radiation and absorption We would like to use our rules for Bose particles to discuss once more the 

of a photon with the frequency w. spectrum of blackbody radiation (see Chapter 42, Vol. I). We will do it by finding 

out how many photons there are in a box if the radiation is in thermal equilibrium 
with some atoms in the box. Suppose that for each light frequency oj, there are a 
certain number TV of atoms which have two energy states separated by the energy 
A E = hco. See Fig. 4-6. We’ll call the lower-energy state the “ground” state 
and the upper state the “excited” state. Let N g and N e be the average numbers of 
atoms in the ground and excited states; then in thermal equilibrium at the tem- 
perature T, we have from statistical mechanics that 


AE =Tuu 

j_ 


GROUND STATE 

(a) 


AE“ticd 

i 


GROUND STATE 


Ne 


e -6E/kT _ e -t u /kT. 


(4.30) 
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Each atom in the ground state can absorb a photon and go into the excited 
state, and each atom in the excited state can emit a photon and go to the ground 
state. In equilibrium, the rates for these two processes must be equal. The rates 
are proportional to the probability for the event and to the number of atoms 
present. Let’s let n be the average number of photons present in a given state 
with the frequency co. Then the absorption rate from that state is N g n\a \ 2 , and the 
emission rate into that state is N e (n + l)\a\ 2 . Setting the two rates equal, we have 
that 

NgJi = N e (n + 1). (4.31) 

Combining this with Eq. (4.30), we have 


Solving for w, we have 


n -hajkT 

n + 1 

1 

n = 5 


(4.32) 


which is the mean number of photons in any state with frequency co, for a cavity in 
thermal equilibrium. Since each photon has the energy £co, the energy in the 
photons of a given state is nhai, or 


hoo 

~to*ikT 


(4.33) 


Incidentally, we once found a similar equation in another context [Chapter 
41, Vol. I, Eq. (41.15)]. You remember that for any harmonic oscillator — such as 
a weight on a spring — the quantum mechanical energy levels are equally spaced 
with a separation hoo, as drawn in Fig. 4-7. If we call the energy of the «th level 
nho), we find that the mean energy of such an oscillator is also given by Eq. (4.33). 
Yet this equation was derived here for photons, by counting particles, and it gives 
the same results. That is one of the marvelous miracles of quantum mechanics. 
If one begins by considering a kind of state or condition for Bose particles which 
do not interact with each other (we have assumed that the photons do not interact 
with each other), and then considers that into this state there can be put either 
zero, or one, or two, ... up to any number n of particles, one finds that this system 
behaves for all quantum mechanical purposes exactly like a harmonic oscillator. 
By such an oscillator we mean a dynamic system like a weight on a spring or a 
standing wave in a resonant cavity. And that is why it is possible to represent the 
electromagnetic field by photon particles. From one point of view, we can analyze 
the electromagnetic field in a box or cavity in terms of a lot of harmonic oscillators, 
treating each mode of oscillation according to quantum mechanics as a harmonic 
oscillator. From a different point of view, we can analyze the same physics in 
terms of identical Bose particles. And the results of both ways of working are 
always in exact agreement . There is no way to make up your mind whether the 
electromagnetic field is really to be described as a quantized harmonic oscillator or 
by giving how many photons there are in each condition. The two views turn out 
to be mathematically identical. So in the future we can speak either about the 
number of photons in a particular state in a box or the number of the energy level 
associated with a particular mode of oscillation of the electromagnetic field. They 
are two ways of saying the same thing. The same is true of photons in free space. 
They are equivalent to oscillations of a cavity whose walls have receded to infinity. 

We have computed the mean energy in any particular mode in a box at the 
temperature T; we need only one more thing to get the blackbody radiation law: 
We need to know how many modes there are at each energy. (We assume that for 
every mode there are some atoms in the box — or in the walls — which have energy 
levels that can radiate into that mode, so that each mode can get into thermal 
equilibrium.) The blackbody radiation law is usually stated by giving the energy 
per unit volume carried by the light in a small frequency interval from co to co + Aco. 
So we need to know how many modes there are in a box with frequencies in the 
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Fig. 4-7. The energy levels of a 
harmonic oscillator. 


4-9 




interval Ato. Although this question continually comes up in quantum mechanics, 
it is purely a classical question about standing waves. 

We will get the answer only for a rectangular box. It comes out the same for a 
box of any shape, but it’s very complicated to compute for the arbitrary case. 
Also, we are only interested in a box whose dimensions are very large compared 
with a wavelength of the light. Then there are billions and billions of modes; 
there will be many in any small frequency interval Aco, so we can speak of the 
“average number” in any Aw at the frequency w. Let's start by asking how many 
modes there are in a one-dimensional case — as for waves on a stretched string. 
You know that each mode is a sine wave that has to go to zero at both ends; 
in other words, there must be an integral number of half-wavelengths in the length 
of the line, as shown in Fig. 4-8. We prefer to use the wave number k = 27t/X; 
calling kj the wave number of the yth mode, we have that 

k > = T’ (4 - 34) 


Fig. 4-8, 


on a line. 


The standing wave modes 


where j is any integer. The separation 8k between successive modes is 

8k = k j+1 - kj = -■ 


We want to assume that kL is so large that in a small interval A k, there are many 
modes. Calling A9d the number of modes in the interval A/c, we have 

A Lr T 

A31 = ~ = - A/c. (4.35) 

8k TT 



Now theoretical physicists working in quantum mechanics usually prefer to 
say that there are one-half as many modes; they write 

A9l = T a/c. (4.36) 

We would like to explain why. They usually like to think in terms of travelling 
waves — some going to the right (with a positive k) and some going to the left 
(with a negative k). But a “mode” is a standing wave which is the sum of two waves, 
one going in each direction. In other words, they consider each standing wave 
as containing two distinct photon “states/’ So if by A31, one prefers to mean the 
number of photon states of a given k (where now k ranges over positive and nega- 
tive values), one should then take A91 half as big. (All integrals must now go from 
k = — x to k = + x , and the total number of states up to any given absolute 
value of k will come out O.K.) Of course, we are not then describing standing 
waves very well, but we are counting modes in a consistent way. 

Now we want to extend the results to three dimensions. A standing wave in a 
rectangular box must have an integral number of half-waves along each axis. The 
situation for two of the dimensions is shown in Fig. 4-9. Each wave direction 
and frequency is described by a vector wave number whose x, y, and z compo- 
nents must satisfy equations like Eq. (4.34). So we have that 
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Fig. 4-9. Standing wave modes in 
two dimensions. 


The number of inodes with k x in an interval A k x is, as before, 



Xf 


and similarly for A k v and A k z . If we call A c Jl(A:) the number of modes for a vector 
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wave number k whose x-component is between k x and k x + A k x , whose ^-com- 
ponent is between k y and k y + A k y , and whose z-component is between k z and 
k z + A k z , then 

A3 d(A) = A k x A k y Ak,,. (4.37) 

The product L x L y L z is equal to the volume V of the box. So we have the important 
result that for high frequencies (wavelengths small compared with the dimensions), 
the number of modes in a cavity is proportional to the volume V of the box and 
to the “volume in k- space” A k x Ak y A k z . This result comes up again and again in 
many problems and should be memorized: 

dW(k) = V ~~ • (4.38) 

Although we have not proved it, the result is independent of the shape of the box. 

We will now apply this result to find the number of photon modes for photons 
with frequencies in the range Ate. We are just interested in the energy in various 
modes — but not interested in the directions of the waves. We would like to know 
the number of modes in a given range of frequencies. In a vacuum the magnitude 
of k is related to the frequency by 

1*1 = "- (4.39) 

So in a frequency interval Aoj, these are all the modes which correspond to A:’s 
with a magnitude between k and k + A k, independent of the direction. The 
“volume in A:-space” between k and k + Ak is a spherical shell of volume 

47 rk 2 Ak. 


The number of modes is then 


A3l(co) = 


V4wk 2 Ak 

~~(2~7ry~ 


(4.40) 


However, since we are now interested in frequencies, we should substitute k = co/c, 
so we get 


A3l(co) = 


V47TC0 2 Aco 
(2 ? t)3 C 3 


(4.41) 


There is one more complication. If we are talking about modes of an electro- 
magnetic wave, for any given wave vector k there can be either of two polarizations 
(at right angles to each other). Since these modes are independent, we must— for 
light — double the number of modes. So we have 

A3l(co) = - ^ 2 ^ (for light). (4.42) 


We have shown, Eq. (4.33), that each mode (or each “state”) has on the 
average the energy 


nftco 


hoj 

gfiwfkT __ i 


Multiplying this by the number of modes, we get the energy A E in the modes that 
lie in the interval Ac o: 


AE 


hoi Vco 2 Aco 

gfiulkT | 7T 2 C 3 


(4.43) 


This is the law for the frequency spectrum of blackbody radiation, which we have 
already found in Chapter 41 of Vol. I. The spectrum is plotted in Fig. 4-10. You 
see now that the answer depends on the fact that photons are Bose particles, which 
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Fig. 4-10. The frequency spectrum 
of radiation in a cavity in thermal equilib- 
rium, the “blackbody" spectrum. 



have a tendency to try to get all into the same state (because the amplitude for 
doing so is large). You will remember, it was Planck's study of the blackbody 
spectrum (which was a mystery to classical physics), and his discovery of the for- 
mula in Eq. (4.43) that started the whole subject of quantum mechanics. 


4-6 Liquid helium 

Liquid helium has at low temperatures many odd properties which we cannot 
unfortunately take the time to describe in detail right now, but many of them arise 
from the fact that a helium atom is a Bose particle. One of the things is that liquid 
helium flows without any viscous resistance. It is, in fact, the ideal '‘dry” water 
we have been talking about in one of the earlier chapters — provided that the 
velocities are low enough. The reason is the following. In order for a liquid to have 
viscosity, there must be internal energy losses; there must be some way for one part 
of the liquid to have a motion that is different from that of the rest of the liquid. 
This means that it must be possible to knock some of the atoms into states that 
are different from the states occupied by other atoms. But at sufficiently low 
temperatures, when the thermal motions get very small, all the atoms try to get 
into the same condition. So, if some of them are moving along, then all the atoms 
try to move together in the same state. There is a kind of rigidity to the motion, 
and it is hard to break the motion up into irregular patterns of turbulence, as 
would happen, for example, with independent particles. So in a liquid of Bose 
particles, there is a strong tendency for all the atoms to go into the same state— 
which is represented by the \/n + 1 factor we found earlier. (For a bottle of 
liquid helium n is, of course, a very large number!) This cooperative motion 
does not happen at high temperatures, because then there is sufficient thermal 
energy to put the various atoms into various different higher states. But at a 
sufficiently low temperature there suddenly comes a moment in which all the helium 
atoms try to go into the same state. The helium becomes a superfluid. Incidentally, 
this phenomenon only appears with the isotope of helium which has atomic weight 
4. For the helium isotope of atomic weight 3, the individual atoms are Fermi 
particles, and the liquid is a normal fluid. Since superfluidity occurs only with 
He 4 , it is evidently a quantum mechanical effect — due to the Bose nature of the 
a-particle. 


4-7 The exclusion principle 

Fermi particles act in a completely different way. Let's see what happens 
if we try to put two Fermi particles into the same state. We will go back to our 
original example and ask for the amplitude that two identical Fermi particles will 
be scattered into almost exactly the same direction. The amplitude that particle 
a will go in direction 1 and particle b will go in direction 2 is 

(1 | a)(2 | b), 

whereas the amplitude that the outgoing directions will be interchanged is 

<2|a)<l I b). 

Since we have Fermi particles, the amplitude for the process is the difference of 
these two amplitudes: 

(l \a)(2\b) - (2\a)(\ \ b). (4.44) 

Let’s say that by “direction 1” we mean that the particle has not only a certain 
direction but also a given direction of its spin, and that “direction 2” is almost 
exactly the same as direction 1 and corresponds to the same spin direction. Then 
(1 | a) and (2 \ a) are nearly equal. (This would not necessarily be true if the 
outgoing states 1 and 2 did not have the same spin, because there might be some 
reason why the amplitude would depend on the spin direction.) Now if we let 
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Fig. 4-1 1. How atoms might look if electrons behaved like Bose particles. 


directions 1 and 2 approach each other, the total amplitude in Eq. (4.44) becomes 
zero. The result for Fermi particles is much simpler than for Bose particles. It 
just isn't possible at all for two Fermi particles— such as two electrons — to get 
into exactly the same state. You will never find two electrons in the same position 
with their two spins in the same direction. It is not possible for two electrons to 
have the same momentum and the same spin directions. If they are at the same 
location or with the same state of motion, the only possibility is that they must be 
spinning opposite to each other. 

What are the consequences of this? There are a number of most remarkable 
effects which are a consequence of the fact that two Fermi particles cannot get into 
the same state. In fact, almost all the peculiarities of the material world hinge on 
this wonderful fact. The variety that is represented in the periodic table is basically 
a consequence of this one rule. 

Of course, we cannot say what the world would be like if this one rule were 
changed, because it is just a part of the whole structure of quantum mechanics, and it 
is impossible to say what else would change if the rule about Fermi particles were 
different. Anyway, let’s just try to see what would happen if only this one rule were 
changed. First, we can show that every atom would be more or less the same. 
Let’s start with the hydrogen atom. It would not be noticeably affected. The 
proton of the nucleus would be surrounded by a spherically symmetric electron 
cloud, as shown in Fig. 4-1 1(a). As we have described in Chapter 2, the electron 
is attracted to the center, but the uncertainty principle requires that there be 
a balance between the concentration in space and in momentum. The balance 
means that there must be a certain energy and a certain spread in the electron 
distribution which determines the characteristic dimension of the hydrogen atom. 

Now suppose that we have a nucleus with two units of charge, such as the 
helium nucleus. This nucleus would attract two electrons, and if they were Bose 
particles, they would — except for their electric repulsion — both crowd in as close 
as possible to the nucleus. A helium atom might look as shown in part (b) of the 
figure. Similarly, a lithium atom which has a triply charged nucleus would have 
an electron distribution like that shown in part (c) of Fig. 4-1 1 . Every atom would 
look more or less the same— a little round ball with all the electrons sitting near 
the nucleus, nothing directional and nothing complicated. 

Because electrons are Fermi particles, however, the actual situation is quite 
different. For the hydrogen atom the situation is essentially unchanged. The only 
difference is that the electron has a spin which we indicate by the little arrow in 
Fig. 4- 12(a). In the case of a helium atom, however, we cannot put two electrons 
on top of each other. But wait, that is only true if their spins are the same. Two 
electrons can occupy the same state if their spins are opposite. So the helium atom 
does not look much different either. It would appear as shown in part (h) of 
Fig. 4-12. For lithium, however, the situation becomes quite different. Where 
can we put the third electron? The third electron cannot go on top of the other 
two because both spin directions are occupied. (You remember that for an electron 
or any particle with spin 1/2 there are only two possible directions for the spin.) 
The third electron can’t go near the place occupied by the other two, so it must 
take up a special condition in a different kind of state farther away from the 
nucleus in part (c) of the figure. (We are speaking only in a rather rough way here, 
because in reality all three electrons are identical; since we cannot really distinguish 
which one is which, our picture is only an approximate one.) 





Fig. 4-12. Atomic configurations for 
real, Fermi-type, spin one-half electrons. 
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Fig. 4-13. The hydrogen molecule. 



Fig. 4-14. Helium with one electron 
in a higher energy state. 



Fig. 4-15. The likely mechanism in a 
ferromagnetic crystal; the conduction 
electron is antiparallel to the unpaired 
inner electrons. 


Now we can begin to see why different atoms will have different chemical 
properties. Because the third electron in lithium is farther out, it is relatively more 
loosely bound. It is much easier to remove an electron from lithium than from 
helium. (Experimentally, it takes 25 volts to ionize helium but only 5 volts to 
ionize lithium.) This accounts for the valence of the lithium atom. The directional 
properties of the valence have to do with the pattern of the waves of the outer 
electron, which we will not go into at the moment. But we can already see the im- 
portance of the so-called exclusion principle — which states that no two electrons 
can be found in exactly the same state (including spin). 

The exclusion principle is also responsible for the stability of matter on a 
large scale. We explained earlier that the individual atoms in matter did not 
collapse because of the uncertainty principle; but this does not explain why it is 
that two hydrogen atoms can’t be squeezed together as close as you want — why 
it is that all the protons don’t get close together with one big smear of electrons 
around them. The answer is, of course, that since no more than two electrons — 
with opposite spins — can be in roughly the same place, the hydrogen atoms must 
keep away from each other. So the stability of matter on a large scale is really a 
consequence of the Fermi particle nature of the electrons. 

Of course, if the outer electrons on two atoms have spins in opposite directions, 
they can get close to each other. This is, in fact, just the way that the chemical 
bond comes about. It turns out that two atoms together will generally have the 
lowest energy if there is an electron between them. It is a kind of an electrical 
attraction for the two positive nuclei toward the electron in the middle. It is 
possible to put two electrons more or less between the two nuclei so long as their 
spins are opposite, and the strongest chemical binding comes about this way. 
There is no stronger binding, because the exclusion principle does not allow there 
to be more than two electrons in the space between the atoms. We expect the 
hydrogen molecule to look more or less as shown in Fig. 4-13. 

We want to mention one more consequence of the exclusion principle. You 
remember that if both electrons in the helium atom are to be close to the nucleus, 
their spins are necessarily opposite. Now suppose that we would like to try to 
arrange to have both electrons with the same spin — as we might consider doing by 
putting on a fantastically strong magnetic field that would try to line up the spins 
in the same direction. But then the two electrons could not occupy the same state 
in space. One of them would have to take on a different geometrical position, as 
indicated in Fig. 4-14. The electron which is located farther from the nucleus has 
less binding energy. The energy of the whole atom is therefore quite a bit higher. 
In other words, when the two spins are opposite, there is a much stronger total 
attraction. 

So, there is an apparent, enormous force trying to line up spins opposite to 
each other when two electrons are close together. If two electrons are trying to go 
in the same place, there is a very strong tendency for the spins to become lined 
opposite. This apparent force trying to orient the two spins opposite to each other 
is much more powerful than the tiny force between the two magnetic moments of 
the electrons. You remember when we were speaking of ferromagnetism there was 
the mystery of why the electrons in different atoms had a strong tendency to line 
up parallel. Although there is still no quantitative explanation, it is believed that 
what happens is that the electrons around the core of one atom interact through 
the exclusion principle with the outer electrons which have become free to wander 
throughout the crystal. This interaction causes the spins of the free electrons and 
the inner electrons to take on opposite directions. But the free electrons and the 
inner atomic electrons can only be opposite provided all the inner electrons have 
the same spin direction, as indicated in Fig. 4-15. It seems probable that it is the 
effect of the exclusion principle acting indirectly through the free electrons that 
gives rise to the strong aligning forces responsible for ferromagnetism. 

We will mention one further example of the influence of the exclusion principle. 
We have said earlier that the nuclear forces are the same between the neutron and 
the proton, between the proton and the proton, and between the proton and the 
neutron. Why is it then that a proton and a neutron can stick together to make a 
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deuterium nucleus, whereas there is no nucleus with just two protons or with just 
two neutrons? The deuteron is, as a matter of fact, bound by an energy of about 
2.2 million volts, yet, there is no corresponding binding between a pair of protons 
to make an isotope of helium with the atomic weight 2. Such nuclei do not exist. 
The combination of two protons does not make a bound state. 

The answer is a result of two effects: first, the exclusion principle; and second, 
the fact that the nuclear forces are somewhat sensitive to the direction of spin. The 
force between a neutron and a proton is attractive and somewhat stronger when 
the spins are parallel than when they are opposite. It happens that these forces 
are just different enough that a deuteron can only be made if the neutron and 
proton have their spins parallel; when their spins are opposite, the attraction is 
not quite strong enough to bind them together. Since the spins of the neutron and 
proton are each one-half and are in the same direction, the deuteron has a spin of 
one. We know, however, that two protons are not allowed to sit on top of each 
other if their spins are parallel. If it were not for the exclusion principle, two 
protons would be bound, but since they cannot exist at the same place and with 
the same spin directions, the He 2 nucleus does not exist. The protons could come 
together with their spins opposite, but then there is not enough binding to make 
a stable nucleus, because the nuclear force for opposite spins is too weak to 
bind a pair of nucleons. The attractive force between neutrons and protons of 
opposite spins can be seen by scattering experiments. Similar scattering experiments 
with two protons with parallel spins show that there is the corresponding attraction. 
So it is the exclusion principle that helps explain why deuterium can exist when 
He 2 cannot. 
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5 


Spin One 


5-1 Filtering atoms with a Stern-Gerlach apparatus 

In this chapter we really begin the quantum mechanics proper— in the sense 
that we are going to describe a quantum mechanical phenomenon in a completely 
quantum mechanical way. We will make no apologies and no attempt to find con- 
nections to classical mechanics. We want to talk about something new in a new 
language. The particular situation which we are going to describe is the behavior 
of the so-called quantization of the angular momentum, for a particle of spin one. 
But we won’t use words like “angular momentum” or other concepts of classical 
mechanics until later. We have chosen this particular example because it is rela- 
tively simple, although not the simplest possible example. It is sufficiently com- 
plicated that it can stand as a prototype which can be generalized for the description 
of all quantum mechanical phenomena. Thus, although we are dealing with a 
particular example, all the laws which we mention are immediately generalizable, 
and we will give the generalizations so that you will see the general characteristics 
of a quantum mechanical description. We begin with the phenomenon of the 
splitting of a beam of atoms into three separate beams in a Stern-Gerlach experi- 
ment. 

You remember that if we have an inhomogeneous magnetic field made by a 
magnet with a pointed pole tip and we send a beam through the apparatus, the 
beam of particles may be split into a number of beams — the number depending 
on the particular kind of atom and its state. We are going to take the case of an 
atom which gives three beams, and we are going to call that a particle -ef spin one. 
You can do for yourself the case of five beams, seven beams, two beams, etc. — you 
just copy everything down and where we have three terms, you will have five 
terms, seven terms, and so on. 

Imagine the apparatus drawn schematically in Fig. 5-1. A beam of atoms 
(or particles of any kind) is collimated by some slits and passes through a non- 
uniform field. Let’s say that the beam moves in the ^-direction and that the 
magnetic field and its gradient are both in the z-direction. Then, looking from the 
side, we will see the beam split vertically into three beams, as shown in the figure. 
Now at the output end of the magnet we could put small counters which count 
the rate of arrival of particles in any one of the three beams. Or we can block 
off two of the beams and let the third one go on. 

Suppose we block off the lower two beams and let the top-most beam go on 
and enter a second Stern-Gerlach apparatus of the same kind, as shown in Fig. 
5-2. What happens? There are not three beams in the second apparatus; there 
is only the top beam.f This is what you would expect if you think of the second 
apparatus as simply an extension of the first. Those atoms which are being pushed 
upward continue to be pushed upward in the second magnet. 
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Fig. 5—1. In a Stern-Gerlach experi- 
ment, atoms of spin one are split into 
three beams. 



Fig. 5-2. The atoms from one of the 
beams are sent into a second identical 
apparatus. 


t We are assuming that the deflection angles are very small. 
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Fig. 5-3. (a) An imagined modification of a Stern-Gerlach apparatus, (b) The paths of spin-one atoms. 


You can see then that the first apparatus has produced a beam of “purified” 
objects — atoms that get bent upward in the particular inhomogeneous field. The 
atoms, as they enter the original Stern-Gerlach apparatus, are of three “varieties,” 
and the three kinds take different trajectories. By filtering out all but one of the 
varieties, we can make a beam whose future behavior in the same kind of apparatus 
is determined and predictable. We will call this a filtered beam, or a polarized 
beam, or a beam in which the atoms all are known to be in a definite state . 

For the rest of our discussion, it will be more convenient if we consider a 
somewhat modified apparatus of the Stern-Gerlach type. The apparatus looks 
more complicated at first, but it will make all the arguments simpler. Anyway, 
since they are only “thought experiments,” it doesn’t cost anything to complicate 
the equipment. (Incidentally, no one has ever done all of the experiments we will 
describe in just this way, but we know what would happen from the laws of quantum 
mechanics, which are, of course, based on other similar experiments. These other 
experiments are harder to understand at the beginning, so we want to describe 
some idealized — but possible — experiments.) 

Figure 5-3(a) shows a drawing of the “modified Stern-Gerlach apparatus” 
we would like to use. It consists of a sequence of three high-gradient magnets. 
The first one (on the left) is just the usual Stern-Gerlach magnet and splits the 
incoming beam of spin-one particles into three separate beams. The second 
magnet has the same cross section as the first, but is twice as long and the polarity 
of its magnetic field is opposite the field in magnet 1. The second magnet pushes 
in the opposite direction on the atomic magnets and bends their paths back toward 
the axis, as shown in the trajectories drawn in the lower part of the figure. The 
third magnet is just like the first, and brings the three beams back together again, 
so that leaves the exit hole along the axis. Finally, we would like to imagine that 
in front of the hole at A there is some mechanism which can get the atoms started 
from rest and that after the exit hole at B there is a decelerating mechanism that 
brings the atoms back to rest at B. That is not essential, but it will mean that in 
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our analysis we won’t have to worry about including any effects of the motion as 
the atoms come out, and can concentrate on those matters having only to do with 
the spin. The whole purpose of the “improved” apparatus is just to bring all the 
particles to the same place, and with zero speed. 

Now if we want to do an experiment like the one in Fig. 5-2, we can first 
make a filtered beam by putting a plate in the middle of the apparatus that blocks 
two of the beams, as shown in Fig. 5-4. If we now put the polarized atoms through 
a second identical apparatus, all the atoms will take the upper path, as can be 
verified by putting similar plates in the way of the various beams of the second 
S filter and seeing whether particles get through. 


n r- 



Fig. 5-4. The “improved” Stern-Gerlach apparatus as a filter. 


Suppose we call the first apparatus by the name S . (We are going to consider 
all sorts of combinations, and we will need labels to keep things straight.) We will 
say that the atoms which take the top path in S are in the “plus state with respect 
to S ”; the ones which take the middle path are in the “zero state with respect to 
5”; and the ones which take the lowest path are in the “minus state with respect 
to S.” (In the more usual language we would say that the z-component of the 
angular momentum was + 16, 0, and - 1 h, but we are not using that language now.) 
Now in Fig. 5-4 the second apparatus is oriented just like the first, so the filtered 
atoms will all go on the upper path. Or if we had blocked off the upper and lower 
beams in the first apparatus and let only the zero state through, all the filtered 
atoms would go through the middle path of the second apparatus. And if we 
had blocked off all but the lowest beam in the first, there would be only a low 
beam in the second. We can say that in each case our first apparatus has 
produced a filtered beam in a pure state with respect to S (+, 0, or — ), and we 
can test which state is present by putting the atoms through a second, identical 
apparatus. 

We can make our second apparatus so that it transmits only atoms of a 
particular state — by putting masks inside it as we did for the first one — and then 
we can test the state of the incoming beam just by seeing whether anything comes 
out the far end. For instance, if we block off the two lower paths in the second 
apparatus, 100 percent of the atoms will still come through; but if we block off the 
upper path, nothing will get through. 

To make this kind of discussion easier, we are going to invent a shorthand 
symbol to represent one of our improved Stern-Gerlach apparatuses. We will let 
the symbol 

(5.1) 

s 

stand for one complete apparatus. (This is not a symbol you will ever find used in 
quantum mechanics; we’ve just invented it for this chapter. It is simply meant to 
be a shorthand picture of the apparatus of Fig. 5-3.) Since we are going to want 
to use several apparatuses at once, and with various orientations, we will identify 
each with a letter underneath. So the symbol in (5.1) stands for the apparatus S. 
When we block off one or more of the beams inside, we will show that by some 
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vertical bars indicating which beam is blocked, like this: 


Oil- ( 5 . 2 ) 


s 

The various possible combinations we will be using are shown in Fig. 5-5. 

If we have two filters in succession (as in Fig. 5-4), we will put the two sym- 
bols next to each other, like this: 



s s 



Fig. 5-5. Special shorthand symbols 
for Stern-Gerlach type filters. 


For this setup, everything that comes through the first also gets through 
In fact, even if we block off the “zero” and “minus” channels of 
apparatus, so that we have 



the second, 
the second 


(5.4) 


s s 


we still get 100 percent transmission through the second apparatus. On the other 
hand, if we have 


nothing at all comes out of the far end. Similarly, 

•'! I’ll 

s 

would give nothing out. On the other hand, 

(•'I I? 1 ! 

s s 

would be just equivalent to 

(•'! 

5 

by itself. 

Now we want to describe these experiments quantum mechanically. We will 
say that an atom is in the (+5) state if it has gone through the apparatus of Fig. 
5-5(b), that it is in a (0 5 ) state if it has gone through (c), and in a (—5) state if 
it has gone through (d).f Then we let (b | a) be the amplitude that an atom which 
is in state a will get through an apparatus into the b state. We can say: (b \ a) is 
the amplitude for an atom in the state a to get into the state b . The experiment 
(5.4) gives us that 

<+S|+S> - 1, 


(5.5) 


(5.6) 


(5.7) 


f Read: (+5) = “plus-5”; (05) = “zero-5”; (—5) = “minus-5.” 
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whereas (5.5) gives us 


<-S\+S) - 0. 


Similarly, the result of (5.6) is 


and of (5.7) is 


(+s\ - S ) = 0, 

<-s\ - s ) = 1 . 


As long as we deal only with “pure” states — that is, we have only one channel 
open — there are nine such amplitudes, and we can write them in a table: 


from 


to 


+ S 
OS 
-S 


+s OS -s 
1 0 0 
0 1 0 
0 0 1 


(5.8) 


This array of nine numbers — called a matrix — summarizes the phenomena we’ve 
been describing. 


5-2 Experiments with filtered atoms 

Now comes the big question: What happens if the second apparatus is tipped 
to a different angle, so that its field axis is no longer parallel to the first? It 
could be not only tipped, but also pointed in a different direction — for instance, 
it could take the beam off at 90° with respect to the original direction. To take it 
easy at first, let’s first think about an arrangement in which the second Stern- 
Gerlach experiment is tilted by some angle a about the j^-axis, as shown in Fig. 
5-6. We’ll call the second apparatus T. Suppose that we now set up the following 
experiment: 

!oi) ftil 


or the experiment: 


S T 


+ 

0 



T 


What comes out at the far end in these cases? 

The answer is this: If the atoms are in a definite state with respect to S , they 
are not in the same state with respect to T — a (+5) state is not also a (+T) state. 
There is, however, a certain amplitude to find the atom in a (+T) state — or a (0 T) 
state or a (— T) state. 

In other words, as careful as we have been to make sure that we have the 
atoms in a definite condition, the fact of the matter is that if it goes through an 
apparatus which is tilted at a different angle it has, so to speak, to “reorient” 



Fig. 5-6. Two Stern-Gerlach type 
filters in series; the second is tilted at the 
angle a with respect to the first. 
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itself — which it does, don’t forget, by luck. We can put only one particle through 
at a time, and then we can only ask the question: What is the probability that it 
gets through? Some of the atoms that have gone through S will end in a (+T) 
state, some of them will end in a ( 0 T ), and some in a (— T) state — all with different 
odds. These odds can be calculated by the absolute squares of complex amplitudes; 
what we want is some mathematical method, or quantum mechanical description, 
for these amplitudes. What we need to know are various quantities like 

<-r|+s>, 

by which we mean the amplitude that an atom initially in the (+S) state can get 
into the (— T) condition (which is not zero unless T and *S are lined up parallel 
to each other). There are other amplitudes like 

(+r|0S>, or (or|-S>, etc. 

There are, in fact, nine such amplitudes — another matrix — that a theory of particles 
should tell us how to calculate. Just as F = ma tells us how to calculate what hap- 
pens to a classical particle in any circumstance, the laws of quantum mechanics 
permit us to determine the amplitude that a particle will get through a particular 
apparatus. The central problem, then, is to be able to calculate — for any given 
tilt angle a, or in fact for any orientation whatever— the nine amplitudes: 

{+T\ +5), (+T I 0 S) 9 <+r| -s>, 

(0T\ + S), <0T|0S>, ( 0 T | —S), (5.9) 

<-r| +S>, i-T\0S), <-r|-s>. 

We can already figure out some relations among these amplitudes. First, 
according to our definitions, the absolute square 

\(+T\ +5)| 2 

is the probability that an atom in a (+5) state will enter a (+T) state. We will often 
find it more convenient to write such squares in the equivalent form 

<■ +T\+S)(+T\+S >*. 

In the same notation the number 

<or| +s)(ot\ +s >* 

is the probability that a particle in the (+5) state will enter the (0 T) state, and 

<-r|+s><-r|+s>* 

is the probability that it will enter the (— T) state. But the way our apparatuses 
are made, every atom which enters the T apparatus must be found in some one of 
the three states of the T apparatus — there’s nowhere else for a given kind of atom 
to go. So the sum of the three probabilities we’ve just written must be equal to 
100 percent. We have the relation 

(+t | +s)(+t\ +s>* + (or | +s)(ot\ +s>* 

+ <-r| +S)(~T\ +S>* - 1. (5.10) 

There are, of course, two other such equations that we get if we start with a ( 0 S) 
or a (—5) state. But they are all we can easily get, so we’ll go on to some other 
general questions. 

5-3 Stern-Gerlach filters in series 

Here is an interesting question: Suppose we had atoms filtered into the (+*S) 
state, then we put them through a second filter, say into a (Of) state, and then 
through another +S filter. (We’ll call the last filter S' just so we can distinguish 
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it from the first S-fiter.) Do the atoms remember that they were once in a (+5) 
state? In other words, we have the following experiment: 


s 


+1 

o 

-I 


T 


+ 

0 


S' 


(5.11) 


We want to know whether all those that get through T also get through S'. They 
do not. Once they have been filtered by T, they do not remember in any way that 
they were in a (+S) state when they entered T. Note that the second 5 apparatus 
in (5.11) is oriented exactly the same as the first, so it is still an 5-type filter. 
The states filtered by S' are, of course, still (+5), (0 S), and (-5). 

The important point is this: If the T filter passes only one beam, the fraction 
that gets through the second S filter depends only on the setup of the T filter, and 
is completely independent of what precedes it. The fact that the same atoms were 
once sorted by an S filter has no influence whatever on what they will do once they 
have been sorted again into a pure beam by a T apparatus. From then on, the 
probability for getting into different states is the same no matter what happened 
before they got into the T apparatus. 

As an example, let’s compare the experiment of (5.11) with the following 
experiment: 



S T S' 


in which only the first S is changed. Let’s say that the angle a (between S and T) 
is such that in experiment (5.11) one-third of the atoms that get through T also 
get through S'. In experiment (5.12), although there will, in general, be a different 
number of atoms coming through T, the same fraction of these — one-third — will 
also get through S'. 

We can, in fact, show from what you have learned earlier that the fraction of 
the atoms that come out of T and get through any particular S' depends only on 
T and S', not on anything that happened earlier. Let’s compare experiment 
(5.12) with 



S T S' 


The amplitude that an atom that comes out of S will also get through both T and 
S' is, for the experiments of (5.12), 

<+s|or)<or|os>. 

The corresponding probability is 

|(+s|or)(or|os)| 2 = \(+s\ot)\ 2 \(ot\os)\ 2 . 

The probability for experiment (5.13) is 

Kosiorxorios )! 2 = |<os| or >| 2 |<o r|os)| 2 . 

The ratio is 

|(0 5 | 0 T)\ 2 
K+S|0 7’>| 2 

and depends only on T and S', and not at all on which beam (+5), (OS), or (-S) 
is selected by S. (The absolute numbers may go up and down together depending 
on how much gets through T.) We would, of course, find the same result if we 
compared the probabilities that the atoms would go into the plus or the minus 
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states with respect to S ' , or the ratio of the probabilities to go into the zero or 
minus states. 

In fact, since these ratios depend only on which beam is allowed to pass 
through T , and not on the selection made by the first S filter, it is clear that we 
would get the same result even if the last apparatus were not an S filter. If we use 
for the third apparatus — which we will now call R — one rotated by some arbitrary 
angle with respect to T , we would find that a ratio such as |{ 0 R 1 0 T)\ 2 /\(+R | 0 T)\ 2 
was independent of which beam was passed by the first filter 5. 

5-4 Base states 

These results illustrate one of the basic principles of quantum mechanics: 
Any atomic system can be separated by a filtering process into a certain set of 
what we will call base states , and the future behavior of the atoms in any single 
given base state depends only on the nature of the base state — it is independent of 
any previous history. t The base states depend, of course, on the filter used; for 
instance, the three states (+J), ( 0 7 1 ), and (~T) are one set of base states; the three 
states (+5), (0 5), and (—5) are another. There are any number of possibilities 
each as good as any other. 

We should be careful to say that we are considering good filters which do 
indeed produce “pure” beams. If, for instance, our Stern-Gerlach apparatus didn’t 
produce a good separation of the three beams so that we could not separate them 
cleanly by our masks, then we could not make a complete separation into base 
states. We can tell if we have pure base states by seeing whether or not the beams 
can be split again in another filter of the same kind. If we have a pure (+T) state, 
for instance, all the atoms will go through 



T 


and none will go through 



T 


or through 

(•'i 

T 

Our statement about base states means that it is possible to filter to some pure state, 
so that no further filtering by an identical apparatus is possible. 

We must also point out that what we are saying is exactly true only in rather 
idealized situations. In any real Stern-Gerlach apparatus, we would have to worry 
about diffraction by the slits that could cause some atoms to go into states corre- 
sponding to different angles, or about whether the beams might contain atoms with 
different excitations of their internal states, and so on. We have idealized the 
situation so that we are talking only about the states that are split in a magnetic 
field; we are ignoring things having to do with position, momentum, internal 
excitations, and the like. In general, one would need to consider also base states 
which are sorted out with respect to such things also. But to keep the concepts 
simple, we are considering only our set of three states, which is sufficient for the 
exact treatment of the idealized situation in which the atoms don’t get torn up in 


f We do not intend the word “base state” to imply anything more than what is said 
here. They are not to be thought of as “basic” in any sense. We are using the word base 
with the thought of a basis for a description, somewhat in the sense that one speaks of 
“numbers to the base ten.” 
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going through the apparatus, or otherwise badly treated, and come to rest when 
they leave the apparatus. 

You will note that we always begin our thought experiments by taking a 
filter with only one channel open, so that we start with some definite base state. 
We do this because atoms come out of a furnace in various states determined at 
random by the accidental happenings inside the furnace. (It gives what is called 
an “unpolarized” beam.) This randomness involves probabilities of the “classical” 
kind — as in coin tossing—which are different from the quantum mechanical 
probabilities we are worrying about now. Dealing with an unpolarized beam 
would get us into additional complications that are better to avoid until after we 
understand the behavior of polarized beams. So don’t try to consider at this point 
what happens if the first apparatus lets more than one beam through. (We will 
tell you how you can handle such cases at the end of the chapter.) 

Let’s now go back and see what happens when we go from a base state for 
one filter to a base state for a different filter. Suppose we start again with 



S T 


The atoms which come out of T are in the base state (07) and have no memory 
that they were once in the state ( +S ). Some people would say that in the filtering 
by T we have “lost the information” about the previous state (+5) because we 
have “disturbed” the atoms when we separated them into three beams in the 
apparatus T. But that is not true. The past information is not lost by the separation 
into three beams, but by the blocking masks that are put in — as we can see by the 
following set of experiments. 

We start with a +S filter and will call N the number of atoms that come 
through it. If we follow this by- a 0 T filter, the number of atoms that come out is 
some fraction of the original number, say aN. If we then put another + S filter, 
only some fraction /3 of these atoms will get to the far end. We can indicate this 
in the following way: 



If our third apparatus S' selected a different state, say the (0 S) state, a different 
fraction, say 7 , would get through.! We would have 



s 



(5.15) 


Now suppose we repeat these two experiments but remove all the masks from T. 
We would then find the remarkable results as follows: 



(5.16) 


( 5 . 17 ) 


t In terms of our earlier notation a = |( 0 T| + S)| 2 , 0 = |{ + S\ 0 T)| 2 , and 7 = 
|<os|or>| 2 . 
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All the atoms get through S' in the first case, but none in the second case! This is 
one of the great laws of quantum mechanics. That nature works this way is not 
self-evident, but the results we have given correspond for our idealized situation 
to the quantum mechanical behavior observed in innumerable experiments. 

5-5 Interfering amplitudes 

How can it be that in going from (5.15) to (5.17) — by opening more channels 
— we let fewer atoms through? This is the old, deep mystery of quantum mechanics 
— the interference of amplitudes. It’s the same kind of thing we first saw in the 
two-slit interference experiment with electrons. We saw that we could get fewer 
electrons at some places with both slits open than we got with one slit open. It 
works quantitatively this way. We can write the amplitude that an atom will get 
through T and S' in the apparatus of (5.17) as the sum of three amplitudes, one 
for each of the three beams in T ; the sum is equal to zero : 

(0S\+T)(+T\ +5) + (0S\0T)(0T\ +S) + <0S| ~T)(~T \ +S) = 0. 

(5.18) 

None of the three individual amplitudes is zero — for example, the absolute square 
of the second amplitude is 7a, see (5.15) — but the sum is zero. We would have 
also the same answer if S' were set to select the (~S) state. However, in the setup 
of (5.16), the answer is different. If we call a the amplitude to get through T and 
S' , in this case we havef 

a = (+S\+T)(+T\+S) + (+S\0T)(0T\+S) 

+ <+S| -T)(~T | +5) - 1. (5.19) 

In the experiment (5.16) the beam has been split and recombined. Humpty 
Dumpty has been put back together again. The information about the original 
(+5) state is retained — it is just as though the T apparatus were not there at all. 
This is true whatever is put after the “wide-open” T apparatus. We could follow 
it with an R filter — a filter at some odd angle — or anything we want. The answer 
will always be the same as if the atoms were taken directly from the first S filter. 

So this is the important principle: A T filter — or any filter — with wide-open 
masks produces no change at all. We should make one additional condition. The 
wide-open filter must not only transmit all three beams, but it must also not produce 
unequal disturbances on the three beams. For instance, it should not have a strong 
electric field near one beam and not the others. The reason is that even if this 
extra disturbance would still let all the atoms through the filter, it could change the 
phases of some of the amplitudes. Then the interference would be changed, and 
the amplitudes in Eqs. (5.18) and (5.19) would be different. We will always 
assume that there are no such extra disturbances. 

Let’s rewrite Eqs. (5.18) and (5.19) in an improved notation. We will let 
i stand for any one of the three states (+7 1 ), ( 0 T), or (— T); then the equations can 
be written: 

£ (0S\i)(i\+S) = 0 (5.20) 

all i 

and 

Y, (+$ | i){i | +S) = 1. (5.21) 

all i 

Similarly, for an experiment where S' is replaced by a completely arbitrary filter 
R , we have 


S T R 


f We really cannot conclude from the experiment that a = 1, but only that \a\ 2 = 1, 
so a might be e iS , but it can be shown that the choice 5 = 0 represents no real loss of 
generality. 
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The results will always be the same as if the T apparatus were left out and we had 
only 



8 R 


Or, expressed mathematically, 

Y (+^ I *)(* I +*S) = (+R | +S). (5.23) 

all i 

This is our fundamental law, and it is generally true so long as i stands for the three 
base states of any filter. 

You will notice that in the experiment (5.22) there is no special relation of 
S and R to T . Furthermore, the arguments would be the same no matter what 
states they selected. To write the equation in a general way, without having to 
refer to the specific states selected by S and R, let’s call (“phi”) the state prepared 
by the first filter (in our special example, +S ) and X (“khi”) the state tested by 
the final filter (in our example, +7?). Then we can state our fundamental law of 
Eq. (5.23) in the form 

<X |*> - £ < x I W I *>, (5.24) 

all i 

where i is to range over the three base states of some particular filter. 

We want to emphasize again what we mean by base states. They are like the 
three states which can be selected by one of our Stern-Gerlach apparatuses. One 
condition is that if you have a base state, then the future is independent of the past. 
Another condition is that if you have a complete set of base states, Eq. (5.24) is 
true for any set of beginning and ending states $ and X. There is, however, no 
unique set of base states. We began by considering base states with respect to a 
particular apparatus T. We could equally well consider a different set of base 
states with respect to an apparatus S , or with respect to R } etc.f We usually speak 
of the base states “in a certain representation.” 

Another condition on a set of base states in any particular representation is 
that they are all completely different. By that we mean that if we have a (+T) 
state, there is no amplitude for it to go into a ( 0 T) or a (— T) state. If we let / and 
j stand for any two base 'states of a particular set, the general rules discussed in 
connection with (5.8) are that 

(j\i) = 0 

for all i and j that are not equal. Of course, we know that 

a\i) = i. 

These two equations are usually written as 

(j\i) = (5.25) 

where (the “Kronecker delta”) is a symbol that is defined to be zero for i 9 * j 9 
and to be one for i = j. 

Equation (5.25) is not independent of the other laws we have mentioned. 
It happens that we are not particularly interested in the mathematical problem of 
finding the minimum set of independent axioms that will give all the laws as conse- 
quences. t We are satisfied if we have a set that is complete and not apparently 
inconsistent. We can, however, show that Eqs. (5.25) and (5.24) are not inde- 
pendent. Suppose we let <i> in Eq. (5.24) represent one of the base states of the 


t In fact, for atomic systems with three or more base states, there exist other kinds of 
filters — quite different from a Stern-Gerlach apparatus — which can be used to get more 
choices for the set of base states (each set with the same number of states), 
t Redundant truth doesn’t bother us! 
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same set as /, say the yth state; then we have 

<* i ./) = ]C i '><* i 

i 

But Eq. (5.25) says that (i\j) is zero unless i = j, so the sum becomes just (x \ j) 
and we have an identity, which shows that the two laws are not independent. 

We can see that there must be another relation among the amplitudes if both 
Eqs. (5.10) and (5.24) are true. Equation (5.10) is 

(+t | +s)(+r | +5)* + (or | +5)<or| +5)* + <-r| +s><-r| +s)* = 1 . 

Tf we write Eq. (5.24), letting both and x be the state (+5), the left-hand side 
is (+S | +S ), which is clearly — 1 ; so we get once more Eq. (5.19), 

(+S\+T)(+T\ +5) + (+S\OT)(OT\+S) + (+5 | -T){~T\ +5) = 1. 

These two equations are consistent (for all relative orientations of the T and S 
apparatuses) only if 

(+S\+T) - (+T | +5)*, 

(+s | o T) = <or| +5)*, 

(+s\ -T) = <-r| +5)*. 

And it follows that for any states <£ and x, 

<0 I X) = (X | 0>*. (5.26) 

If this were not true, probability wouldn't be “conserved,” and particles would 
get “lost.” 

Before going on, we want to summarize the three important general laws about 
amplitudes. They are Eqs. (5,24), (5.25), and (5.26): 

I (j I i) = 

II (x | <*>)=£ <x | ;></ 1 4>), (5.27) 

nil i 

III (0 I X) = (X I </>)*. 

In these equations the / and j refer to all the base states of some one representation, 
while 4> and x represent any possible states of the atom. It is important to note that 
II is valid only if the sum is carried out over all the base states of the system (in 
our case, three: +T, 0 T, —T). These laws say nothing about what we should 
choose for a base for our set of base states. We began by using a T apparatus, 
which is a Stern-Gerlach experiment with some arbitrary orientation; but any other 
orientation, say W, would be just as good. We would have a different set of states 
to use for i and y, but all the laws would still be good — there is no unique set. One 
of the great games of quantum mechanics is to make use of the fact that things 
can be calculated in more than one way. 

5-6 The machinery of quantum mechanics 

We want to show you why these laws are useful. Suppose we have an atom in 
a given condition (by which we mean that it was prepared in a certain way), and 
we want to know what will happen to it in some experiment. In other words, we 
start with our atom in the state <j> and want to know what are the odds that it will go 
through some apparatus which accepts atoms only in the condition x. The laws 
say that we can describe the apparatus completely in terms of three complex num- 
bers (x | /), the amplitudes for each base state to be in the condition x; and that 
we can tell what will happen if an atom is put into the apparatus if we describe the 
state of the atom by giving three numbers (i | <£), the amplitudes for the atom in its 
original condition to be found in each of the three base states. This is an important 
idea. 
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Let’s consider another illustration. Think of the following problem: We start 
with an S apparatus; then we have a complicated mess of junk, which we can call 
A , and then an R apparatus — like this: 



s 


+1 

o 

-I 


R 


(5.28) 


By A we mean any complicated arrangement of Stem-Gerlach apparatuses with 
masks or half-masks, oriented at peculiar angles, with odd electric and magnetic 
fields . . . almost anything you want to put. (It’s nice to do thought experiments— 
you don’t have to go to all the trouble of actually building the apparatus!) The 
problem then is: With what amplitude does a particle that enters the section A 
in a ( +S ) state come out of A in the ( 0 R) state, so that it will get through the last 
R filter? There is a regular notation for such an amplitude; it is 


(0R\A\ +S ). 


As usual, it is to be read from right to left (like Hebrew): 

(finish | through | start). 

If by chance A doesn’t do anything — but is just an open channel — then we write 

(OR | 1 | +5) = (OR | +S>; (5.29) 

the two symbols are equivalent. For a more general problem, we might replace 
(+S) by a general starting state 0 and ( 0 R) by a general finishing state X, and we 
would want to know the amplitude 

(X\A\ 0). 

A complete analysis of the apparatus A would have to give the amplitude (x | A | 0) 
for every possible pair of states 0 and X — an infinite number of combinations! 
How then can we give a concise description of the behavior of the apparatus A ? 
We can do it in the following way. Imagine that the apparatus of (5.28) is modified 
to be 



ST T R 


This is really no modification at all since the wide-open T apparatuses don’t do 
anything. But they do suggest how we can analyze the problem. There is a certain 
set of amplitudes (i | -\-S) that the atoms from 5 will get into the i state of T. Then 
there is another set of amplitudes that an i state (with respect to T) entering A 
will come out as a j state (with respect to T). And finally there is an amplitude 
that each ./ state will get through the last filter as a ( 0 R) state. For each possible 
alternative path, there is an amplitude of the form 

(0R\j)(j\A\i)(i\+S), 

and the total amplitude is the sum of the terms we can get with all possible combi- 
nations of / and /. The amplitude we want is 

X < 0 * I j)(j I A I /)</ 1 +s). (5.31) 


If (0 R) and ( +S ) are replaced by general states X and 0, we would have the same 
kind of expression ; so we have the general result 

<X I A I 0) = < x I j)(j I A I i)(i I <£>• (5- 32 ) 
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Now notice that the right-hand side of Eq. (5.32) is really “simpler” than the 
left-hand side. The apparatus A is completely described by the nine numbers 
(j | A | i) which tell the response of A with respect to the three base states of the 
apparatus T. Once we know these nine numbers, we can handle any two incoming 
and outgoing states <j> and X if we define each in terms of the three amplitudes for 
going into, or from, each of the three base states. The result of an experiment is 
predicted using Eq. (5.32). 

This then is the machinery of quantum mechanics for a spin-one particle. 
Every state is described by three numbers which are the amplitudes to be in each 
of some selected set of base states. Every apparatus is described by nine numbers 
which are the amplitudes to go from one base state to another in the apparatus. 
From these numbers anything can be calculated. 

The nine amplitudes which describe the apparatus are often written as a 
square matrix — called the matrix ( j | A\i): 




from 




+ 

0 

— 


to + 

<+MI+> 

<+ Mlo> 

<+MI-> 


0 

(o MI+> 

(o M|o> 

(o Mi -> 

(5.33) 

- 

<-MI+> 

<-MI<>> 

(-Ml-) 


The mathematics of quantum mechanics is just an 

extension of this idea. We 

will give you a simple illustration. Suppose we have an apparatus C that we wish to 


analyze — that is, we want to calculate the various (j\C\ i). For instance, we might 
want to know what happens in an experiment like 



s R 


But then we notice that C is just built of two pieces of apparatus A and B in series — 
the particles go through A and then through B — so we can write symbolically 


c = U 


. 


(5.35) 


We can call the C apparatus the “product” of A and B. Suppose also that we 
already know how to analyze the two parts ; so we can get the matrices (with respect 
to 7’) of A and B. Our problem is then solved. We can easily find 

<x|c|*> 

for any input and output states. First we write that 

(x I C j <*>) = (x I B I k)(k \A\4>). 

k 

Do you see why? {Hint: Imagine putting a T apparatus between A and B.) Then 
if we consider the special case in which </> and x are also base states (of T\ say i 
and j, we have 

U\c\f> = £ (5-36) 

k 

This equation gives the matrix for the “product” apparatus C in terms of the two 
matrices of the apparatuses A and B. Mathematicians call the new matrix (j | C | i) 
— formed from two matrices (j \ B \ i) and (j \ A \ /) according to the sum specified 
in Eq. (5.36) — the “product” matrix BA of the two matrices B and A. (Note 
that the order is important, AB ^ BA.) Thus, we can say that the matrix for a 
succession of two pieces of apparatus is the matrix product of the matrices for the 
two apparatuses (putting the first apparatus on the right in the product). Anyone 
who knows matrix algebra then understands that we mean just Eq. (5.36). 
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5-7 Transforming to a different base 

We want to make one final point about the base states used in the calculations. 
Suppose we have chosen to work with some particular base — say the 5 base — and 
another fellow decides to do the same calculations with a different base — say the 
T base. To keep things straight let’s call our base states the (/S) states, where 
i = +, 0, -. Similarly, we can call his base states (jT). How can we compare 
our work with his? The final answers for the result of any measurement should 
come out the same, but in the calculations the various amplitudes and matrices 
used will be different. How are they related? For instance, if we both start with 
the same <f>, we will describe it in terms of the three amplitudes (iS | <j>) that <f> 
goes into our base states in the S representation, whereas he will describe it by the 
amplitudes (jT | cj>) that the state <f> goes into the base states is his T representation. 
How can we check that we are really both describing the same state <f>? We can do 
it with the general rule II in (5.27). Replacing x by any one of his states jT, we have 

<JT\4>) = £ (jT\iS)(iS\4>). (5.37) 

3 

To relate the two representations, we need only give the nine complex numbers of 
the matrix (JT \ iS). This matrix can then be used to convert all of his equations 
to our form. It tells us how to transform from one set of base states to another. 
(For this reason (jT \ iS) is sometimes called “the transformation matrix from 
representation S to representation T.” Big words!) 

For the case of spin-one partitles for which we have only three base states 
(for higher spins, there are more) the mathematical situation is analogous to what 
we have seen in vector algebra. Every vector can be represented by giving three 
numbers — the components along the axes .v, y, and z. That is, every vector can 
be resolved into three “base” vectors which are vectors along the three axes. But 
suppose someone else chooses to use a different set of axes— x', y', and z'. He will 
be using different numbers to represent any particular vector. His calculations will 
look different, but the final results will be the same. We have considered this before 
and know the rules for transforming vectors from one set of axes to another. 

You may want to see how the quantum mechanical transformations work by 
trying some out; so we will give here, without proof, the transformation matrices 
for converting the spin-one amplitudes in one representation 5 to another repre- 
sentation T, for various special relative orientations of the S and T filters. (We 
will show you in a later chapter how to derive these same results.) 

First case: The T apparatus has the same _y-axis (along which the particles 
move) as the S apparatus, but is rotated about the common j’-axis by the angle 
a (as in Fig. 5-6). (To be specific, a set of coordinates x', y', z' is fixed in the T 
apparatus, related to the x, y, z coordinates of the 5 apparatus by : z' = z cos a — 
x sin a, x' = xcos a — zsina,/ = y.) Then the transformation amplitudes are: 

<+r| +5) = K1 + cos a), 

(or I + S> = - sin a, 

V2 

(~T\+S) = 1(1 - cos «), 

(+T | 0 S) = + —p sin a, 

\/2 

( 0 r I 0 S) = cos a, (5.38) 

<-r| OS) = - sin a, 
v2 

<+r| -S) = |(l - cos a), 

( 0 T | — S) = -) — sin a, 

V2 

i-T I -S) = §(1 + COS a). 
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Second Case : The T apparatus has the same z-axis as S , but is rotated around 
the z-axis by the angle 0. (The coordinate transformation is z' = z, x* = 
x cos 0 + y sin 0, y f = y cos 0 — x sin 0.) Then the transformation amplitudes 
are: 

(+T\+S) - e+* 9 

(0T\ OS) - 1, 

<-r| -S) = ^ 5-39) 

all others = 0. 

Note that any rotations of T whatever can be made up of the two rotations 
described. 

If a state 4> is defined by the three numbers 

C + = (+S |*>, C () = < 0 5 | 0), C_ = <-S | *), (5.40) 

and the same state is described from the point of view of T by the three numbers 

c; = <+r|*>, c' 0 = (ot\<i>\ cl = (5.4i) 

then the coefficients (jT \ iS ) of (5.38) or (5.39) give the transformation connect- 

ing C'i and Cl In other words, the Q are very much like the components of a 
vector that appear different from the point of view of S and T. 

For a spin-one particle only — because it requires three amplitudes — the cor- 
respondence with a vector is very close. In each case, there are three numbers that 
must transform with coordinate changes in a certain definite way. In fact, there 
is a set of base states which transform just like the three components of a vector. 
The three combinations 


c, = - 4= (C + - C_), Cy = - (C + + C_), C* = Cn (5.42) 

V2 V2 

transform to C', Cy, and Cl just the way that x, y, z transform to x\ /, z [You 
can check that this is so by using the transformation laws (5.38) and (5.39).] 
Now you see why a spin-one particle is often called a “vector particle.” 


5-8 Other situations 

We began by pointing out that our discussion of spin-one particles would be 
a prototype for any quantum mechanical problem. The generalization has only 
to do with the numbers of states. Instead of only three base states, any particular 
situation may involve n base states. f Our basic laws in Eq. (5.27) have exactly 
the same form — with the understanding that / and j must range over all n base 
states. Any phenomenon can be analyzed by giving the amplitudes that it starts 
in each one of the base states and ends in any other one of the base states, and then 
summing over the complete set of base states. Any proper set of base states can 
be used, and if someone wishes to use a different set, it is just as good ; the two can 
be connected by using an n by n transformation matrix. We will have more to 
say later about such transformations. 

Finally, we promised to remark on what to do if atoms come directly from a 
furnace, go through some apparatus, say A, and are then analyzed by a filter which 
selects the state x. You do not know what the state <j> is that they start out in. It 
is perhaps best if you don’t worry about this problem just yet, but instead concen- 
trate on problems that always start out with pure states. But if you insist, here is 
how the problem can be handled. 

First, you have to be able to make some reasonable guess about the way the 
states are distributed in the atoms that come from the furnace. For example, if 


t The number of base states n may be, and generally is, infinite. 
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there were nothing “special” about the furnace, you might reasonably guess 
that atoms would leave the furnace with random “orientations.” Quantum me- 
chanically, that corresponds to saying that you don’t know anything about the 
states, but that one-third are in the (+5) state, one-third are in the (0 5) state, 
and one-third are in the (— S) state. For those that are in the (+S) state the 
amplitude to get through is <X | A \ +S) and the probability is |<X | A \ + S)| 2 , 
and similarly for the others. The overall probability is then 

il<X | A | +S>| 2 + i\(x \A | 0 S)| 2 + m | A | -5}| 2 . 

Why did we use S rather than, say, T1 The answer is, surprisingly, the same no 
matter what we choose for our initial resolution — so long as we are dealing with 
completely random orientations. It comes about in the same way that 

E i< x I iS >\ 2 = E i< x I W 

i .7 

for any X. (We leave it for you to prove.) 

Note that it is not correct to say that the input state has the amplitudes n/T/T 
to be in (+5), \/T73 to be in ( 0 S ), and \/T ] 3 to be in (- S)\ that would imply that 
certain interferences might be possible. It is simply that you do not know what 
the initial state is; you have to think in terms of the probability that the system 
starts out in the various possible initial states, and then you have to take a weighted 
average over the various possibilities. 
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Spin One-Half f 


6-1 Transforming amplitudes 

In the last chapter, using a system of spin one as an example, we outlined 
the general principles of quantum mechanics: 

Any state \p can be described in terms of a set of base states by giving 
the amplitudes to be in each of the base states. 

The amplitude to go from any state to another can, in general, be written 
as a sum of products, each product being the amplitude to go into one 
of the base states times the amplitude to go from that base state to the 
final condition, with the sum including a term for each base state: 

<X I *> =» E < x I *><* I *>■ (6-1) 


6-1 Transforming amplitudes 

6-2 Transforming to a rotated 
coordinate system 

6-3 Rotations about the z-axis 

6-4 Rotations of 180° and 90° 
about y 

6-5 Rotations about x 
6-6 Arbitrary rotations 


The base states are orthogonal — the amplitude to be in one if you are 
in the other is zero : 

{i I J) = (6.2) 

The amplitude to get from one state to another directly is the complex 
conjugate of the reverse: 


(x I *)* = I x). (6.3) 

We also discussed a little bit about the fact that there can be more than one 
base for the states and that we can use Eq. (6.1) to convert from one base to 
another. Suppose, for example, that we have the amplitudes (iS \ j/) to find the 
state yp in every one of the base states / of a base system S , but that we then decide 
that we would prefer to describe the state in terms of another set of base states, 
say the states j belonging to the base T . In the general formula, Eq. (6.1), we 
could substitute jT for x and obtain this formula: 

(JT\t) = Y J (jT\iS){iS\t). (6.4) 

i 

The amplitudes for the state (\p) to be in the base states (zT) are related to the 
amplitudes to be in the base states ( iS ) by the set of coefficients (jT \ iS). If there 
are N base states, there are N 2 such coefficients. Such a set of coefficients is often 
called the “ transformation matrix to go from the S-representation to the T -represen- 
tation” This looks rather formidable mathematically, but with a little renaming 
we can see that it is really not so bad. If we call C* the amplitude that the state \p 
is in the base state iS — that is, C{ = (iS \ \p) — and call C- the corresponding 
amplitudes for the base system 7 1 — that is, C/ = (jT | ^), then Eq. (6.4) can be 
written as 

C'j - E R H C *> ( 6 - 5 ) 


where Rji means the same thing as (jT \ iS). Each amplitude C' is equal to a sum 


t This chapter is a rather long and abstract side tour, and it does not introduce any 
idea which we will not also come to by a different route in later chapters. You can, 
therefore, skip over it, and come back later if you are interested. 
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over all i of one of the coefficients R 3i times each amplitude Q. It has the same 
form as the transformation of a vector from one coordinate system to another. 

In order to avoid being too abstract for too long, we have given you some 
examples of these coefficients for the spin-one case, so you can see how to use 
them in practice. On the other hand, there is a very beautiful thing in quantum 
mechanics — that from the sheer fact that there are three states and from the 
symmetry properties of space under rotations, these coefficients can be found 
purely by abstract reasoning. Showing you such arguments at this early stage has 
a disadvantage in that you are immersed in another set of abstractions before we 
get “down to earth.” However, the thing is so beautiful that we are going to do 
it anyway. 

We will show you in this chapter how the transformation coefficients can be 
derived for spin one-half particles. We pick this case, rather than spin one, because 
it is somewhat easier. Our problem is to determine the coefficients R 3i for a 
particle— an atomic system — which is split into two beams in a Stern-Gerlach 
apparatus. We are going to derive all the coefficients for the transformation from 
one representation to another by pure reasoning — plus a few assumptions. Some 
assumptions are always necessary in order to use “pure” reasoning! Although 
the arguments will be abstract and somewhat involved, the result we get will be 
relatively simple to state and easy to understand — and the result is the most 
important thing. You may, if you wish, consider this as a sort of cultural excursion. 
We have, in fact, arranged that all the essential results derived here are also 
derived in some other way when they are needed in later chapters. So you need 
have no fear of losing the thread of our study of quantum mechanics if you omit 
this chapter entirely, or study it at some later time. The excursion is “cultural” 
in the sense that it is intended to show that the principles of quantum mechanics 
are not only interesting, but are so deep that by adding only a few extra hypotheses 
about the structure of space, we can deduce a great many properties of physical 
systems. Also, it is important that we know where the different consequences of 
quantum mechanics come from, because so long as our laws of physics are in- 
complete — as we know they are — it is interesting to find out whether the places 
where our theories fail to agree with experiment is where our logic is the best or 
where our logic is the worst. Until now, it appears that where our logic is the most 
abstract it always gives correct results — it agrees with experiment. Only when we 
try to make specific models of the internal machinery of the fundamental particles 
and their interactions are we unable to find a theory that agrees with experiment. 
The theory then that we are about to describe agrees with experiment wherever 
it has been tested — for the strange particles as well as for electrons, protons, 
and so on. 

One remark on an annoying, but interesting, point before we proceed: It is 
not possible to determine the coefficients R 3i uniquely, because there is always 
some arbitrariness in the probability amplitudes. If you have a set of amplitudes 
of any kind, say the amplitudes to arrive at some place by a whole lot of different 
routes, and if you multiply every single amplitude by the same phase factor — 
say by e %h — you have another set that is just as good. So, it is always possible to 
make an arbitrary change in phase of all the amplitudes in any given problem if 
you want to. 

Suppose you calculate some probability by writing a sum of several amplitudes, 
say (A + B + C + • • *) and taking the absolute square. Then somebody else 
calculates the same thing by using the sum of the amplitudes (A' + B' + C ! + 

* ■ *) and taking the absolute square. If all the A', B\ C', etc., are equal to the 
A , B, C, etc., except for a factor all probabilities obtained by taking the absolute 
squares will be exactly the same, since (A' + B f + C + ■ • *) is then equal to 
e t5 (A + B + C + • • *). Or suppose, for instance, that we were computing 
something with Eq. (6.1), but then we suddenly change all of the phases of a 
certain base system. Every one of the amplitudes (i \ $) would be multiplied by 
the same factor e u . Similarly, the amplitudes (i \ x) would also be changed by 
e l \ but the amplitudes (x | i) are the complex conjugates of the amplitudes (/' | x); 
therefore, the former gets changed by the factor e~ i& . The plus and minus /< 5’s 
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in the exponents cancel out, and we would have the same expression we had 
before. So it is a general rule that if we change all the amplitudes with respect 
to a given base system by the same phase — or even if we just change all the ampli- 
tudes in any problem by the same phase — it makes no difference. There is, there- 
fore, some freedom to choose the phases in our transformation matrix. Every now 
and then we will make such an arbitrary choice — usually following the conventions 
that are in general use. 


6-2 Transforming to a rotated coordinate system 

We consider again the “improved” Stern-Gerlach apparatus described in the 
last chapter. A beam of spin one-half particles, entering at the left, would, in 
general, be split into two beams, as shown schematically in Fig, 6-1. (There 
were three beams for spin one.) As before, the beams are put back together again 
unless one or the other of them is blocked off by a “stop” which intercepts the 
beam at its half-way point. In the figure we show an arrow which points in the 
direction of the increase of the magnitude of the field — say toward the magnet pole 
with the sharp edges. This arrow we take to represent the “ up ” axis of any particular 
apparatus. It is fixed relative to the apparatus and will allow us to indicate the 
relative orientations when we use several apparatuses together. We also assume 
that the direction of the magnetic field in each magnet is always the same with 
respect to the arrow. 

We will say that those atoms which go in the “upper” beam are in the (+) 
state with respect to that apparatus and that those in the “lower” beam are in the 
(— ) state. (There is no “zero” state for spin one-half particles.) 

Now suppose we put two of our modified Stern-Gerlach apparatuses in 
sequence, as shown in Fig. 6-2(a). The first one, which we call S, can be used to 
prepare a pure (+5) or a pure (—5) state by blocking one beam or the other. 
[As shown it prepares a pure (+S) state.] For each condition, there is some 
amplitude for a particle that comes out of S to be in either the (+7) or the (—7) 
beam of the second apparatus. There are, in fact, just four amplitudes: the ampli- 
tude to go from ( +S ) to (+7), from (+5) to (-7), from (~S) to (+7), from 
(—5) to (—7). These amplitudes are just the four coefficients of the transformation 
matrix Rji to go from the ^-representation to the 7-representation. We can con- 
sider that the first apparatus “prepares” a particular state in one representation 
and that the second apparatus “analyzes” that state in terms of the second repre- 
sentation. The kind of question we want to answer, then, is this: If an atom has 
been prepared in a given condition— say the (+5) state— by blocking one of the 
beams in the apparatus 5, what is the chance that it will get through the second 
apparatus 7 if this is set for, say, the ( — 7) state. The result will depend, of course, 
on the angles between the two systems S and 7. 

We should explain why it is that we could have any hope of finding the co- 
efficients Rji by deduction. You know that it is almost impossible to believe that 
if a particle has its spin lined up in the +z-direction, that there is some chance of 
finding the same particle with its spin pointing in the +x:-direction — or in any 
other direction at all. In fact, it is almost impossible, but not quite. It is so nearly 
impossible that there is only one way it can be done, and that is the reason we can 
find out what that unique way is. 

The first kind of argument we can make is this. Suppose we have a setup like 
the one in Fig. 6-2(a), in which we have the two apparatuses S and 7, with 7 
cocked at the angle a with respect to S, and we let only the (+) beam through 5 
and the ( — ) beam through 7. We would observe a certain number for the 
probability that the particles coming out of S get through 7. Now suppose we 
make another measurement with the apparatus of Fig. 6— 2(b). The relative 
orientation of S and 7 is the same, but the whole system sits at a different angle in 
space. We want to assume that both of these experiments give the same number 
for the chance that a particle in a pure state with respect to S will get into some 
particular state with respect to 7. We are assuming, in other words, that the result 
of any experiment of this type is the same — that the physics is the same — no matter 
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Fig. 6-1. Top and side views of an 
"improved” Stern-Gerlach apparatus 
with beams of a spin one-half particle. 




Fig. 6-2. Two equivalent experi- 
ments. 
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Fig. 6-3. If T is "wide open,” (b) is equivalent to (a). 


how the whole apparatus is oriented in space. (You say, “That’s obvious.” But 
it is an assumption, and it is “right” only if it is actually what happens.) That 
means that the coefficients R j{ depend only on the relation in space of S and T, 
and not on the absolute situation of S and T. To say this in another way, R j{ 
depends only on the rotation which carries S' to T, for evidently what is the same in 
Fig. 6-2(a) and Fig. 6-2(b) is the three-dimensional rotation which would carry 
apparatus S into the orientation of apparatus T. When the transformation matrix 
Rji depends only on a rotation, as it does here, it is called a rotation matrix. 

For our next step we will need one more piece of information. Suppose we 
add a third apparatus which we can call U, which follows T at some arbitrary 
angle, as in Fig. 6-3(a). (It’s beginning to look horrible, but that’s the fun of 
abstract thinking — you can make the most weird experiments just by drawing 
lines!) Now what is the 5 — > T — » U transformation? What we really want to 
ask for is the amplitude to go from some state with respect to S to some other 
state with respect to U, when we know the transformation from 5 to T and from T 
to U. We are then asking about an experiment in which both channels of T are 
open. We can get the answer by applying Eq. (6.5) twice in succession. For 
going from the .V-representation to the /-representation, we have 

C'i = Z R ™ c i> (6-6) 

i 

where we put the superscripts TS on the R , so that we can distinguish it from the 
coefficients R UT we will have for going from T to U. 

Assuming the amplitudes to be in the base states of the ^/-representation 
CJ', we can relate them to the T’-amplitudes by using Eq. (6.5) once more; we get 

C" = E R *? c r (6.7) 

3 

Now we can combine Eqs. (6.6) and (6.7) to get the transformation to U directly 
from S. Substituting Cj from Eq. (6.6) in Eq. (6.7), we have 

Q! = E E R ?* c <‘ (6.8) 

3 i 

Or, since i does not appear in Rk 3 T , we can put the /-summation also in front, and 
write 

O' = E E R Z TR J?Ci- (6.9) 

i 3 

This is the formula for a double transformation. 

Notice, however, that so long as all the beams in T are unblocked, the state 
coming out of T is the same as the one that went in. We could just as well have 
made a transformation from the S'-representation directly to the t/-representa- 
tion. It should be the same as putting the U apparatus right after S, as in Fig. 
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6-3(b). In that case, we would have written 

C'£ = E R%fCi, 

i 

with the coefficients R[f belonging to this transformation. Now, clearly, Eqs. 
(6.9) and (6.10) should give the same amplitudes C k , and this should be true no 
matter what the original state <j> was which gave us the amplitudes Q. So it must 
be that 

R U J = E (6.1D 

3 

In other words, for any rotation S — > U of a reference base, which is viewed as a 
compounding of two successive rotations S — ► T and T — » U, the rotation matrix 
R ki s can be obtained from the matrices of the two partial rotations by Eq. (6.11). 
If you wish, you can find Eq. (6.1 1) directly from Eq. (6.1), for it is only a different 
notation for (kU \ iS ) = (kU \ jT)(jT | iS). 


To be thorough, we should add the following parenthetical remarks. They are not 
terribly important, however, so you can skip to the next section if you want. What we 
have said is not quite right. We cannot really say that Eq. (6.9) and Eq. (6.10) must 
give exactly the same amplitudes. Only the physics should be the same ; all the amplitudes 
could be different by some common phase factor like e iS without changing the result of 
any calculation about the real world. So, instead of Eq. (6.11), all we can say, really, is 
that 

e^Rg s - E M (6-12) 

3 

where 8 is some real constant. What this extra factor of e ib means, of course, is that the 
amplitudes we get if we use the matrix R us might all differ by the same phase (e~ lS ) from 
the amplitude we would get using the two rotations R UT and R TS . We know that it doesn’t 
matter if all amplitudes are changed by the same phase, so we could just ignore this phase 
factor if we wanted to. It turns out, however, that if we define all of our rotation matrices 
in a particular way, this extra phase factor will never appear — the 8 in Eq. (6.12) will 
always be zero. Although it is not important for the rest of our arguments, we can give a 
quick proof by using a mathematical theorem about determinants. [If you don’t yet know 
much about determinants, don’t worry about the proof and just skip to the definition of 
Eq. (6.15).] 

First, we should say that Eq. (6.11) is the mathematical definition of a “product” 
of two matrices. (It is just convenient to be able to say: “ R us is the product of R UT and 
R TS .”) Second, there is a theorem of mathematics — which you can easily prove for the 
two-by-two matrices we have here — which says that the determinant of a “product” of 
two matrices is the product of their determinants. Applying this theorem to Eq. (6.12), 
we get 

e m (Det R us ) = (Det R UT ) • (Det R TS ). (6.13) 


(We leave off the subscripts, because they don’t tell us anything useful.) Yes, the 25 is 
right. Remember that we are dealing with two-by-two matrices; every term in the matrix 
is multiplied by e l5 , so each product in the determinant — which has two factors — gets 
multiplied by e i25 . Now let’s take the square root of Eq. (6.13) and divide it into Eq. 
(6.12); we get 


RYI _ v m R T ,F 

VDet R us i V' Det R UT V Det R TS 


(6.14) 


The extra phase factor has disappeared. 

Now it turns out that if we want all of our amplitudes in any given representation 
to be normalized (which means, you remember, that £,■ (<f>\i)(i\<t>) = 1), the rotation 
matrices will all have determinants that are pure imaginary exponentials, like e ia . (We 
won’t prove it; you will see that it always comes out that way.) So we can, if we wish, 
choose to make all our rotation matrices R have a unique phase by making Det R = 1 . 
It is done like this. Suppose we find a rotation matrix R in some arbitrary way. We make 
it a rule to “convert” it to “standard form” by defining 

R 

-^standard “ / — 1 (6.15) 

VDet R 
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We can do this because we are just multiplying each term of R by the same phase factor, 
to get the phases we want. In what follows, we will always assume that our matrices have 
been put in the “standard form 1 *; then we can use Eq. (6.11) without having any extra 
phase factors. 


6-3 Rotations about the z - axis 

We are now ready to find the transformation matrix Rji between two different 
representations. With our rule for compounding rotations and our assumption 
that space has no preferred direction, we have the keys we need for finding the 
matrix of any arbitrary rotation. There is only one solution. We begin with the 
transformation which corresponds to a rotation about the z-axis. Suppose we 
have two apparatuses S and T placed in series along a straight line with their axes 
parallel and pointing out of the page, as shown in Fig. 6-4(a). We take our “z-axis” 
in this direction. Surely, if the beam goes “up” (toward +z) in the S apparatus, 
it will do the same in the T apparatus. Similarly, if it goes down in S, it will go 
down in T. Suppose, however, that the T apparatus were placed at some other 
angle, but still with its axis parallel to the axis of 5, as in Fig. 6-4(b). Intuitively, 
you would say that a (+) beam in S would still go with a (+) beam in T, because 
the fields and field gradients are still in the same physical direction. And that 
would be quite right. Also, a (-) beam in S' would still go into a (-) beam in T. 
The same result would apply for any orientation of T in the xy-plane of S. What 
does this tell us about the relation between C'+ = (+T \ ^), Cl = (~T\ and 
C + = (+S | i p), C_ = ( — S | \p)l You might conclude that any rotation about 
the z-axis of the “frame of reference” for base states leaves the amplitudes C + to 
be “up” and “down,” the same as before. We could write C+ = C + and Cl = C_ 
— but that is wrong. All we can conclude is that for such rotations the probabilities 
to be in the “up” beam are the same for the S and T apparatuses. That is, 

\C'+\ = |C + | and |C1| = |C_|. 

We cannot say that the phases of the amplitudes referred to the T apparatus may 
not be different for the two different orientations in (a) and (b) of Fig. 6-4. 


(b) 

✓ \ 

r i 



Fig. 6-4. Rotating 90° about the z-axis. 


The two apparatuses in (a) and (b) of Fig. 6-4 are, in fact, different, as we 
can see in the following way. Suppose that we put an apparatus in front of S which 
produces a pure (+x) state. (The x-axis points toward the bottom of the figure.) 
Such particles would be split into (+z) and ( — z) beams in but the two beams 
would be recombined to give a (+x) state again at P i— the exit of S. The same 
thing happens again in T. If we follow T by a third apparatus C, whose axis is in 
the (+x) direction and, as shown in Fig. 6-5(a), all the particles would go into 
the (+) beam of U. Now imagine what happens if T and U are swung around 
together by 90° to the positions shown in Fig. 6-5(b). Again, the T apparatus 
puts out just what it takes in, so the particles that enter U are in a (+x) state with 
respect to S. But U now analyzes for the (+j;) state with respect to 5, which is 
different. (By symmetry, we would now expect only one-half of the particles to 
get through.) 
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Fig. 6-5. Particle in a ( + x) state behaves differently in (a) and (b). 

What could have changed? The apparatuses T and U are still in the same 
physical relationship to each other. Can the physics be changed just because T 
and U are in a different orientation? Our original assumption is that it should not. 

It must be that the amplitudes with respect to T are different in the two cases shown 
in Fig. 6-5 — and, therefore, also in Fig. 6-4. There must be some way fora 
particle to know that it has turned the corner at P\. How could it tell? Well, all 
we have decided is that the magnitudes of C[ and C' 2 are the same in the two cases, 
but they could — in fact, must — have different phases . We conclude that C+ and 
C + must be related by 

C'+ = e ix c+, 

and that CL and C_ must be related by 

CL = e ifi C- 


where X and p are real numbers which must be related in some way to the angle 
between S and T. 

The only thing we can say at the moment about X and p is that they must not 
be equal [except for the special case shown in Fig. 6-5 (a), when T is in the same 
orientation as 5]. We have seen that equal phase changes in all amplitudes have 
no physical consequence. For the same reason, we can always add the same 
arbitrary amount to both X and p without changing anything. So we are permitted 
to choose to make X and p equal to plus and minus the same number. That is, we 
can always take 


X 7 


= X - 


(x + m) 

2 


P f = P - 


(x + aO 
2 


Then 


X' - 


X 

2 


2 


p'. 


So we adopt the convention! that p = — X. We have then the general rule that 
for a rotation of the reference apparatus by some angle about the z-axis, the trans- 
formation is 

C’+ = e +i *C+, CL = e~ ix C_. (6.16) 

The absolute values are the same, only the phases are different. These phase factors 
are responsible for the different results in the two experiments of Fig. 6-5. 

Now we would like to know the law that relates X to the angle between S 
and T. We already know the answer for one case. If the angle is zero, X is zero. 
Now we will assume that the phase shift X is a continuous function of angle <t> 
between S and T (see Fig. 6-4) as <j> goes to zero — as only seems reasonable. In 


t Looking at it another way, we are just putting the transformation in the “standard 
form” described in Section 6-2 by using Eq. (6.15). 
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ther words, if we rotate T from the straight line through S by the small angle e, the 
X is also a small quantity, say me, where m is some number. We write it this way 
because we can show that X must be proportional to e. Suppose we were to put 
after T another apparatus T f which makes the angle e with T, and, therefore, the 
angle 2e with S. Then, with respect to T, we have 

C'+ = e iX C+, 


and with respect to T f , we have 

C'l = e^C’ + = e in C + . 

But we know that we should get the same result if we put T f right after S. Thus, 
when the angle is doubled, the phase is doubled. We can evidently extend the 
argument and build up any rotation at all by a sequence of infinitesimal rotations. 
We conclude that for any angle <£, X is proportional to the angle. We can, therefore, 
write X = m<j>. 

The general result we get, then, is that for T rotated about the z-axis by the 
angle <j> with respect to S 


C+ - e im +C+, CL = e- im +C-. (6.17) 

For the angle <j > , and for all rotations we speak of in the future, we adopt the stand- 
ard convention that a positive rotation is a right-handed rotation about the positive 
direction of the reference axis. A positive <j> has the sense of rotation of a right- 
handed screw advancing in the positive z-direction. 

Now we have to find what m must be. First, we might try this argument: 
Suppose T is rotated by 360°; then, clearly, it is right back at zero degrees, and we 
should have C+ = C + and CL = CL, or, what is the same thing, e tm2r = 1. 
We get m = 1 . This argument is wrong ! To see that it is, consider that T is rotated 
by 180°. If m were equal to 1, we would have C+ = e* T C + = — C + and CL = 
e~ i7r C_ = — C_. However, this is just the original state all over again. Both 
amplitudes are just multiplied by — 1 which gives back the original physical system. 
(It is again a case of a common phase change.) This means that if the angle between 
T and S in Fig. 6-5(b) is increased to 180°, the system (with respect to T) would be 
indistinguishable from the zero-degree situation, and the particles would again 
go through the (+) state of the U apparatus. At 180°, though, the (+) state of 
the U apparatus is the (— x) state of the original S apparatus. So a (+*) state 
would become a (— jc) state. But we have done nothing to change the original 
state; the answer is wrong. We cannot have m — 1. 

We must have the situation that a rotation by 360° and no smaller angle 
reproduces the same physical state. This will happen if m = Then, and only 
then, will the first angle that reproduces the same physical state be <j> = 360°.| 
It gives 

c '+ = -C+ } o 

/360 about z-axis. (6.18) 

CL = -Cj 

It is very curious to say that if you turn the apparatus 360° you get new amplitudes. 
They aren’t really new, though, because the common change of sign doesn’t give 
any different physics. If someone else had decided to change all the signs of the 
amplitudes because he thought he had turned 360°, that’s all right; he gets the 
same physics. } So our final answer is that if we know the amplitudes C + and C_ for 
spin one-half particles with respect to a reference frame 5, and we then use a base 


f It appears that m — — ^ would also work. However, we see in (6.17) that the change 
in sign merely redefines the notation for a spin-up particle. 

X Also, if something has been rotated by a sequence of small rotations whose net re- 
sult is to return it to the original orientation, it is possible to define the idea that it has 
been rotated 360° — as distinct from zero net rotation — if you have kept track of the 
whole history. (Interestingly enough, this is not true for a net rotation of 720°.) 
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system referred to T which is obtained from S by a rotation of 0 around the z-axis, 
the new amplitudes are given in terms of the old by 

C'+ = e i4>i2 cA 

I 0 about z. (6.19) 

CL = e~^ /2 C_j 


6-4 Rotations of 180° and 90° about y 

Next, we will try to guess the transformation for a rotation of T with respect 
to S of 180° around an axis perpendicular to the z-axis — say, about the y- axis. 
(We have defined the coordinate axes in Fig. 6-1.) In other words, we start with 
two identical Stern-Gerlach equipments, with the second one, T, turned “upside 
down” with respect to the first one, S, as in Fig. 6-6. Now if we think of our par- 
ticles as little magnetic dipoles, a particle that is the (+5) state — so that it goes on 
the “upper” path in the first apparatus — will also take the “upper” path in the 
second, so that it will be in the minus state with respect to T. (In the inverted 
T apparatus, both the gradients and the field direction are reversed; for a particle 
with its magnetic moment in a given direction, the force is unchanged.) Anyway, 
what is “up” with respect to S will be “down” with respect to T. For these relative 
positions of S and T , then, we know that the transformation must give 

icy = |C_|, |C1| = |C+|. 

As before, we cannot rule out some additional phase factors ; we could have (for 
180° about the y-axis) 

C'+ = e*C- and CL = e iy C + , (6.20) 

where p and 7 are still to be determined. 

What about a rotation of 360° about the >>-axis? Well, we already know the 
answer for a rotation of 360° about the z-axis — the amplitude to be in any state 
changes sign. A rotation of 360° around any axis always brings us back to the 
original position. It must be that for any 360° rotation, the result is the same as 
a 360° rotation about the z-axis — all amplitudes simply change sign. Now suppose 
we imagine two successive rotations of 180° about y — using Eq. (6.20) — we should 
get the result of Eq. (6.18). In other words, 



z 


y 

Fig. 6-6. A rotation of ! 80° about 
the y-axis. 



C'l = e ip CL 

= e*e i7 C + = -C+ 

and 



Q 

II 

•3* 

£ 

= e iy e l& C_ = -C_. 

This means that 




<?'V v = - 1 

or e iy = -e~ is . 


So the transformation for a rotation of 180° about the j>-axis can be written 

C’ + = e i(3 C_, CL = — e-*C+. (6.22) 

The arguments we have just used would apply equally well to a rotation of 1 80° 
about any axis in the x^-plane, although different axes can, of course, give different 
numbers for p. However, that is the only way they can differ. Now there is a cer- 
tain amount of arbitrariness in the number p, but once it is specified for one axis 
of rotation in the xy - plane it is determined for any other axis. It is conventional 
to choose to set p = 0 for a 180° rotation about the y- axis. 

To show that we have this choice, suppose we imagine that p was not equal 
to zero for a rotation about the >>-axis; then we can show that there is some other 
axis in the xy- plane, for which the corresponding phase factor will be zero. Let’s 
find the phase factor p A for an axis A that makes the angle a with the y- axis, as 
shown in Fig. 6-7(a). (For clarity, the figure is drawn with a equal to a negative 
number, but that doesn’t matter.) Now if we take a T apparatus which is initially 
lined up with the S apparatus and is then rotated 180° about the axis A, its axes — 
which we will call x” 9 y ", and z" — will be as shown in Fig 6-7(a). The amplitudes 
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with respect to T will then be 



Fig. 6-7. A 180° rotation about the 
axis A is equivalent to a rotation of 1 80° 
about y, followed by a rotation about z . 


C'l = e’^CL, CL = -e^ i?A C + . (6.23) 

We can now think of getting to the same orientation by the two successive 
rotations shown in (b) and (c) of the figure. First, we imagine an apparatus U 
which is rotated with respect to 5 by 180° about the >’-axis. The axes /, and z' 
of U will be as shown in Fig. 6-7(b), and the amplitudes with respect to U are 
given by (6.22). 

Now notice that we can go from U to T by a rotation about the “z-axis” 
of U, namely about z', as shown in Fig. 6-7(c). From the figure you can see that 
the angle required is two times the angle a but in the opposite direction (with 
respect to z'). Using the transformation of (6.19) with <t> = —2a, we get 

C'l = e~ ia C' + , CL = e +i X'- (6.24) 

Combining Eqs. (6.24) and (6.22), we get that 

d = c i(0 - a) C_, CL = - e- i0 ~ a) C+. (6.25) 

These amplitudes must, of course, be the same as we got in 6.23). So p A must 
be related to a and P by 

Pa = P - a. (6.26) 


This means that if the angle a between the T-axis and the f-axis (of S) is equal to 
P, the transformation for a rotation of 180° about A will have Pa = 0. 

Now so long as some axis perpendicular to the z-axis is going to have p = 0, 
we may as well take it to be the j>-axis. It is purely a matter of convention, and we 
adopt the one in general use. Our result: For a rotation of 180° about the y-axis, 
we have 


C+ = c_ 

r 180° about y. 
CL = -C+j 


(6.27) 


While we are thinking about the .y-axis, let’s next ask for the transformation 
matrix for a rotation of 90° about y. We can find it because we know that two 
successive 90° rotations about the same axis must equal one 180° rotation. We 
start by writing the transformation for 90° in the most general form: 

C' + = aC + + bC_, CL = cC+ + tfC_. (6.28) 

A second rotation of 90° about the same axis would have the same coefficients : 


C'l = aC' + + bCL, CL = cC'+ + dCL. (6.29) 

Combining Eqs. (6.28) and (6.29), we have 


C4. = a(nC+ + hC_) + h(cC+ + c?C_), 
CL = c(aC+ + bCJ) + d(cC+ + dCJ). 


(6.30) 


However, from (6.27) we know that 

C'l = C_, CL = -C+, 

so that we must have that 

oh -I - bd = 1 , 

a 2 + be = 0, (6.31) 

ac + cd = — 1 , 
be + d 2 = 0. 


These four equations are enough to determine all our unknowns: a, b, c, and d. 
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It is not hard to do. Look at the second and fourth equations. Deduce that 
a 2 = d 2 } which means that a = d or else that a ~ ~d. But a = —di s out, 
because then the first equation wouldn’t be right. So d = a. Using this, we have 
immediately that b = 1/2 a and that c = -1/2 a. Now we have everything in 
terms of a. Putting, say, the second equation all in terms of a, we have 


a 


2 



or 


a 4 = 


1 

4* 


This equation has four different solutions, but only two of them give the standard 
value for the determinant. We might as well take a = l/\/2; thenf 


a = 1/V2, b = 1/V2, 

c = -1/V2, d = l/\/2. 


In other words, for two apparatuses S and T , with T rotated with respect to 
S by 90° about the j^-axis, the transformation is 


C- + = -(C + + C _) 


CL - - (-C+ I C_) 

V2 


90 about y. 


(6.32) 


We can, of course, solve these equations for C+ and C_, which will give us 
the transformation for a rotation of minus 90° about y. Changing the primes 
around, we would conclude that 




CL 


V2 


(C+ - C_) 


(C+ + C_) 

Vi 


/ —90 about v. 


(6.33) 


6-5 Rotations about x 

You may be thinking: “This is getting ridiculous. What are they going to 
do next, 47° around y, then 33° about x, and so on, forever?” No, we are almost 
finished. With just two of the transformations we have— 90° about y, and an arbi- 
trary angle about z (which we did first if you remember) — we can generate any 
rotation at all. 

As an illustration, suppose that we want the angle a around x. We know how 
to deal with the angle a around z, but now we want it around x. How do we get 
it? First, we turn the axis z down onto x— which is a rotation of +90° about y, 
as shown in Fig. 6-8. Then we turn through the angle a around z'. Then we 
rotate —90° about y" . The net result of the three rotations is the same as turning 
around x by the angle a. It is a property of space. 

(These facts of the combinations of rotations, and what they produce, are hard 
to grasp intuitively. It is rather strange, because we live in three dimensions, but 
it is hard for us to appreciate what happens if we turn this way and then that way. 
Perhaps, if we were fish or birds and had a real appreciation of what happens when 
we turn somersaults in space, we could more easily appreciate such things.) 

Anyway, let’s work out the transformation for a rotation by a around the 
x-axis by using what we know. From the first rotation by +90° around y the 
amplitudes go according to Eq. (6.32). Calling the rotated axes x', /, and z', the 




Fig. 6~8. A rotation by a about 
the x-axis is equivalent to: (a) a rotation 
by +90° about y, followed by (b) a 
rotation by a about z , followed by (c) a 
rotation of —90° about y". 


f The other solution changes all signs of a, b , c, and d and corresponds to a — 270° 
rotation. 
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next rotation by the angle a around z 1 takes us to a frame x", y", z", for which 
C'l = e ial2 C' + , C’l = e~ ial2 CL . 

The last rotation of —90° about y" takes us to x'", y'" , z'"; by (6.33), 

C%' = 4= (C£ - C'l), C'l' = 4= (C+ + Cl). 
v2 V2 

Combining these last two transformations, we get 

= 4= (e +, “ /2 C^ - e~ ial2 CL), 
v2 

CL = — (e +1Q/2 C^ + e~ ial2 C'_). 

V2 

Using Eqs. (6.32) for C+ and CL, we get the complete transformation: 

C'i! = Kc +, '“ /2 (C + + C_) - C + + C_)}, 


CL = i{e +, '“ /2 (C+ + C_) + 


-iaj2. 


! (-C+ + C_)}. 


We can put these formulas in a simpler form by remembering that 


We get 


e + e = 2 cos 0, 


and 


— = 2/ sin 0. 


C+ = (cos §) C+ + / (sin |)C. 
CL = / (sin gc+ + (cos |)C_ J 


> a about x. 


(6.34) 


Here is our transformation for a rotation about the x-axis by any angle a. It is 
only a little more complicated than the others. 


6-6 Arbitrary rotations 


Now we can see how to do any angle at all. First, notice that any relative 
orientation of two coordinate frames can be described in terms of three angles, as 
shown in Fig. 6-9. If we have a set of axes x\ y\ and z' oriented in any way at all 
with respect to x , y, and z, we can describe the relationship between the two frames 
by means of the three Euler angles a, and 7, which define three successive ro- 
tations that will bring the *, y, z frame into the jc', y\ z' frame. Starting at x, y, z, 
we rotate our frame through the angle j 8 about the z-axis, bringing the x-axis to 
the line Xi. Then, we rotate by a about this temporary x-axis, to bring z down to 
z\ Finally, a rotation about the new z-axis (that is, z f ) by the angle 7 will bring 
the x-axis into x ( and the y-axis into y'.f We know the transformations for each 
of the three rotations— they are given in (6.19) and (6.34). Combining them in 
the proper order, we get 


C'+ = cos ~ e i0+y)l2 C + + / sin ^ e - ! ' < ^ )/2 C_, 
CL = / sin ~ e i0 ~ y)l2 C+ + cos ~ e ~ iil)+y)l2 C_. 


(6.35) 


So just starting from some assumptions about the properties of space, we have 
derived the amplitude transformation for any rotation at all. That means that if 


t With a little work you can show that the frame x, y, z can also be brought into the 
frame x', y ; , z' by the following three rotations about the original axes: (1) rotate by the 
angle 7 around the original z-axis; (2) rotate by the angle a around the original x-axis; 
(3) rotate by the angle ft around the original z-axis. 
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Fig. 6-9. The orientation of any 
coordinate frame x f , y ! , z' relative to 
another frame x, y, z can be defined in 
terms of Euler’s angles a, ( 3 , 7. 



Fig. 6-10. An axis A defined by 
the polar angles 6 and <£. 


we know the amplitudes for any state of a spin one-half particle to go into the two 
beams of a Stern-Gerlach apparatus S , whose axes are x, y, and z, we can calculate 
what fraction would go into either beam of an apparatus T with the axes x' , /, 
and z'. In other words, if we have a state ^ of a spin one-half particle, whose 
amplitudes are C + = (+ | rp) and C_ = {— | to be “up” and “down” with 
respect to the z-axis of the x, y 9 z frame, we also know the amplitudes C+ and CL 
to be “up” and “down” with respect to the z'-axis of any other frame x', y\ z\ 
The four coefficients in Eqs. (6.35) are the terms of the “transformation matrix” 
with which we can project the amplitudes of a spin one-half particle into any 
other coordinate system. 

We will now work out a few examples to show you how it all works. Let’s 
take the following simple question. We put a spin one-half atom through a Stern- 
Gerlach apparatus that transmits only the (+z) state. What is the amplitude that 
it will be in the (+x) state? The +x axis is the same as the +z' axis of a system 
rotated 90° about the .y-axis. For this problem, then, it is simplest to use Eqs. 
(6.32) — although you could, of course, use the complete equations of (6.35). 
Since C + = 1 and C_ = 0, we get C+ = 1 /\/2. The probabilities are the abso- 
lute square of these amplitudes ; there is a 50 percent chance that the particle will 
go through an apparatus that selects the (+*) state. If we had asked about tjae 
( — x) state the amplitude would have been — 1/V2, which also gives a probability 
1/2 — as you would expect from the symmetry of space. So if a particle is in the 
(+z) state, it is equally likely to be in (+x) or (— x), but with opposite phase. 

There’s no prejudice in y either. A particle in the (+z) state has a 50-50 
chance of being in (+>>) or in (— y). However, for these (using the formula for 
rotating —90° about x), the amplitudes are l/\/2 and — i/\/2. In this case, the 
two amplitudes have a phase difference of 90° instead of 180°, as they did for the 
(+x) and (— x). In fact, that’s how the distinction between x and y shows up. 

As our final example, suppose that we know that a spin one-half particle is in 
a state such that it is polarized “up” along some axis A, defined by the angles 
6 and <j> in Fig. 6-10. We want to know the amplitude (C + | \p) that the particle 
is “up” along z and the amplitude <C_ 1 1 £) that it is “down” along z. We can find 
these amplitudes by imagining that A is the z-axis of a system whose x-axis lies in 
some arbitrary direction — say in the plane formed by A and z. We can then bring 
the frame of A into x, y, z by three rotations. First, we make a rotation by — 7t/2 
about the axis A, which brings the x-axis into the line B in the figure. Then we 
rotate by 8 about line B (the new x-axis of frame A) to bring A to the z-axis. Finally, 
we rotate by the angle (ir/2 — 4>) about x. Remembering that we have only a (+) 
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state with respect to A, we get 


C+ = cos | e _,W2 , C_ = sin i e +i * 12 . (6.36) 

We would like, finally, to summarize the results of this chapter in a form that 
will be useful for our later work. First, we remind you that our primary result in 
Eqs. (6.35) can be written in another notation. Note that Eqs. (6.35) mean 
just the same thing as Eq. (6.4). That is, in Eqs. (6.35) the coefficients of c + = 
(+S | \p) and C_ = (— S \ \p) are just the amplitudes (jT | iS) of Eq. (6.4) — - the 
amplitudes that a particle in the /-state with respect to S will be in the y-state with 
respect to T (when the orientation of T with respect to S is given in terms of the 
angles a, and 7). We also called them Rjf in Eq. (6.6). (We have a plethora of 
notations!) For example, = (—T\ +S) is the coefficient of C + in the formula 
for C'_, namely, i sin (a/2) We can, therefore, make a summary of our 

results in the form of a table, as we have done in Table 6-1. 

It will occasionally be handy to have these amplitudes already worked out 
for some simple special cases. Let’s let R z (<i>) stand for a rotation by the angle <t> 
about the z-axis. We can also let it stand for the corresponding rotation matrix 
(omitting the subscripts / and /, which are to be implicitly understood). In the 
same spirit R x (<t>) and R y (<j>) will stand for rotations by the angle <j> about the 
jc-axis or the y- axis. We give in Table 6-2 the matrices — the tables of amplitudes 
(jT | /S)— ’ which project the amplitudes from the S-frame into the T-frame, where 
T is obtained from S by the rotation specified. 


Table 6-1 

The amplitudes (jT \ iS ) for a rotation defined by the 
Euler angles a, ( 3 , 7 of Fig. 6-9 

Rji(flt,0 , 7 ) 


(JT\iS) 

+s 

-S 

+T 

COS — ^09+7)/2 

i sin - e -t(0-7)/ 2 


2 

2 

-r 

/ sin - e i(0-Y)/ 2 

COS — 


2 

2 


Table 6-2 

The amplitudes (jT | iS) for a rotation R(<p) by the angle </> 
about the z-axis, jc-axis, or y - axis 

*.(*) 
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7 


The Dependence of Amplitudes on Time 


7-1 Atoms at rest; stationary states 

We want now to talk a little bit about the behavior of probability amplitudes 
in time. We say a “little bit,” because the actual behavior in time necessarily 
involves the behavior in space as well. Thus, we get immediately into the most 
complicated possible situation if we are to do it correctly and in detail. We are 
always in the difficulty that we can either treat something in a logically rigorous 
but quite abstract way, or we can do something which is not at all rigorous but 
which gives us some idea of a real situation — postponing until later a more careful 
treatment. With regard to energy dependence, we are going to take the second 
course. We will make a number of statements. We will not try to be rigorous — but 
will just be telling you things that have been found out, to give you some feeling 
for the behavior of amplitudes as a function of time. As we go along, the precision 
of the description will increase, so don’t get nervous that we seem to be picking 
things out of the air. It is, of course, all out of the air — the air of experiment and 
of the imagination of people. But it would take us too long to go over the historical 
development, so we have to plunge in somewhere. We could plunge into the ab- 
stract and deduce everything — which you would not understand — or we could 
go through a large number of experiments to justify each statement. We choose 
to do something in between. 

An electron alone in empty space can, under certain circumstances, have a 
certain definite energy. For example, if it is standing still (so it has no translational 
motion, no momentum, or kinetic energy), it has its rest energy. A more compli- 
cated object like an atom can also have a definite energy when standing still, but 
it could also be internally excited to another energy level. (We will describe later 
the machinery of this.) We can often think of an atom in an excited state as having 
a definite energy, but this is really only approximately true. An atom doesn’t 
stay excited forever because it manages to discharge its energy by its interaction 
with the electromagnetic field. So there is some amplitude that a new state is 
generated — with the atom in a lower state, and the electromagnetic field in a higher 
state, of excitation. The total energy of the system is the same before and after, 
but the energy of the atom is reduced. So it is not precise to say an excited atom 
has a definite energy; but it will often be convenient and not too wrong to say that 
it does. 

[Incidentally, why does it go one way instead of the other way? Why does an 
atom radiate light? The answer has to do with entropy. When the energy is in the 
electromagnetic field, there are so many different ways it can be — so many different 
places where it can wander — that if we look for the equilibrium condition, we 
find that in the most probable situation the field is excited with a photon, and the 
atom is de-excited. It takes a very long time for the photon to come back and find 
that it can knock the atom back up again. It’s quite analogous to the classical 
problem: Why does an accelerating charge radiate? It isn’t that it “wants” to lose 
energy, because, in fact, when it radiates, the energy of the world is the same as it 
was before. Radiation or absorption goes in the direction of increasing entropy.] 

Nuclei can also exist in different energy levels, and in an approximation which 
disregards the electromagnetic effects, we can say that a nucleus in an excited state 
stays there. Although we know that it doesn’t stay there forever, it is often useful 
to start out with an approximation which is somewhat idealized and easier to 
think about. Also it is often a legitimate approximation under certain circum- 
stances. (When we first introduced the classical laws of a falling body, we did not 
include friction, but there is almost never a case in which there isn’t some friction.) 
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Then there are the subnuclear “strange particles,” which have various masses. 
But the heavier ones disintegrate into other light particles, so again it is not correct 
to say that they have a precisely definite energy. That would be true only if they 
lasted forever. So when we make the approximation that they have a definite 
energy, we are forgetting the fact that they must blow up. For the moment, then, 
we will intentionally forget about such processes and learn later how to take them 
into account. 

Suppose we have an atom— or an electron, or any particle — which at rest 
would have a definite energy E 0 . By the energy E 0 we mean the mass of the whole 
thing times c 2 . This mass includes any internal energy; so an excited atom has a 
mass which is different from the mass of the same atom in the ground state. (The 
ground state means the state of lowest energy.) We will call E 0 the “energy at rest.” 

For an atom at rest, the quantum mechanical amplitude to find an atom at a 
place is the same everywhere; it does not depend on position. This means, of course, 
that the probability of finding the atom anywhere is the same. But it means even 
more. The probability could be independent of position, and still the phase of the 
amplitude could vary from point to point. But for a particle at rest, the complete 
amplitude is identical everywhere. It does, however, depend on the time . For a 
particle in a state of definite energy E 0 , the amplitude to find the particle at (x, y, z) 
at the time t is 

ae -i(E Q f*)t 9 ( 7 . 1 ) 

where a is some constant. The amplitude to be at any point in space is the same 
for all points, but depends on time according to (7.1). We shall simply assume 
this rule to be true. 

Of course, we could also write (7.1) as 

ae~ iu \ (7.2) 

with 

ho) — E 0 = Me 2 , 

where M is the rest mass of the atomic state, or particle. There are three different 
ways of specifying the energy: by the frequency of an amplitude, by the energy in 
the classical sense, or by the inertia. They are all equivalent; they are just different 
ways of saying the same thing. 

You may be thinking that it is strange to think of a “particle” which has 
equal amplitudes to be found throughout all space. After all, we usually imagine 
a “particle” as a small object located “somewhere.” But don’t forget the uncer- 
tainty principle. If a particle has a definite energy, it has also a definite momentum. 
If the uncertainty in momentum is zero, the uncertainty relation, Ap Ax = h, 
tells us that the uncertainty in the position must be infinite, and that is just what 
we are saying when we say that there is the same amplitude to find the particle 
at all points in space. 

If the internal parts of an atom are in a different state with a different total 
energy, then the variation of the amplitude with time is different. If you don’t 
know in which state it is, there will be a certain amplitude to be in one state and a 
certain amplitude to be in another — and each of these amplitudes will have a dif- 
ferent frequency. There will be an interference between these different components 
— like a beat-note— which can show up as a varying probability. Something will 
be “going on” inside of the atom — even though it is “at rest” in the sense that its 
center of mass is not drifting. However, if the atom has one definite energy, the 
amplitude is given by (7.1), and the absolute square of this amplitude does not 
depend on time. You see, then, that if a thing has a definite energy and if you ask 
any probability question about it, the answer is independent of time. Although 
the amplitudes vary with time, if the energy is definite they vary as an imaginary 
exponential, and the absolute value doesn’t change. 

That’s why we often say that an atom in a definite energy level is in a stationary 
state. If you make any measurements of the things inside, you’ll find that nothing 
(in probability) will change in time. In order to have the probabilities change in 
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time, we have to have the interference of two amplitudes at two different frequencies, 
and that means that we cannot know what the energy is. The object will have one 
amplitude to be in a state of one energy and another amplitude to be in a state of 
another energy. That’s the quantum mechanical description of something when 
its behavior depends on time. 

If we have a “condition” which is a mixture of two different states with differ- 
ent energies, then the amplitude for each of the two states varies with time according 
to Eq. (7.2), for instance, as 

e -* E ii*» and e -W 2 t*)t m (7.3) 

And if we have some combination of the two, we will have an interference. But 
notice that if we added a constant to both energies, it wouldn’t make any difference. 
If somebody else were to use a different scale of energy in which all the energies 
were increased (or decreased) by a constant amount — say, by the amount A — then 
the amplitudes in the two states would, from his point of view, be 

e—i(Ei+A)t!h an( j e —i(E 2 +A)tlft' (7.4) 

All of his amplitudes would be multiplied by the same factor and all 

linear combinations, or interferences, would have the same factor. When we take 
the absolute squares to find the probabilities, all the answers would be the same. 
The choice of an origin for our energy scale makes no difference; we can measure 
energy from any zero we want. For relativistic purposes it is nice to measure the 
energy so that the rest mass is included, but for many purposes that aren’t rela- 
tivistic it is often nice to subtract some standard amount from all energies that 
appear. For instance, in the case of an atom, it is usually convenient to subtract 
the energy M s c 2 , where M s is the mass of all the separate pieces — the nucleus and 
the electrons — which is, of course, different from the mass of the atom. For other 
problems it may be useful to subtract from all energies the amount M g c 2 , where 
M g is the mass of the whole atom in the ground state; then the energy that appears 
is just the excitation energy of the atom. So, sometimes we may shift our zero of 
energy by some very large constant, but it doesn’t make any difference, provided 
we shift all the energies in a particular calculation by the same constant. So much 
for a particle standing still. 


7-2 Uniform motion 

If we suppose that the relativity theory is right, a particle at rest in one inertial 
system can be in uniform motion in another inertial system. In the rest frame of 
the particle, the probability amplitude is the same for all x, y, and z but varies with 
t. The magnitude of the amplitude is the same for all /, but the phase depends on t. 
We can get a kind of a picture of the behavior of the amplitude if we plot lines of 
equal phase — say, lines of zero phase — as a function of x and /. For a particle at 
rest, these equal-phase lines are parallel to the x-axis and are equally spaced in 
the /-coordinate, as shown by the dashed lines in Fig. 7-1. 

In a different frame — Jt 7 , y', z 7 , / 7 — that is moving with respect to the particle 
in, say, the jc-direction, the jc 7 and f coordinates of any particular point in space 
are related to x and t by the Lorentz transformation, This transformation can be 
represented graphically by drawing x r and t' axes, as is done in Fig. 7-1. (See 
Chapter 17, Vol. I, Fig. 17-2.) You can see that in the x'-t 7 system, points of equal 
phasef have a different spacing along the / 7 -axis, so the frequency of the time 
variation is different. Also there is a variation of the phase with x f , so the prob- 
ability amplitude must be a function of x'. 


t We are assuming that the phase should have the same value at corresponding points 
in the two systems. This is a subtle point, however, since the phase of a quantum me- 
chanical amplitude is, to a large extent, arbitrary. A complete justification of this assump- 
tion requires a more detailed discussion involving interferences of two or more amplitudes. 
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Fig. 7-1. Relativistic transformation 
of the amplitude of a particle at rest in 
the x-t systems. 



Under a Lorentz transformation for the velocity v, say along the negative 
^-direction, the time t is related to the time (' by 


t f — x'v/c 2 
Vl — v 2 /c 2 


so our amplitude now varies as 

-(HK)E 0 t _ e -(ijh)(E Q t'Hl-v 2 ic 2 -E 0 vz'lc 2 '' / l-v 2 ic 2 ) 

In the prime system it varies in space as well as in time. If we write the amplitude as 

-(i/K)(E' p t'-p'x') 

^ 3 

we see that E' p = E 0 /\/l — v 2 /c 2 is the energy computed classically for a 
particle of rest energy E 0 travelling at the velocity v, and p f = E f v v/c 2 is the 
corresponding particle momentum. 

You know that x u — (/, x, y , z) and p p = (E,p x , p y ,p z ) are four-vectors, and 
that p p x p — Et — p • x is a scalar invariant. In the rest frame of the particle, 
p is just Et; so if we transform to another frame, Et will be replaced by 

E’f - p’ * X 7 . 


Thus, the probability amplitude of a particle which has the momentum p will be 
proportional to 

(7 . 5 ) 


where E p is the energy of the particle whose momentum is p, that is, 

E v = V ( pc f + El (7.6) 

where is, as before, the rest energy. For nonrelativistic problems, we can write 

E v = M s c 2 + W p , (7.7) 

where W p is the energy over and above the rest energy M s c 2 of the parts of the 
atom. In general, W p would include both the kinetic energy of the atom as well 
as its binding or excitation energy, which we can call the “internal” energy. We 
would write 

W r>= w ^ + m’ ( 7 - 8 ) 

and the amplitudes would be 


Because we will generally be doing nonrelativistic calculations, we will use this 
form for the probability amplitudes. 

Note that our relativistic transformation has given us the variation of the 
amplitude of an atom which moves in space without any additional assumptions. 
The wave number of the space variations is, from (7.9), 


so the wavelength is 


k - p ■ 

K h ’ 

(7.10) 

2w h 

(7.11) 

k p 


This is the same wavelength we have used before for particles with the momentum 
p. This formula was first arrived at by de Broglie in just this way. For a moving 
particle, the frequency of the amplitude variations is still given by 

ha> = W p . (7.12) 
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The absolute square of (7.9) is just 1, so for a particle in motion with a 
definite energy , the probability of finding it is the same everywhere and does not 
change with time. (It is important to notice that the amplitude is a complex wave. 
If we used a real sine wave, the square would vary from point to point, which 
would not be right.) 

We know, of course, that there are situations in which particles move from 
place to place so that the probability depends on position and changes with time. 
How do we describe such situations? We can do that by considering amplitudes 
which are a superposition of two or more amplitudes for states of definite energy. 
We have already discussed this situation in Chapter 48 of Vol. I— even for prob- 
ability amplitudes! We found that the sum of two amplitudes with different wave 
numbers k (that is, momenta) and frequencies to (that is, energies) gives inter- 
ference humps, or beats, so that the square of the amplitude varies with space 
and time. We also found that these beats move with the so-called “group velocity” 
given by 

Aco 


where Ak and Aco are the differences between the wave numbers and frequencies 
for the two waves. For more complicated waves— made up of the sum of many 
amplitudes all near the same frequency — the group velocity is 



V " = dk' 

(7.13) 

Taking co = E p /h and k 

= p/h , we see that 



Ve dp 

(7.14) 

Using Eq. (7.6), we have 

dE v 2 P 

dp c E p ' 

(7.15) 

But E p = Me 2 , so 

dE p p 

dp M 

(7.16) 


which is just the classical velocity of the particle. Alternatively, if we use the non 
relativistic expressions, we have 


and 


to 


Wp 

h 


and 




(7.17) 


which is again the classical velocity. 

Our result, then, is that if we have several amplitudes for pure energy states 
of nearly the same energy, their interference gives “lumps” in the probability that 
move through space with a velocity equal to the velocity of a classical particle 
of that energy. We should remark, however, that when we say we can add two 
amplitudes of different wave number together to get a beat-note that will corre- 
spond to a moving particle, we have introduced something new — something that 
we cannot deduce from the theory of relativity. We said what the amplitude did 
for a particle standing still and then deduced what it would do if the particle were 
moving. But we cannot deduce from these arguments what would happen when 
there are two waves moving with different speeds. If we stop one, we cannot stop 
the other. So we have added tacitly the extra hypothesis that not only is (7.9) a 
possible solution, but that there can also be solutions with all kinds of p’s for the 
same system, and that the different terms will interfere. 
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7-3 Potential energy; energy conservation 



Now we would like to discuss what happens when the energy of a particle 
can change. We begin by thinking of a particle which moves in a force field de- 
scribed by a potential. We discuss first the effect of a constant potential. Suppose 
that we have a large metal can which we have raised to some electrostatic potential 
0, as in Fig. 7-2. If there are charged objects inside the can, their potential energy 
will be q<j > , which we will call V, and will be absolutely independent of position. 
Then there can be no change in the physics inside, because the constant potential 
doesn’t make any difference so far as anything going on inside the can is concerned. 
Now there is no way we can deduce'what the answer should be, so we must make 
a guess. The guess which works is more or less what you might expect: For the 
energy, we must use the sum of the potential energy V and the energy E p — which 
is itself the sum of the internal and kinetic energies. The amplitude is proportional 
to 

p -\- V)t— P’X] /y 


Fig. 7-2. A particle of mass M and 
momentum p in a region of constant 
potential. 


The general principle is that the coefficient of t, which we may call co, is always 
given by the total energy of the system: internal (or “mass”) energy, plus kinetic 
energy, plus potential energy: 

hu = E p + V. (7.19) 

Or, for nonrelativistic situations, 

2 

= W ini + JL-+V. (7.20) 


Now what about physical phenomena inside the box? If there are several 
different energy states, what will we get? The amplitude for each state has the 
same additional factor 



| 

£ 


AA/WVWWVW 

“■j X| J-— -**1^2 1 - """ 

(FOR 4 >< 6 ) 
r 2 r ) 


Fig. 7-3. The amplitude for a par- 
ticle in transit from one potential to 
another. 


over what it would have with V = 0. That is just like a change in the zero of our 
energy scale. It produces an equal phase change in all amplitudes, but as we have 
seen before, this doesn’t change any of the probabilities. All the physical phenomena 
are the same. (We have assumed that we are talking about different states of the 
same charged object, so that q4> is the same for all. If an object could change its 
charge in going from one state to another, we would have quite another result, 
but conservation of charge prevents this.) 

So far, our assumption agrees with what we would expect for a change of 
energy reference level. But if it is really right, it should hold for a potentiaPenergy 
that is not just a constant. In general, V could vary in any arbitrary way with 
both time and space, and the complete result for the amplitude must be given in 
terms of a differential equation. We don’t want to get concerned with the general 
case right now, but only want to get some idea about how some things happen, 
so we will think only of a potential that is constant in time and varies very slowly 
in space. Then we can make a comparison between the classical and quantum ideas. 

Suppose we think of the situation in Fig. 7-3, which has two boxes held at 
the constant potentials (j>i and $ 2 and a region in between where we will assume 
ffiat the potential varies smoothly from one to the other. We imagine that some 
particle has an amplitude to be found in any one of the regions. We also assume 
that the momentum is large enough so that in any small region in which there are 
many wavelengths, the potential is nearly constant. We would then think that in 
any part of the space the amplitude ought to look like (7.18) with the appropriate 
V for that part of the space. 

Let’s think of a special case in which = 0, so that the potential energy 
there is zero, but in which q<j> 2 is negative, so that classically the particle would 
have more energy in the second box. Classically, it would be going faster in the 
second box — it would have more energy and, therefore, more momentum. LetV 
see how that might come out of quantum mechanics. 
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With our assumption, the amplitude in the first box would be proportional to 


and the amplitude in the second box would be proportional to 


-07*)[(Wi n t+P2/2iW + V 2 )/-P2'*] 


(7.21) 


(7.22) 


(Let’s say that the internal energy is not being changed, but remains the same in 
both regions.) The question is: How do these two amplitudes match together 
through the region between the boxes? 

We are going to suppose that the potentials are qill constant in time — so that 
nothing in the conditions varies. We will then suppose that the variations of the 
amplitude (that is, its phase) have the same frequency everywhere — because, so 
to speak, there is nothing in the “medium” that depends on time. If nothing in 
the space is changing, we can consider that the wave in one region “generates” 
subsidiary waves all over space which will all oscillate at the same frequency — 
just as light waves going through materials at rest do not change their frequency. 
If the frequencies in (7.21) and (7.22) are the same, we must have that 

W'int + A + Vi = Wiat + A + V 2 . (7.23) 


Both sides are just the classical total energies, so Eq. (7.23) is a statement of the 
conservation of energy. In other words, the classical statement of the conservation 
of energy is equivalent to the quantum mechanical statement that the frequencies 
for a particle are everywhere the same if the conditions are not changing with time. 
It all fits with the idea that = E . 

In the special example that V x — 0 and V 2 i,s negative, Eq. (7.23) gives that 
/7 2 is greater than p so the wavelength of the waves is shorter in region 2. The 
surfaces of equal phase are shown by the dashed lines in Fig. 7-3. We have also 
drawn a graph of the real part of the amplitude, which shows again how the 
wavelength decreases in going from region 1 to region 2. The group velocity of 
the waves, which is p/M, also increases in the way one would expect from the 
classical energy conservation, since it is just the same as Eq. (7.23). 

There is an interesting special case where V 2 gets so large that V 2 — V x is 
greater than p\/2M. Then p 2> which is given by 


pi - 2M 


li 

2 M 


~ V 2 + 



(7.24) 


is negative . That means that p 2 is an imaginary number, say, ip f . Classically, we 
would say that the particle never gets into region 2 — it doesn't have enough energy 
to climb the potential hill. Quantum mechanically, however, the amplitude is still 
given by Eq, (7.22); its space variation still goes as 


e (ilh)P 2 ' x 


But if p 2 is imaginary, the space dependence becomes a real exponential. Say that 
the particle was initially going in the H-x-direction; then the amplitude would 
vary as 

e- p ’ xtn . (7.25) 

The amplitude decreases rapidly with increasing x. 

Imagine that the two regions at different potentials were very close together, 
so that the potential energy changed suddenly from V x to V 2 , as shown in Fig. 
7-4(a). If we plot the real part of the probability amplitude, we get the dependence 
shown in part (b) of the figure. The wave in the first region corresponds to a 
particle trying to get into the second region, but the amplitude there falls off 
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Fig. 7-4. The amplitude for a particle approaching 
a strongly repulsive potential. 




Fig. 7-5. The penetration of the amplitude through 
a potential barrier. 



Fig. 7-6. (a) The potential function 

for an a-particle in a uranium nucleus, 
(b) The qualitative form of the probability 
amplitude. 


rapidly. There is some chance that it will be observed in the second region— where 
it could never get classically— but the amplitude is very small except right near 
the boundary. The situation is very much like what we found for the total internal 
reflection of light. The light doesn’t normally get out, but we can observe it if we 
put something within a wavelength or two of the surface. 

You will remember that if we put a second surface close to the boundary where 
light was totally reflected, we could get some light transmitted into the second piece 
of material. The corresponding thing happens to particles in quantum mechanics. 
If there is a narrow region with a potential V, so great that the classical kinetic 
energy would be negative, the particle would classically never get past. But quan- 
tum mechanically, the exponentially decaying amplitude can reach across the 
region and give a small probability that the particle will be found on the other side 
where the kinetic energy is again positive. The situation is illustrated in Fig. 7-5. 
This effect is called the quantum mechanical “penetration of a barrier.” 

The barrier penetration by a quantum mechanical amplitude gives the ex- 
planation— or description— of the a-particle decay of a uranium nucleus. The 
potential energy of an a-particle, as a function of the distance from the center, is 
shown in Fig. 7-6(a). If one tried to shoot an a-particle with the energy E into 
the nucleus, it would feel an electrostatic repulsion from the nuclear charge z and 
would, classically, get no closer than the distance r x where its total energy is equal 
to the potential energy V. Closer in, however, the potential energy is much lower 
because of the strong attraction of the short-range nuclear forces. How is it then 
that in radioactive decay we find a-particles which started out inside the nucleus 
coming out with the energy El Because they start out with the energy E inside 
the nucleus and “leak” through the potential barrier. The probability amplitude 
is roughly as sketched in part (b) of Fig. 7-6, although actually the exponential 
decay is much larger than shown. It is, in fact, quite remarkable that the mean 
life of an a-particle in the uranium nucleus is as long as 4\ billion years, when the 
natural oscillations inside the nucleus are so extremely rapid — about 10 22 per sec! 
How can one get a number like 10 9 years from 10 -22 sec? The answer is that the 
exponential gives the tremendously small factor of about e ~ 45 — which gives the 
very small, though definite, probability of leakage. Once the a-particle is in the 
nucleus, there is almost no amplitude at all for finding it outside; however, if you 
take many nuclei and wait long enough, you may be lucky and find one that has 
come out. 
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Fig. 7-7. The deflection of a particle by a 
transverse potential gradient. 



Fig. 7-8. The probability amplitude in a region 
with a transverse potential gradient. 


7-4 Forces; the classical limit 

Suppose that we have a particle moving along and passing through a region 
where there is a potential that varies at right angles to the motion. Classically, we 
would describe the situation as sketched in Fig. 7-7. If the particle is moving 
along the x-direction and enters a region where there is a potential that varies 
with y, the particle will get a transverse acceleration from the force F = —dV/dy. 
If the force is present only in a limited region of width w 9 the force will act only for 
the time w/v. The particle will be given the transverse momentum 


The angle of deflection 66 is then 


50 = ^ = ^, 

P pv 

where p is the initial momentum. Using — dV/dy for F, we get 


66 = - 


W dV 
pv dy 


(7.26) 


It is now up to us to see if our idea that the waves go as (7.20) will explain 
the same result. We look at the same thing quantum mechanically, assuming that 
everything is on a very large scale compared with a wavelength of our probability 
amplitudes. In any small region we can say that the amplitude varies as 


+p 2 i 2M +V)t~P*x] 


(7.27) 


Can we see that this will also give rise to a deflection of the particle when V has 
a transverse gradient? We have sketched in Fig. 7-8 what the waves of prob- 
ability amplitude will look like. We have drawn a set of “wave nodes” which you 
can think of as surfaces where the phase of the amplitude is zero. In every small 
region, the wavelength — the distance between successive nodes — is 


\ h 
A = — » 

P 


where p is related to V through 


W + + y= const. 

2 M 


(7.28) 


In the region where V is larger, p is smaller, and the wavelength is longer. So the 
angle of the wave nodes gets changed as shown in the figure. 

To find the change in angle of the wave nodes we notice that for the two 
paths a and b in Fig. 7-8 there is a difference of potential AK = (dV/dy)D, so 
there is a difference A p in the momentum along the two tracks which can be 
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obtained from (7.28): 


A 



B 4 ~ AK 


(7.29) 


The wave number p/h is, therefore, different along the two paths, which means 
that the phase is advancing at a different rate. The difference in the rate of increase 
of phase is Ak = A p/h, so the accumulated phase difference in the total distance 
w is 

A(phase) = A k • w = • w = — Wm (7.30) 


This is the amount by which the phase on path b is “ahead” of the phase on path 
a as the wave leaves the strip. But outside the strip, a phase advance of this amount 
corresponds to the wave node being ahead by the amount 


&x = 2^ A(phase) 


h 

P 


A(phase) 


or 

Ax =-M-A V-w. (7.31) 

P 2 

Referring to Fig. 7-8, we see that the new wavefronts will be at the angle SB 
given by 

Ax = D 89; (7.32) 

so we have 

D se = - ^A V-w. (7.33) 

P 2 

This is identical to Eq. (7.26) if we replace p/m by v and AV/D by dV/dy. 

The result we have just got is correct only if the potential variations are slow 
and smooth — in what we call the classical limit. We have shown that under these 
conditions we will get the same particle motions we get from F = ma, provided 
we assume that a potential contributes a phase to the probability amplitude equal 
to Vt/h. In the classical limit, the quantum mechanics will agree with Newtonian 
mechanics. 


7-5 The “precession” of a spin one-half particle 

Notice that we have not assumed anything special about the potential energy 
it is just that energy whose derivative gives a force. For instance, in the Stern- 
Gerlach experiment we had the energy U = —/* • B, which gives a force if B has a 
spatial variation. If we wanted to give a quantum mechanical description, we 
would have said that the particles in one beam had an energy that varied one way 
and that those in the other beam had an opposite energy variation. (We could 
put the magnetic energy U into the potential energy V or into the “internal 
energy W; it doesn’t matter.) Because of the energy variation, the waves are 
refracted, and the beams are bent up or down. (We see now that quantum me- 
chanics would give us the same bending as we would compute from the classical 
mechanics.) 

From the dependence of the amplitude on potential energy we would also 
expect that if a particle sits in a uniform magnetic field along the z-direction, its 
probability amplitude must be changing with time according to 

(We can consider that this is, in effect, a definition of p z .) In other words, if we 
place a particle in a uniform field B for a time r, its probability amplitude will be 
multiplied by 

e — (*'/*)(— VzB)t 
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over what it would be in no field. Since for a spin one-half particle, \x z can be 
either plus or minus some number, say /a, the two possible states in a uniform 
field would have their phases changing at the same rate but in opposite direc- 
tions. The two amplitudes get multiplied by 

c ±(t7ft)/xBr (7.34) 

This result has some interesting consequences. Suppose we have a spin one- 
half particle in some state that is not purely spin up or spin down. We can describe 
its condition in terms of the amplitudes to be in the pure up and pure down states. 
But in a magnetic field, these two states will have phases changing at a different 
rate. So if we ask some question about the amplitudes, the answer will depend 
on how long it has been in the field. 

As an example, we consider the disintegration of the muon in a magnetic 
field. When muons are produced as disintegration products of 7r-mesons, they are 
polarized (in other words, they have a preferred spin direction). The muons, in 
turn, disintegrate — in about 2.2 microseconds on the average — emitting an electron 
and two neutrinos: 

ju — > e + v + v. 


In this disintegration it turns out that (for at least the highest energies) the electrons 
are emitted preferentially in the direction opposite to the spin direction of the muon. 

Suppose then that we consider the experimental arrangement shown in Fig. 
7-9. If polarized muons enter from the left and are brought to rest in a block of 
material at A , they will, a little while later, disintegrate. The electrons emitted 
will, in general, go off in all possible directions. Suppose, however, that the muons 
all enter the stopping block at A with their spins in the ^-direction. Without a 
magnetic field there would be some angular distribution of decay directions; we 
would like to know how this distribution is changed by the presence of the mag- 
netic field. We expect that it may vary in some way with time. We can find out 
what happens by asking, for any moment, what the amplitude is that the muon 
will be found in the (+jc) state. 

We can state the problem in the following way: A muon is known to have 
its spin in the +x-direction at t = 0; what is the amplitude that it will be in the 
same state at the time r? Now we do not have any rule for the behavior of a spin 
one-half particle in a magnetic field at right angles to the spin, but we do know what 
happens to the spin up and spin down states with respect to the field — their ampli- 
tudes get multiplied by the factor (7.34). Our procedure then is to choose the 
representation in which the base states are spin up and spin down with respect 
to the z-direction (the field direction). Any question can then be expressed with 
reference to the amplitudes for these states. 

Let’s say that $(0 represents the muon state. When it enters the block A, its 
state is ^(0), and we want to know \p(r) at the later time r. If we represent the two 
base states by (+z) and (— z) we know the two amplitudes (+ z | ^(0)) and 
(-Z | ^(0)) — we know these amplitudes because we know that ^(0) represents a 
state with the spin in the (+*) state. From the results of the last chapter, these 
amplitudes aref 

<+z I +x) = C+ = 

V2 

and (7.35) 

<-z | +x) = C_ = 



Fig. 7-9. A muon-decay experi- 
ment. 


They happen to be equal. Since these amplitudes refer to the condition at / = 0, 
let’s call them C + (0) and C_(0). 


t If you skipped Chapter 6, you can just take (7.35) as an underived rule for now. 
We will give later (in Chapter 10) a more complete discussion of spin precession, including 
a derivation of these amplitudes. 
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Now we know what happens to these two amplitudes with time. Using 
(7.34), we have 

C+(0 = C+( 
and 

C_(/) = C_(0)<? +( '" i) ' 1 "' 

But if we know C+(t) and C_(0, we have all there is to know about the condition 
at t. The only trouble is that what we want to know is the probability that at t 
the spin will be in the + ^-direction. Our general rules can, however, take care of 
this problem. We write that the amplitude to be in the (+x) state at time /, which 

we may call A+(t% is 

A+(t) = i+x I iKO) = <+* I +z>(+z ! HO) + <+* I -z)<-z I HO) 


or 

A + { o = {+* I +z)C+{0 + <+* I -z>C-( 0- (7-37) 

Again using the results of the last chapter— or better the equality (<f> f x) = 
(X | <j>>* from Chapter 5— we know that 

(+x | +z) = <+* I -z> = -7= ' 


So we know all the quantities in Eq. (7.37). We get 

A+ ( t ) = 
or 

fiB 

^+(0 = cos t. 

A particularly simple result! Notice that the answer agrees with what we expect 
for t = 0. We get T t ( 0) = 1, which is right, because we assumed that the muon 
was in the (+x) state at t = 0. 

The probability P + that the muon will be found in the (+x) state at t is 
(T + ) 2 or 


The probability oscillates between zero and one, as shown in Fig. 7-10. Note 
that the probability returns to one for fiBt/h = x ( not 2 x). Because we have 
squared the cosine function, the probability repeats itself with the frequency 
2nB/h. 


Fig. 7-10. Time dependence of the 
probability that a spin one-half particle 
will be in a (+) state with respect to the 
x-axis. 



Thus, we find that the chance of catching the decay electron in the electron 
counter of Fig. 7-9 varies periodically with the length of time the muon has been 
sitting in the magnetic field. The frequency depends on the magnetic moment g. 
The magnetic moment of the muon has, in fact, been measured in just this way. 

We can, of course, use the same method to answer any other questions about 
the muon decay. For example, how does the chance of detecting a decay electron 
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in the ^-direction at 90° to the x-direction but still at right angles to the field depend 
on /? If you work it out, the amplitude to be in the (+ 3 ;) state varies as 
cos 2 {{pBt/h) — 7t/4}, which oscillates with the same period but reaches its max- 
imum one-quarter cycle later, when pBt/h - 7t/4. In fact, what is happening is 
that as time goes on, the muon goes through a succession of states which correspond 
to complete polarization in a direction that is continually rotating about the z-axis. 
We can describe this by saying that the spin is precessing at the frequency 

2 \xB 

= ( 7 - 38 > 

You can begin to see the form that our quantum mechanical description 
will take when we are describing how things behave in time. 
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8 


The Hamiltonian Matrix 


8-1 Amplitudes and vectors 

Before we begin the main topic of this chapter, we would like to describe a 
number of mathematical ideas that are used a lot in the literature of quantum 
mechanics. Knowing them will make it easier for you to read other books or 
papers on the subject. The first idea is the close mathematical resemblance between 
the equations of quantum mechanics and those of the scalar product of two vectors. 
You remember that if X and <f> are two states, the amplitude to start in <£ and end 
up in X can be written as a sum over a complete set of base states of the amplitude 
to go from <j> into one of the base states and then from that base state out again 
into X: 

<x | *> = £ (x I i)(i I <*>). (8.1) 

all i 

We explained this in terms of a Stern-Gerlach apparatus, but we remind you that 
there is no need to have the apparatus. Equation (8.1) is a mathematical law that 
is just as true whether we put the filtering equipment in or not — it is not always 
necessary to imagine that the apparatus is there. We can think of it simply as a 
formula for the amplitude (X [ tf>). 

We would like to compare Eq. (8.1) to the formula for the dot product of 
two vectors B and A . If 2? and A are ordinary vectors in three dimensions, we can 
write the dot product this way: 
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Y ( B ■ e { )( ei ■ A), (8.2) 

all i 

with the understanding that the symbol a stands for the three unit vectors in the 
x, y t and z-directions. Then B e i is what we ordinarily call B x \ B • e 2 is what we 
ordinarily call B y \ and so on. So Eq. (8.2) is equivalent to 

B X A X ~b ByAy -j- B z A Zi 
which is the dot product B • A. 

Comparing Eqs. (8.1) and (8.2), we can see the following analogy: The 
states X and <£ correspond to the two vectors A and B. The base states i correspond 

to the special vectors ei to which we refer all other vectors. Any vector can be 

represented as a linear combination of the three “base vectors” e t -. Furthermore, 
if you know the coefficients of each “base vector” in this combination — that is, 
its three components — you know everything about a vector. In a similar way, 
any quantum mechanical state can be described completely by the amplitude 
{/ 1 0) to go into the base states; and if you know these coefficients, you know 
everything there is to know about the state. Because of this close analogy, what 
we have called a “state” is often also called a “state vector.” 

Since the base vectors ei are all at right angles, we have the relation 

* e i “ (8-3) 

This corresponds to the relations (5.25) among the base states /, 

(i\j) - (8-4) 

You see now why one says that the base states / are all “orthogonal.” 
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There is one minor difference between Eq. (8.1) and the dot product. We 
have that 

= (x\ 0>*. (8.5) 

But in vector algebra, 

A B = B A. 

With the complex numbers of quantum mechanics we have to keep straight the 
order of the terms, whereas in the dot product, the order doesn’t matter. 

Now consider the following vector equation: 

A = ^ etei' A). (8.6) 

i 

It’s a little unusual, but correct. It means the same thing as 

A ^ ] A{€{ = A x e x “j“ AyCy “f- A z € z . (8.7) 

i 

Notice, though, that Eq. (8.6) involves a quantity which is different from a dot 
product. A dot product is just a number , whereas Eq. (8.6) is a vector equation. 
One of the great tricks of vector analysis was to abstract away from the equations 

the idea of a vector itself. One might be similarly inclined to abstract a thing that 

is the analog of a “vector” from the quantum mechanical formula Eq. (8.1) — and 
one can indeed. We remove the <X | from both sides Eq. (8.1) and write the 
following equation (don’t get frightened — it’s just a notation and in a few minutes 
you will find out what the symbols mean): 

U) = E I iW I *>. (8.8) 

i 

One thinks of the bracket (x | 0) as being divided into two pieces. The second 
piece | 0) is often called a ket , and the first piece (X | is called a bra (put together, 
they make a “bra-ket” — a notation proposed by Dirac); the half-symbols (x [ and 
| 0) are also called state vectors . In any case, they are not numbers, and, in general, 
we want the results of our calculations to come out as numbers ; so such “unfinished” 
quantities are only part-way steps in our calculations. 

It happens that until now we have written all our results in terms of numbers. 
How have we managed to avoid vectors? It is amusing to note that even in ordinary 
vector algebra we could make all equations involve only numbers. For instance, 
instead of a vector equation like 

F = rna f 

we could always have written 

C F = C * (ma). 

We have then an equation between dot products that is true for any vector C. 
But if it is true for any C, it hardly makes sense at all to keep writing the C! 

Now look at Eq. (8.1). It is an equation that is true for any X. So to save 
writing, we should just leave out the X and write Eq. (8.8) instead. It has the same 
information provided we understand that it should always be “finished” by “multi- 
plying on the left by” — which simply means reinserting — some (x | on both sides. 
So Eq. (8.8) means exactly the same thing as Eq. (8.1) — no more, no less. When 
you want numbers, you put in the (x | you want. 

Maybe you have already wondered about the <j> in Eq. (8.8). Since the equa- 
tion is true for any 0, why do we keep it? Indeed, Dirac suggests that the 0 also 
can just as well be abstracted away, so that we have only 

l = El l 'X*|. (8.9) 

i 

And this is the great law of quantum mechanics! (There is no analog in vector 
analysis.) It says that if you put in any two states x and 0 on the left and right of 
both sides, you get back Eq. (8.1). It is not really very useful, but it’s a nice 
reminder that the equation is true for any two states. 

8-2 



8-2 Resolving state vectors 

Let’s look at Eq. (8.8) again; we can think of it in the following way. Any 
state vector | 0) can be represented as a linear combination with suitable coefficients 
of a set of base “vectors” — or, if you prefer, as a superposition of “unit vectors” 
in suitable proportions. To emphasize that the coefficients (i \ 0) are just ordinary 
(complex) numbers, suppose we write 

(/ 1 0) = C{. 

Then Eq. (8.8) is the same as 

I *> = E I OC,-. (8.10) 

i 

We can write a similar equation for any other state vector, say | x), with, of course, 
different coefficients — say A. Then we have 

I X) = £ I i)Di. (8.11) 


The A are just the amplitudes (i \ x). 

Suppose we had started by abstracting the 0 from Eq. (8.1). We would 
have had 

<x i = J2 < x i 'X* i- (8-m 

i 

Remembering that (x | i) = (i | x)*, we can write this as 

<X| = £ D?<i|. (8.13) 

i 

Now the interesting thing is that we can just multiply Eq. (8.13) and Eq. (8.10) 
to get back (x | 0). When we do that, we have to be careful of the summation 
indices, because they are quite distinct in the two equations. Let’s first rewrite 
Eq, (8.13) as 

<x| = £ °UJ i> 

3 

which changes nothing. Then putting it together with Eq. (8.10), we have 

<x|*> = £ D*(J\i)Ci. (8.14) 

ij 

Remember, though, that (j \ i ) = S ih so that in the sum we have left only the 
terms with j = i. We get 

<x | *> = £ D* C u (8.15) 

i 

where, of course, A* — (i \ ^}* = (x I i), and C z - = (/ 1 0). Again we see the 
close analogy with the dot product 

A B AiBi. 

i 

The only difference is the complex conjugate on A. So Eq. (8.15) says that if 
the state vectors (X | and | 0) are expanded in terms of the base vectors (i | or | /), 
the amplitude to go from 0 to X is given by the kind of dot product in Eq. (8.15). 
This equation is, of course, just Eq. (8.1) written with different symbols. So we 
have just gone in a circle to get used to the new symbols. 

We should perhaps emphasize again that while space vectors in three dimen- 
sions are described in terms of three orthogonal unit vectors, the base vectors | i) 
of the quantum mechanical states must range over the complete set applicable to 
any particular problem. Depending on the situation, two, or three, or five, or an 
infinite number of base states may be involved. 

We have also talked about what happens when particles go through an 
apparatus. If we start the particles out in a certain state 0, then send them through 
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an apparatus, and afterward make a measurement to see if they are in state x, the 
result is described by the amplitude 


<*M|0>. (8.16) 

Such a symbol doesn’t have a close analog in vector algebra. (It is closer to tensor 
algebra, but the analogy is not particularly useful.) We saw in Chapter 5, Eq. 
(5.32), that we could write (8.16) as 

<X I a I *> = £ <x I i)<i I A | j)(j I *>. (8.17) 

ij 

This is just an example of the fundamental rule Eq. (8.9), used twice. 

We also found that if another apparatus B was added in series with A, then we 
could write 

(X I BA I «> = <X I i){i I B I j)(j I A I k)(k I *>. (8.18) 

ijk 

Again, this comes directly from Dirac’s method of writing Eq. (8.9) — remember 
that we can always place a bar (|), which is just like the factor 1 , between B and A. 

Incidentally, we can think of Eq. (8.17) in another way. Suppose we think 
of the particle entering apparatus A in the state <j> and coming out of A in the state 
\p (“psi”). In other words, we could ask ourselves this question: Can we find a yp 
such that the amplitude to get from ^ to x is always identically and everywhere the 
same as the amplitude (x | A | <£)? The answer is yes. We want Eq. (8.17) to be 
replaced by 

<X|*> = £ <x|i>(i|*>. (8.19) 

i 

We can clearly do this if 

(i 1 1) = £ \ A I j)(j U> = (i M I <t>), (8.20) 

3 

which determines \p. “But it doesn’t determine i p” you say; “it only determines 
(i | ^).” However, (i \ \p) does determine because if you have all the coefficients 
that relate \p to the base states /, then \p is uniquely defined. In fact, we can play 
with our notation and write the last term of Eq. (8.20) as 

(i I *> = E <» I j)<J \ A \*)- (8-21) 

3 

Then, since this equation is true for all i, we can write simply 

I *> = E I j)(j \A\4>). (8.22) 

3 

Then we can say: “The state \p is what we get if we start with <f> and go through the 
apparatus A.” 

One final example of the tricks of the trade. We start again with Eq. (8.17). 
Since it is true for any X and <f>, we can drop them both! We then getf 

A = £ \i)(i\A\j)(j\. (8.23) 


What does it mean? It means no more, no less, than what you get if you put back 
the <f> and X. As it stands, it is an “open” equation and incomplete. If we multiply 
it “on the left” by | <£}, it becomes 

A U> = Y, \i)(i\A\j)U\<t>), (8.24) 

13 


f You might think we should write \A\ instead of just A. But then it would look like 
the symbol for “absolute value of A” so the bars are usually dropped. In general, the 
bar (|) behaves much like the factor one. 
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which is just Eq. (8.22) all over again. In fact, we could have just dropped the 
/s from that equation and written 

I *> = A | *>. (8.25) 

The symbol A is neither an amplitude, nor a vector; it is a new kind of thing 
called an operator. It is something which “operates on” a state to produce a new 
state— Eq. (8.25) says that | \p) is what results if A operates on | 0). Again, it is 
still an open equation until it is completed with some bra like (x | to give 

(x I \P) = (X I A I <j>). (8.26) 

The operator A is, of course, described completely if we give the matrix of ampli- 
tudes (/ | A | /) — also written A io — in terms of any set of base vectors. 

We have really added nothing new with all of this new mathematical notation. 
One reason for bringing it all up was to show you the way of writing pieces of 
equations, because in many books you will find the equations written in the 
incomplete forms, and there's no reason for you to be paralyzed when you come 
across them. If you prefer, you can always add the missing pieces to make an 
equation between numbers that will look like something more familiar. 

Also, as you will see, the “bra” and “ket” notation is a very convenient one. 
For one thing, we can from now on identify a state by giving its state vector. 
When we want to refer to a state of definite momentum p we can say: “the state 
| py\ Or we may speak of some arbitrary state | p). For consistency we will 
always use the ket, writing | yp), to identify a state. (It is, of course an arbitrary 
choice; we could equally well have chosen to use the bra, (\p |.) 

8-3 What are the base states of the world? 

We have discovered that any state in the world can be represented as a super- 
position — a linear combination with suitable coefficients — of base states. You 
may ask, first of all, what base states? Well, there are many different possibilities. 
You can, for instance, project a spin in the z-direction or in some other direction. 
There are many, many different representations , which are the analogs of the differ- 
ent coordinate systems one can use to represent ordinary vectors. Next, what 
coefficients? Well, that depends on the physical circumstances. Different sets of 
coefficients correspond to different physical conditions. The important thing to 
know about is the “space” in which you are working — in other words, what the 
base states mean physically. So the first thing you have to know about, in gen- 
eral, is what the base states are like. Then you can understand how to describe a 
situation in terms of these base states. 

We would like to look ahead a little and speak a bit about what the general 
quantum mechanical description of nature is going to be — in terms of the now 
current ideas of physics, anyway. First, one decides on a particular representation 
for the base states — different representations are always possible. For example, 
for a spin one-half particle we can use the plus and minus states with respect to the 
z-axis. But there's nothing special about the z-axis — you can take any other axis 
you like. For consistency we’ll always pick the z-axis, however. Suppose we begin 
with a situation with one electron. In addition to the two possibilities for the spin 
(“up” and “down” along the z-direction), there is also the momentum of the electron. 
We pick a set of base states, each corresponding to one value of the momentum. 
What if the electron doesn’t have a definite momentum? That’s all right; 
we’re just saying what the base states are. If the electron hasn’t got a definite 
momentum, it has some amplitude to have one momentum and another amplitude 
to have another momentum, and so on. And if it is not necessarily spinning 
up, it has some amplitude to be spinning up going at this momentum, and some 
amplitude to be spinning down going at that momentum, and so on. The 
complete description of an electron, so far as we know , requires only that the 
base states be described by the momentum and the spin. So one acceptable set of 
base states | i) for a single electron refer to different values of the momentum and 
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whether the spin is up or down. Different mixtures of amplitudes — that is, differ- 
ent combinations of the C ’s describe different circumstances. What any particular 
electron is doing is described by telling with what amplitude it has an up-spin or a 
down-spin and one momentum or another — for all possible momenta. So you 
can see what is involved in a complete quantum mechanical description of a 
single electron. 

What about systems with more than one electron? Then the base states get 
more complicated. Let’s suppose that we have two electrons. We have, first of all, 
four possible states with respect to spin: both electrons spinning up, the first one 
down and the second one up, the first one up and the second one down, or both 
down. Also we have to specify that the first electron has the momentum p u and 
the second electron, the momentum p 2 . The base states for two electrons require 
the specification of two momenta and two spin characters. With seven electrons, 
we have to specify seven of each. 

If we have a proton and an electron, we have to specify the spin direction of the 
proton and its momentum, and the spin direction of the electron and its momen- 
tum. At least that’s approximately true. We do not really know what the correct 
representation is for the world. It is all very well to start out by supposing that if 
you specify the spin in the electron and its momentum, and likewise for a proton, 
you will have the base states; but what about the “guts” of the proton? Let’s 
look at it this way. In a hydrogen atom which has one proton and one electron, 
we have many different base states to describe — up and down spins of the proton 
and electron and the various possible momenta of the proton and electron. Then 
there are different combinations of amplitudes Q which together describe the 
character of the hydrogen atom in different states. But suppose we look at the 
whole hydrogen atom as a “particle.” If we didn’t know that the hydrogen atom 
was made out of a proton and an electron, we might have started out and said: 
“Oh, I know what the base states are— they correspond to a particular momentum 
of the hydrogen atom.” No, because the hydrogen atom has internal parts. 
It may, therefore, have various states of different internal energy, and describing 
the real nature requires more detail. 

The question is: Does a proton have internal parts? Do we have to describe 
a proton by giving all possible states of protons, and mesons, and strange particles? 
We don’t know. And even though we suppose that the electron is simple, so that 
all we have to tell about it is its momentum and its spin, maybe tomorrow we will 
discover that the electron also has inner gears and wheels. It would mean that our 
representation is incomplete, or wrong, or approximate — in the same way that a 
representation of the hydrogen atom which describes only its momentum would be 
incomplete, because it disregarded the fact that the hydrogen atom could have 
become excited inside. If an electron could become excited inside and turn into 
something else like, for instance, a muon, then it would be described not just by 
giving the states of the new particle, but presumably in terms of some more com- 
plicated internal wheels. The main problem in the study of the fundamental particles 
today is to discover what are the correct representations for the description of 
nature. At the present time, we guess that for the electron it is enough to specify 
its momentum and spin. We also guess that there is an idealized proton which has 
its x-mesons, and £>mesons, and so on, that all have to be specified. Several dozen 
particles— that’s crazy! The question of what is a fundamental particle and what 
is not a fundamental particle — a subject you hear so much about these days is 
the question of what is the final representation going to look like in the ultimate 
quantum mechanical description of the world. Will the electron s momentum 
still be the right thing with which to describe nature? Or even, should the whole 
question be put this way at all! This question must always come up in any scientific 
investigation. At any rate, we see a problem — how to find a representation. We 
don’t know the answer. We don’t even know whether we have the “right” problem, 
but if we do, we must first attempt to find out whether any particular particle is 
“fundamental” or not. 

In the nonrelativistic quantum mechanics — if the energies are not too high, 
so that you don’t disturb the inner workings of the strange particles and so forth 
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you can do a pretty good job without worrying about these details. You can just 
decide to specify the momenta and spins of the electrons and of the nuclei; then 
everything will be all right. In most chemical reactions and other low-energy 
happenings, nothing goes on in the nuclei; they don’t get excited. Furthermore, 
if a hydrogen atom is moving slowly and bumping quietly against other hydrogen 
atoms— never getting excited inside, or radiating, or anything complicated like 
that, but staying always in the ground state of energy for internal motion— you 
can use an approximation in which you talk about the hydrogen atom as one 
object, or particle, and not worry about the fact that it can do something inside. 
This will be a good approximation as long as the kinetic energy in any collision 
is well below 10 electron volts— the energy required to excite the hydrogen atom to 
a different internal state. We will often be making an approximation in which 
we do not include the possibility of inner motion, thereby decreasing the number 
of details that we have to put into our base states. Of course, we then omit some 
phenomena which would appear (usually) at some higher energy, but by making 
such approximations we can simplify very much the analysis of physical problems. 
For example, we can discuss the collision of two hydrogen atoms at low energy— or 
any chemical process — without worrying about the fact that the atomic nuclei 
could be excited. To summarize, then, when we can neglect the effects of any 
internal excited states of a particle we can choose a base set which are the states of 
definite momentum and z-component of angular momentum. 

One problem then in describing nature is to find a suitable representation for 
the base states. But that’s only the beginning. We still want to be able to say what 
“happens.” If we know the “condition” of the world at one moment, we would like 
to know the condition at a later moment. So we also have to find the laws that 
determine how things change with time. We now address ourselves to this second 
part of the framework of quantum mechanics — how states change with time. 


8-4 How states change with time 

We have already talked about how we can represent a situation in which we 
put something through an apparatus. Now one convenient, delightful “apparatus” 
to consider is merely a wait of a few minutes; that is, you prepare a state <j > , and 
then before you analyze it, you just let it sit. Perhaps you let it sit in some particular 
electric or magnetic field— it depends on the physical circumstances in the world. 
At any rate, whatever the conditions are, you let the object sit from time t\ to 
time / 2 * Suppose that it is let out of your first apparatus in the condition 0 at t x . 
And then it goes through an “apparatus,” but the “apparatus” consists of just 
delay until t 2 - During the delay, various things could be going on — external forces 
applied or other shenanigans — so that something is happening. At the end of the 
delay, the amplitude to find the thing in some state x is no longer exactly the same 
as it would have been without the delay. Since “waiting” is just a special case of 
an “apparatus,” we can describe what happens by giving an amplitude with the 
same form as Eq. (8.17). Because the operation of “waiting” is especially impor- 
tant, we’ll call it U instead of A , and to specify the starting and finishing times t x 
and t 2 , we’ll write U(t 2i t x ). The amplitude we want is 

<x| tf(/ 2 ,/i)|*>. (8- 27 ) 

Like any other such amplitude, it can be represented in some base system or other 
by writing it 

Yj <x I i)(i I u(h, tx) I j)(j ! <*>>• (8.28) 

ij 

Then U is completely described by giving the whole set of amplitudes— the matrix 

<i| £/(/2./i)|y)- (8-29) 

We can point out, incidentally, that the matrix (i \ U(j 2 ,tx) |y) gives much 
more detail than may be needed. The high-class theoretical physicist working in 
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high-energy physics considers problems of the following general nature (because 
it’s the way experiments are usually done). He starts with a couple of particles, 
like a proton and a proton, coming together from infinity. (In the lab, usually one 
particle is standing still, and the other comes from an accelerator that is practically 
at infinity on atomic level.) The things go crash and out come, say, two k-mesons, 
six 7r-mesons, and two neutrons in certain directions with certain momenta. 
What’s the amplitude for this to happen? The mathematics looks like this: 
The 0-state specifies the spins and momenta of the incoming particles. The X 
would be the question about what comes out. For instance, with what amplitude 
do you get the six mesons going in such-and-such directions, and the two neutrons 
going off in these directions, with their spins so-and-so. In other words, X would 
be specified by giving all the momenta, and spins, and so on of the final products. 
Then the job of the theorist is to calculate the amplitude (8.27). However, he is 
really only interested in the special case that ti is — oo and t 2 is +co. (There is 
no experimental evidence on the details of the process, only on what comes in 
and what goes out.) The limiting case of U(t 2 , t x ) as t x — > — oc and t 2 + oo 
is called S, and what he wants is 

(x I S I 4 > }• 

Or, using the form (8.28), he would calculate the matrix 

(i I S\j), 

which is called the S-matrix. So if you see a,theoretical physicist pacing the floor 
and saying, “All I have to do is calculate the S-matrix,” you will know what he 
is worried about. 

How to analyze — how to specify the laws for — the S-matrik is an interesting 
question. In relativistic quantum mechanics for high energies, it is done one way, 
but in nonrelativistic quantum mechanics it can be done another way, which is 
very convenient. (This other way can also be done in the relativistic case, but then 
it is not so convenient.) It is to work out the {/-matrix for a small interval of time — 
in other words for t 2 and t\ close together. If we can find a sequence of such U's 
for successive intervals of time we can watch how things go as a function of time. 
You can appreciate immediately that this way is not so good for relativity, because 
you don’t want to have to specify how everything looks “simultaneously” every- 
where. But we won’t worry about that — we’re just going to worry about non- 
relativistic mechanics. 

Suppose we think of the matrix U for a delay from t x until t 3 which is greater 
than t 2 . In other words, let’s take three successive times: 1 1 less than t 2 less than / 3 . 
Then we claim that the matrix that goes between 1 1 and / 3 is the product in suc- 
cession of what happens when you delay from t x until t 2 and then from t 2 until / 3 . 
It’s just like the situation when we had two apparatuses B and A in series. We can 
then write, following the notation of Section 5-6, 


U(t 3 ,tr) = U(t Si t 2 )' U(t 2 ,ti). (8.30) 

In other words, we can analyze any time interval if we can analyze a sequence of 
short time intervals in between. We just multiply together all the pieces; that’s the 
way that quantum mechanics is analyzed nonrelativistically. 

Our problem, then, is to understand the matrix U(t 2 , 1 1 ) for an infinitesimal 
time interval — for t 2 = ti + At. We ask ourselves this: If we have a state <f> 
now, what does the state look like an infinitesimal time At later? Let’s see how we 
write that out. Call the state at the time t, | 0(0) (we show the time dependence 
of 0 to be perfectly clear that we mean the condition at the time 0* Now we ask 
the question: What is the condition after the small interval of time At later? The 
answer is 

|0({ + AO) = U(t + At, 0 I 0(0)- (8.31) 

This means the same as we meant by (8.25), namely, that the amplitude to 
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find X at the time t + At, is 


(X | *(/ + At)) = (X | U(t + At, t) | *(/)>. (8.32) 

Since we’re not yet too good at these abstract things, let’s project our ampli- 
tudes into a definite representation. If we multiply both sides of Eq. (8.31) 
by (/ 1, we get 

</ 1 Kt + At)) = (i | U(t + At, t) | m)- ( 8 - 33 > 

We can also resolve the 1 1 pit)) into base states and write 

(/ 1 \f/(t + At)) = E (i | U(t + At, t) | j)(J | f(t))- (8.34) 

3 

We can understand Eq. (8.34) in the following way. If we let Cft)= (/ 1 \ p(t)) 
stand for the amplitude to be in the base state i at the time t, then we can think 
of this amplitude (just a number, remember!) varying with time. Each C, becomes 
a function of t. And we also have some information on how the amplitudes 
Ci vary with time. Each amplitude at (/ + At) is proportional to all of the other 
amplitudes at t multiplied by a set of coefficients. Let’s call the [/-matrix (/ 1; , by 
which we mean 

Ua = (i\U\j). 

Then we can write Eq. (8.34) as 

Cft + At) = E Uijft + At, t)Cj(t). (8.35) 

i 

This, then, is how the dynamics of quantum mechanics is going to look. 

We don’t know much about the T/ tJ yet, except for one thing. We know that 
if At goes to zero, nothing can happen — we should get just the original state. So, 
Ua — > 1 and Ua — ► 0, if i ^ ./. In other words, U,, — > 5 a for At — ► 0. Also, we 
can suppose that for small At, each of the coefficients U should differ from 8 i} 
by amounts proportional to Af; so we can write 

Uij = Bij + Kij At . (8.36) 

However, it is usual to take the factor (—//&) f out of the coefficients Kij , for 
historical and other reasons; we prefer to write 

U,ft + At, t) = 8 tJ - t Hij (t) At. (8.37) 

It is, of course, the same as Eq. (8.36) and, if you wish, just defines the coefficients 
Hift). The terms Ha are just the derivatives with respect to t 2 of the coefficients 
Uij(t 2 , t j), evaluated at t 2 = 1 i = t. 

Using this form for U in Eq. (8.35), we have 

Cft + At) = E f - 1 *«<0 Cft). (8.38) 

3 L 

Taking the sum over the 5^- term, we get just which we can put on the other 
side of the equation. Then dividing by At, we have what we recognize as a derivative 

Ci(t + At J t ~ Ci(t) = - 1 E H n<t)cft) 

or 

ih = E BiftYfft). (8.39) 


t We are in a bit of trouble here with notation. In the factor (— i/h ), the i means the 
imaginary unit \/ — 1, and not the index i that refers to the /th base state! We hope that 
you won’t find it too confusing. 
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You remember that Q(7) is the amplitude (i \ \p) to find the state yp in one of 
the base states i (at the time /)• So Eq. (8.39) tells us how each of the coefficients 
(i | \p) varies with time. But that is the same as saying that Eq. (8.39) tells us how 
the state yp varies with time, since we are describing yp in terms of the amplitudes 
(i | yp). The variation of yp in time is described in terms of the matrix H ih which has 
to include, of course, the things we are doing to the system to cause it to change. 
If we know the H i3 ~ which contains the physics of the situation and can, in general, 
depend on the time — we have a complete description of the behavior in time of the 
system. Equation (8.39) is then the quantum mechanical law for the dynamics 
of the world. 

(We should say that we will always take a set of base states which are fixed 
and do not vary with time. There are people who use base states that also vary. 
However, that’s like using a rotating coordinate system in mechanics, and we 
don’t want to get involved in such complications.) 

8-5 The Hamiltonian matrix 

The idea, then, is that to describe the quantum mechanical world we need to 
pick a set of base states i and to write the physical laws by giving the matrix of 
coefficients Hij, Then we have everything — we can answer any question about 
what will happen. So we have to learn what the rules are for finding the H’s to go 
with any physical situation — what corresponds to a magnetic field, or an electric 
field, and so on. And that’s the hardest part. For instance, for the new strange 
particles, we have no idea what H i3 * s to use. In other words, no one knows the 
complete H for the whole world. (Part of the difficulty is that one can hardly hope 
to discover the when no one even knows what the base states are!) We do have 
excellent approximations for nonrelativistic phenomena and for some other special 
cases. In particular, we have the forms that are needed for the motions of electrons 
in atoms— to describe chemistry. But we don’t know the full true H for the 
whole universe. 

The coefficients are called the Hamiltonian matrix or, for short, just the 
Hamiltonian . (How Hamilton, who worked in the 1830’s, got his name on a 
quantum mechanical matrix is a tale of history.) It would be much better called 
the energy matrix , for reasons that will become apparent as we work with it. So 
the problem is: Know your Hamiltonian! 

The Hamiltonian has one property that can be deduced right away, namely, 

that 

H% = Hjt. (8.40) 

This follows from the condition that the total probability that the system is in 
some state does not change. If you start with a particle — an object or the world — 
then you’ve still got it as time goes on. The total probability of finding it somewhere 

i 

which must not vary with time. If this is to be true for any starting condition <£, 
then Eq. (8.40) must also be true. 

As our first example, we take a situation in which the physical circumstances 
are not changing with time; we mean the external physical conditions, so that H 
is independent of time. Nobody is turning magnets on and off. We also pick a 
system for which only one base state is required for the description; it is an ap- 
proximation we could make for a hydrogen atom at rest, or something similar. 
Equation (8.39) then says 

ih = HuCi- (8.41) 

Only one equation— that’s all! And if ffn is constant, this differential equation 
is easily solved to give 

Ci = (const)e _<i/s)i/|1 ' (8.42) 
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This is the time dependence of a state with a definite energy E — Hu. You see 
why H i:i ought to be called the energy matrix. It is the generalization of the energy 
for more complex situations. 

Next, to understand a little more about what the equations mean, we look 
at a system which has two base states. Then Eq. (8.39) reads 


ih ^l = HnCy + H 12 C 2 , 

ih = HuCi + H 2 2C 2 - 


(8.43) 


If the 7/’s are again independent of time, you can easily solve these equations. 
We leave you to try for fun, and we’ll come back and do them later. Yes, you can 
solve the quantum mechanics without knowing the i/’s, so long as they are in- 
dependent of time. 


8-6 The ammonia molecule 


We want now to show you how the dynamical equation of quantum mechanics 
can be used to describe a particular physical circumstance. We have picked an 
interesting but simple example in which, by making some reasonable guesses about 
the Hamiltonian, we can work out some important — and even practical results. 
We are going to take a situation describable by two states: the ammonia molecule. 

The ammonia molecule has one nitrogen atom and three hydrogen atoms 
located in a plane below the nitrogen so that the molecule has the form of a pyramid, 
as drawn in Fig. 8— 1(a). Now this molecule, like any other, has an infinite number 
of states. It can spin around any possible axis; it can be moving in any direction; 
it can be vibrating inside, and so on, and so on. It is, therefore, not a two-state 
system at all. But we want to make an approximation that all other states remain 
fixed, because they don’t enter into what we are concerned with at the moment. 
We will consider only that the molecule is spinning around its axis of symmetry 
(as shown in the figure), that it has zero translational momentum, and that it is 
vibrating as little as possible. That specifies all conditions except one: there are still 
the two possible positions for the nitrogen atom — the nitrogen may be on one side 
of the plane of hydrogen atoms or on the other, as shown in Fig. 8-1 (a) and (b). 
So we will discuss the molecule as though it were a two-state system. We mean 
that there are only two states we are going to really worry about, all other things 
being assumed to stay put. You see, even if we know that it is spinning with a 
certain angular momentum around the axis and that it is moving with a certain 
momentum and vibrating in a definite way, there are still two possible states. We 
will say that the molecule is in the state | 7) when the nitrogen is “up,” as in 
Fig. 8-1 (a), and is in the state | 2) when the nitrogen is “down,” as in (b). The states 
| 7) and | 2) will be taken as the set of base states for our analysis of the behavior 
of the ammonia molecule. At any moment, the actual state | \f) of the molecule 
can be represented by giving C\ = (7 | \p), the amplitude to be in state | 7), and 
C 2 = (2 | ^>, the amplitude to be in state | 2). Then, using Eq. (8.8) we can 
write the state vector | \p) as 




Fig. 8—1. Two equivalent geometric 
arrangements of the ammonia molecule. 


I *> = I W I *> + I 2){2 | *> 

or 

| *> = | l)c 1 + I 2)C 2 . (8.44) 


Now the interesting thing is that if the molecule is known to be in some state 
at some instant, it will not be in the same state a little while later. The two 
C-coefficients will be changing with time according to the equations (8.43) — which 
hold for any two-state system. Suppose, for example, that you had made some 
observation — or had made some selection of the molecules — so that you know 
that the molecule is initially in the state | 7). At some later time, there is some 
chance that it will be found in state | 2). To find out what this chance is, we have 
to solve the differential equation which tells us how the amplitudes change with time. 
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The only trouble is that we don’t know what to use for the coefficients in 
Eq. (8.43). There are some things we can say, however. Suppose that once the 
molecule was in the state [ 7) there was no chance that it could ever get into 
| 2), and vice versa. Then 7/ 12 and H 2 1 would both be zero, and Eq. (8.43) 
would read 

ih ^ = HnC u ih = H 22 C 2 . 

We can easily solve these two equations ; we get 

Ci = (const)e“ Cl/#0/rili , C 2 = (const)e~ itin)H ^K (8.45) 

These are just the amplitudes for stationary states with the energies Ei — Hu 
and E 2 = H 2 2 . We note, however, that for the ammonia molecule the two states 
| 7) and | 2) have a definite symmetry. If nature is at all reasonable, the matrix 
elements Hu and H 22 must be equal. We’ll call them both E 0 , because they 
correspond to the energy the states would have if H i2 and H 2i were zero. But 
Eqs. (8.45) do not tell us what ammonia really does. It turns out that it is possible 
for the nitrogen to push its way through the three hydrogens and flip to the other 
side. It is quite difficult; to get half-way through requires a lot of energy. How 
can it get through if it hasn’t got enough energy? There is some amplitude that it 
will penetrate the energy barrier. It is possible in quantum mechanics to sneak 
quickly across a region which is illegal energetically. There is, therefore, some 
small amplitude that a molecule which starts in | 7) will get to the state | 2). The 
coefficients H 12 and H 2X are not really zero. Again, by symmetry, they should 
both be the same — at least in magnitude. In fact, we already know that, in general, 
Hu must be equal to the complex conjugate of H nj so they can differ only by a 
phase. It turns out, as you will see, that there is no loss of generality if we take 
them equal to each other. For later convenience we set them equal to a negative 
number; we take H i2 = H 21 = —A. We then have the following pair of 
equations: 

ih ~ = E 0 Ci - AC 2 , (8.46) 

at 

ih = E q C 2 - ACl (8.47) 


These equations are simple enough and can be solved in any number of ways. 
One convenient way is the following. Taking the sum of the two, we get 

ih J t (Cl + C 2 ) = (Eo - A)(C, + C 2 ), 

whose solution is 

Ci + C 2 = ae~ HimE *~ AU . (8.48) 


Then, taking the difference of (8.46) and (8.47), we find that 


which gives 


ih J t (Cj - C 2 ) = (E 0 + A)(Cj - C 2 ), 
Ci - C 2 = be~ iiimE o +AU . 


(8.49) 


We have called the two integration constants a and b ; they are, of course, to be 
chosen to give the appropriate starting condition for any particular physical 
problem. Now, by adding and subtracting (8.48) and (8.49), we get Ci and C 2 - 


Ci(0 = 

c 2 (t) = 


? < */*)(£<)—■ I 

2 _r 

^ A)t __ 

2 


2 e 

2 


(8.50) 

(8.51) 


They are the same except for the sign of the second term. 
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We have the solutions; now what do they mean? (The trouble with quantum 
mechanics is not only in solving the equations but in understanding what the 
solutions mean!) First, notice that if b = 0, both terms have the same frequency 
co = (E 0 — A)/h. If everything changes at one frequency, it means that the system 
is in a state of definite energy — here, the energy (E 0 — A). So there is a stationary 
state of this energy in which the two amplitudes C x and C 2 are equal. We get the 
result that the ammonia molecule has a definite energy ( E 0 — A) if there are equal 
amplitudes for the nitrogen atom to be “up” and to be “down.” 

There is another stationary state possible if a — 0; both amplitudes then have 
the frequency (E 0 + A)/h. So there is another state with the definite energy 
(E 0 + A) if the two amplitudes are equal but with the opposite sign ; C 2 = — C i. 
These are the only two states of definite energy. We will discuss the states of the 
ammonia molecule in more detail in the next chapter; we will mention here only a 
couple of things. 

We conclude that because there is some chance that the nitrogen atom can 
flip from one position to the other, the energy of the molecule is not just E 0 , as we 
would have expected, but that there are two energy levels (£0 + A) and (E 0 — A). 
Every one of the possible states of the molecule, whatever energy it has, is “split” 
into two levels. We say every one of the states because, you remember, we picked 
out one particular state of rotation, and internal energy, and so on. For each 
possible condition of that kind there is a doublet of energy levels because of the 
flip-flop of the molecule. 

Let’s now ask the following question about an ammonia molecule. Suppose 
that at / = 0, we know that a molecule is in the state | 1) or, in other words, that 
Ci(0) = 1 and C 2 ( 0) = 0. What is the probability that the molecule will be found 
in the state | 2) at the time t, or will still be found in state | 1) at the time tl Our 
starting condition tells us what a and b are in Eqs. (8.50) and (8.51). Letting 
t = 0, we have that 

cm = a -^± = 1, c 2 (0) = = 0. 


Clearly, a = b - 1. Putting these values into the formulas for Ci(r) and C,(t) 
and rearranging some terms, we have 


C,(0 = 


C 2 (0 = 


f£ +_£ J , 

e -Hlh)E 0 t I e ~2~ ) ' 


We can rewrite these as 


C t (0 = cos ~ , (8.52) 

C 2 (t) = o* sin ~ ■ (8.53) 


The two amplitudes have a magnitude that varies harmonically with time. 

The probability that the molecule is found in state | 2) at the time t is the 
absolute square of C 2 (t ): 

|C 2 (/)| 2 = sin 2 -- (8.54) 

The probability starts at zero (as it should), rises to one, and then oscillates back and 
forth between zero and one, as shown in the curve marked P 2 of Fig. 8-2. The 
probability of being in the | 1) state does not, of course, stay at one. It “dumps” 
into the second state until the probability of finding the molecule in the first state 
is zero, as shown by the curve Pi of Fig. 8-2. The probability sloshes back and 
forth between the two. 

A long time ago we saw what happens when we have two equal pendulums 
with a slight coupling. (See Chapter 49, Vol. I.) When we lift one back and let go, 
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Fig. 8-2. The probability P } that 
an ammonia molecule in state |7) at 
t ~ 0 will be found in state 1 1 ) at f. The 
probability P 2 that it will be found in 
state 2). 



it swings, but then gradually the other one starts to swing. Pretty soon the second 
pendulum has picked up all the energy. Then, the process reverses, and pendulum 
number one picks up the energy. It is exactly the same kind of a thing. The speed 
at which the energy is swapped back and forth depends on the coupling between 
the two pendulums— the rate at which the “oscillation” is able to leak across. 
Also, you remember, with the two pendulums there are two special motions— each 
with a definite frequency — which we call the fundamental modes. If we pull both 
pendulums out together, they swing together at one frequency. On the other hand, 
if we pull one out one way and the other out the other way, there is another sta- 
tionary mode also at a definite frequency. 

Well, here we have a similar situation — the ammonia molecule is mathe- 
matically like the pair of pendulums. These are the two frequencies— (£ 0 + A)/h 
and (E 0 — A)/h — for when they are oscillating together, or oscillating opposite. 

The pendulum analogy is not much deeper than the principle that the same 
equations have the same solutions. The linear equations for the amplitudes (8.39) 
are very much like the linear equations of harmonic oscillators. (In fact, this is 
the reason behind the success of our classical theory of the index of refraction, in 
which we replaced the quantum mechanical atom by a harmonic oscillator, even 
though, classically, this is not a reasonable view of electrons circulating about a 
nucleus.) If you pull the nitrogen to one side, then you get a superposition of 
these two frequencies, and you get a kind of beat note, because the system is not 
in one or the other states of definite frequency. The splitting of the energy levels 
of the ammonia molecule is, however, strictly a quantum mechanical effect. 

The splitting of the energy levels of the ammonia molecule has important 
practical applications which we will describe in the next chapter. At long last we 
have an example of a practical physical problem that you can understand with the 
quantum mechanics! 
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9 


The Ammonia Maser 


9-1 The states of an ammonia molecule 


In this chapter we are going to discuss the application of quantum mechanics 
to a practical device, the ammonia maser. You may wonder why we stop our 
formal development of quantum mechanics to do a special problem, but you will 
find that many of the features of this special problem are quite common in the 
general theory of quantum mechanics, and you will learn a great deal by considering 
this one problem in detail. The ammonia maser is a device for generating electro- 
magnetic waves, whose operation is based on the properties of the ammonia 
molecule which we discussed briefly in the last chapter. We begin by summarizing 
what we found there. 

The ammonia molecule has many states, but we are considering it as a two- 
state system, thinking now only about what happens when the molecule is in any 
specific state of rotation or translation. A physical model for the two states can 
be visualized as follows. If the ammonia molecule is considered to be rotating 
about an axis passing through the nitrogen atom and perpendicular to the plane 
of the hydrogen atoms, as shown in Fig. 9-1, there are still two possible conditions 
— the nitrogen may be on one side of the plane of hydrogen atoms or on the other. 
We call these two states | 1) and | 2). They are taken as a set of base states for our 
analysis of the behavior of the ammonia molecule. 


9-1 The states of an ammonia 
molecule 

9-2 The molecule in a static 
electric field 

9-3 Transitions in a time-dependent 
field 

9-4 Transitions at resonance 

9-5 Transitions olf resonance 

9-6 The absorption of light 


MASER = Microwave Amplification 
by Stimulated Emission of Radiation 



Fig. 9-1. A physical model of two 
base states for the ammonia molecule. 
These states have the electric dipole 
moments fi. 


In a system with two base states, any state | \p) of the system can always 
be described as a linear combination of the two base states; that is, there is a 
certain amplitude C x to be in one base state and an amplitude C 2 to be in the 
other. We can write its state vector as 

l*> = \l)Ci + \2)C 2 , (9.1) 

where 

Cx = (1 | *> and C 2 = <2 | *>. 

These two amplitudes change with time according to the Efamiltonian equa- 
tions, Eq. (8.43). Making use of the symmetry of the two states of the ammonia 
molecule, we set Hu = H 2 2 = E 0 , and H 12 — Ff 2 i = — A, and get the 
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solution [see Eqs. (8.50) and (8.51)] 


r — Cl -{HK){E 0 -A)l , b —{ ilh)(E 0 +A)t 
W — ^ € -T ^ C , 


r — a a — (ilfi)(Eo—A)t b -d!fi)(Eo+A)t 

c 2 - 2 e - 2 e 


(9.2) 

(9.3) 


We want now to take a closer look at these general solutions. Suppose that 
the molecule was initially put into a state 1 1 pjf) for which the coefficient b was equal 
to zero. Then at t = 0 the amplitudes to be in the states | 1) and | 2) are identical, 
and they stay that way for all time . Their phases both vary with time in the same 
way — with the frequency (E 0 — A)/h. Similarly, if we were to put the molecule 
into a state | \pi) for which a — 0, the amplitude C 2 is the negative of Ci, and this 
relationship would stay that way forever. Both amplitudes would now vary with 
time with the frequency (E 0 + A)/h. These are the only two possibilities of states 
for which the relation between C i and C 2 is independent of time. 

We have found two special solutions in which the two amplitudes do not vary 
in magnitude and, furthermore, have phases which vary at the same frequencies. 
These are stationary states as we defined them in Section 7-1, which means that 
they are states of definite energy. The state | i pu) has the energy Ejj = E () — A, 
and the state \\pi) has the energy Ei = E 0 + A. They are the only two stationary 
states that exist, so we find that the molecule has two energy levels, with the energy 
difference 2 A. (We mean, of course, two energy levels for the assumed state of 
rotation and vibration which we referred to in our initial assumptions.)! 

If we hadn’t allowed for the possibility of the nitrogen flipping back and forth, 
we would have taken A equal to zero and the two energy levels would be on top of 
each other at energy E 0 . The actual levels are not this way; their average energy 
is E (h but they are split apart by =*=A, giving a separation of 2 A between the energies 
of the two states. Since A is, in fact, very small, the difference in energy is also 
very small. 

In order to excite an electron inside an atom, the energies involved are rela- 
tively very high — requiring photons in the optical or ultraviolet range. To excite 
the vibrations of the molecules involves photons in the infrared. If you talk about 
exciting rotations , the energy differences of the states correspond to photons in 
the far infrared. But the energy difference 2 A is lower than any of those and is, in 
fact, below the infrared and well into the microwave region. Experimentally, it 
has been found that there is a pair of energy levels with a separation of 1CP 4 
electron volt — corresponding to a frequency 24,000 megacycles. Evidently this 
means that 2 A = hj\ with / = 24,000 megacycles (corresponding to a wavelength 
of 1^ cm). So here we have a molecule that has a transition which does not emit 
light in the ordinary sense, but emits microwaves. 

For the work that follows we need to describe these two states of definite 
energy a little bit better. Suppose we were to construct an amplitude Cu by taking 
the sum of the two numbers C i and C 2 : 

C TI = + C 2 = <7 | $> + {2 | *>. (9.4) 

What would that mean? Well, this is just the amplitude to find the state | T) in a 
new state [ II) in which the amplitudes of the original base states are equal. That 
is, writing Cu = (11 | 4>), we can abstract the | <t>) away from Eq. (9.4) — because 
it is true for any <f> — and get 

(II I = (I \ + (2 I, 

which means the same as 

| //> = | 7} + I 2). (9.5) 


t In what follows it is helpful — in reading to yourself or in talking to someone else — to 
have a handy way of distinguishing between the Arabic 1 and 2 and the Roman I and II. 
We find it convenient to reserve the names “one” and “two” for the Arabic numbers, and 
to call I and II by the names “eins” and “zwei” (although “unus” and “duo” might be 
more logical!). 


9-2 



The amplitude for the state | II) to be in the state | 7) is 
(1 | II) = <7 | 7) + (7 | 2>, 

which is, of course, just 1, since | 1) and | 2) are base states. The amplitude for 
the state | II) to be in the state | 2) is also 1, so the state | II) is one which has equal 
amplitudes to be in the two base states | 1) and | 2). 

We are, however, in a bit of trouble. The state | II) has a total probability 
greater than one of being in some base state or other . That simply means, however, 
that the state vector is not properly “normalized.” We can take care of that by 
remembering that we should have (II | II) = 1, which must be so for any state. 
Using the general relation that 

<*i*> = E <x i o</ 1 *>, 


letting both $ and X be the state //, and taking the sum over the base states | 1) 
and | 2), we get that 

(II\II) = (II\1)(1\II) + (II | 2)(2 | II). 

This will be equal to one as it should if we change our definition of Cu — in Eq. 
(9.4) — to read 

Cu = [Ci + C 2 ], 
y/2 


In the same way we can construct an amplitude 

Cl = 4= [Cx - C 2 ], 
V2 


Cl = [(! 1$) - (2\ *>]. 

V2 


(9.6) 


This amplitude is the projection of the state | 3>) into a new state 1 1) which has 
opposite amplitudes to be in the states | 1) and | 2). Namely, Eq. (9.6) means 
the same as 

1 


</i 


V2 


W I - II 


or 


from which it follows that 


I) = -^[|/> - | 2 )], 

V2 


(9.7) 


</|/> = -^= — {2 \ I) ■ 

V2 


Now the reason we have done all this is that the states 1 1) and | II) can be 
taken as a new set of base states which are especially convenient for describing the 
stationary states of the ammonia molecule. You remember that the requirement 
for a set of base states is that 

(* i j) = & a- 

We have already fixed things so that 

</i/> = (ii\n) = i. 

You can easily show from Eqs. (9.5) and (9.7) that 

(I\II) = <//|/> = 0. 

The amplitudes C7 = (I \ $} and Cu = (II \ <£) for any state 4> to be in our 
new base states | I) and | II) must also satisfy a Hamiltonian equation with the 
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form of Eq. (8.39). In fact, if we just subtract the two equations (9.2) and (9.3) 
and differentiate with respect to t, we see that 

ih ^ = (£ 0 + A)C Z = EjCl (9.8) 

And taking the sum of Eqs. (9.2) and (9.3), we see that 

ih = (E 0 - A)C U = EuCu. (9.9) 

Using [ I) and | II) for base states, the Hamiltonian matrix has the simple form 

H i ,i ~ E If Hijj = 0, 

Hi , ii = 0, Hjjjj = Eij. 

Note that each of the Eqs. (9.8) and (9.9) look just like what we had in Section 
8-6 for the equation of a one-state system. They have a simple exponential time 
dependence corresponding to a single energy. As time goes on, the amplitudes to 
be in each state act independently. 

The two stationary states | xfa) and | \pjj) we found above are, of course, 
solutions of Eqs. (9.8) and (9.9). The state | \pi) (for which C x = — C 2 ) has 

Cj = e -m)(E 0 +A)t 9 Cn = 0 ( 9 . 10 ) 

And the state 1 1 pu) (for which C x = C 2 ) has 

c 7 = o, Ci i = (9.ii) 

Remember that the amplitudes in Eq. (9.10) are 

Ci = (I | ypj), and Cn = {II\\pj)\ 
so Eq. (9.10) means the same thing as 

|^) = 1 1) 

That is, the state vector of the stationary state | ^/) is the same as the state vector 
of the base state 1 1) except for the exponential factor appropriate to the energy of 
the state. In fact at t = 0 

I *i) = I />; 

the state 1 1) has the same physical configuration as the stationary state of energy 
E 0 + A. In the same way, we have for the second stationary state that 

|*//> = \n) e ~ iiimE ^ A) K 

The state [ II) is just the stationary state of energy E 0 — A at / = 0. Thus our 
two new base states 1 1) and | II) have physically the form of the states of definite 
energy, with the exponential time factor taken out so that they can be time- 
independent base states. (In what follows we will find it convenient not to have 
to distinguish always between the stationary states | \pj) and 1 1 pn) and their base 
states 1 1) and | II), since they differ only by the obvious time factors.) 

In summary, the state vectors 1 1) and | II) are a pair of base vectors which 
are appropriate for describing the definite energy states of the ammonia molecule. 
They are related to our original base vectors by 

! /) = 4z [| 1) - | 2)1 I //) = -L [| /) + 1 2)]. (9.12) 

V2 V2 

The amplitudes to be in 1 1) and | II) are related to C i and C 2 by 

Ci = 4= [C, - C 2 ], Cn = ~ [C, + c 2 ], (9.13) 

\/2 \/2 
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Any state at all can be represented by a linear combination of [ 7) and | 2) — with 
the coefficients C i and C 2 — or by a linear combination of the definite energy base 
states | I) and | II) — with the coefficients Cj and C//. Thus, 

I *> = I l)Ci + | 2>C 2 

or 

I *> = I I)C Z + | II)C n . 

The second form gives us the amplitudes for finding the state | $} in a state with 
the energy Ej ~ + A or in a state with the energy Eu = E 0 — A. 


9-2 The molecule in a static electric field 

If the ammonia molecule is in either of the two states of definite energy and we 
disturb it at a frequency co such that hco = Ej ~ E n = 2A, the system may make 
a transition from one state to the other. Or, if it is in the upper state, it may change 
to the lower state and emit a photon. But in order to induce such transitions you 
must have a physical connection to the states — some way of disturbing the system. 
There must be some external machinery for affecting the states, such as magnetic 
or electric fields. In this particular case, these states are sensitive to an electric 
field. We will, therefore, look next at the problem of the behavior of the ammonia 
molecule in an external electric field. 

To discuss the behavior in an electric field, we will go back to the original 
base system | 7) and | 2), rather than using 1 1) and | II). Suppose that there is an 
electric field in a direction perpendicular to the plane of the hydrogen atoms. 
Disregarding for the moment the possibility of flipping back and forth, would it be 
true that the energy of this molecule is the same for the two positions of the nitrogen 
atom? Generally, no. The electrons tend to lie closer to the nitrogen than to the 
hydrogen nuclei, so the hydrogens are slightly positive. The actual amount 
depends on the details of electron distribution. It is a complicated problem to 
figure out exactly what this distribution is, but in any case the net result is that the 
ammonia molecule has an electric dipole moment, as indicated in Fig. 9-1. We 
can continue our analysis without knowing in detail the direction or amount of 
displacement of the charge. However, to be consistent with the notation of others, 
let’s suppose that the electric dipole moment is jul, with its direction point from 
the nitrogen atom and perpendicular to the plane of the hydrogen atoms. 

Now, when the nitrogen flips from one side to the other, the center of mass 
will not move, but the electric dipole moment will flip over. As a result of this 
moment, the energy in an electric field £ will depend on the molecular orientation. f 
With the assumption made above, the potential energy will be higher if the nitrogen 
atom points in the direction of the field, and lower if it is in the opposite direction; 
the separation in the two energies will be 2 ju£. 

In the discussion up to this point, we have assumed values of E 0 and A without 
knowing how to calculate them. According to the correct physical theory, it 
should be possible to calculate these constants in terms of the positions and 
motions of all the nuclei and electrons. But nobody has ever done it. Such a 
system involves ten electrons and four nuclei and that’s just too complicated a 
problem. As a matter* of fact, there is no one who knows much more about this 
molecule than we do. All anyone can say is that when there is an electric field, 
the energy of the two states is different, the difference being proportional to the 
electric field. We have called the coefficient of proportionality 2ju, but its value 
must be determined experimentally. We can also say that the molecule has the 
amplitude A to flip over, but this will have to be measured experimentally. Nobody 
can give us accurate theoretical values of p and A, because the calculations are 
too complicated to do in detail. 


f We are sorry that we have to introduce a new notation. Since we have been using 
p and E for momentum and energy, we don’t want to use them again for dipole moment 
and electric field. Remember, in this section p is the electric dipole moment. 
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For the ammonia molecule in an electric field, our description must be 
changed. If we ignored the amplitude for the molecule to flip from one configura- 
tion to the other, we would expect the energies of the two states | 1) and | 2) to be 
(E q ± juS). Following the procedure of the last chapter, we take 

H\i = Eq + ^22 “ £o “ MS. (9.14) 

Also we will assume that for the electric fields of interest the field does not affect 
appreciably the geometry of the molecule and, therefore, does not affect the 
amplitude that the nitrogen will jump from one position to the other. We can 
then take that H 12 and H 2 1 are not changed; so 

H 12 = H 2\ = -A. (9.15) 

We must now solve the Hamiltonian equations, Eq, (8.43), with these new values 
of Hij. We could solve them just as we did before, but since we are going to have 
several occasions to want the solutions for two-state systems, let’s solve the equa- 
tions once and for all in the general case of arbitrary — assuming only that they 
do not change with time. 

We want the general solution of the pair of Hamiltonian equations 

ih = tfuCj + i/ 12 C 2 , (9.16) 

ih^ = tf 21 C, + H 22 C 2 . (9.17) 

Since these are linear differential equations with constant coefficients, we can always 
find solutions which are exponential functions of the dependent variable /. We 
will first look for a solution in which C i and C 2 both have the same time depen- 
dence; we can use the trial functions 

i = a x e , c 2 — 

Since such a solution corresponds to a state of energy E ~ we may as well write 

right away 

Ci = ai e^ ilh)E \ (9.18) 

C 2 - (9.19) 

where E is as yet unknown and to be determined so that the differential equations 
(9.16) and (9.17) are satisfied. 

When we substitute C i and C 2 from (9.18) and (9.19) in the differential 
equations (9.16) and (9.17), the derivatives give us just ~iE/h times C x or C 2 , 
so the left sides become just EC\ and EC 2 . Cancelling the common exponential 
factors, we get 


Ea x = H\ia\ + H X2 a 2i Ea 2 — H 2 ia x + H 22 a 2 . 

Or, rearranging the terms, we have 

(E - Hn) ai - H l2 a 2 - 0, (9.20) 

~H 2 ia x + (E - H 22 )a 2 = 0. (9.21) 


With such a set of homogeneous algebraic equations, there will be nonzero solu- 
tions for a i and a 2 only if the determinant of the coefficients of a x and a 2 is zero, 
that is, if 


E - Hu 
Det 

\ - H 2 1 


H 12 
E - H 2 2 


- 0 . 


(9.22) 
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However, when there are only two equations and two unknowns, we don’t 
need such a sophisticated idea. The two equations (9.20) and (9.21) each give 
a ratio for the two coefficients a x and a 2 , and these two ratios must be equal. 
From (9.20) we have that 


and from (9.21) that 


a i H ! 2 

^ “ E — Hu ’ 


(9.23) 


£i = E - H 22 
«2 Ef 2l 


(9.24) 


Equating these two ratios, we get that E must satisfy 


(E - H 11 )(E - H 22 ) - H 12 H 21 = 0. 

This is the same result we would get by solving Eq. (9.22). Either way, we have 
a quadratic equation for E which has two solutions: 


E - 


H\\ + H 22 
2 


(H 11 - n 22 y 
4 “ h 


H\2H 2 \. 


(9.25) 


There are two possible values for the energy E. Note that both solutions give 
real numbers for the energy, because H lx and H 22 are real, and H l2 H 2l is equal 
to H 12 H* 2 = \Hi 2 | 2 , which is both real and positive. 

Using the same convention we took before, we will call the upper energy 
Ei and the lower energy E u . We have 


£i = + + Hi2H2u 


P Hu + H 2 
tn = ~~ 


s -4 


(Hu - H 22 y 


+ H l2 H 2 


(9.26) 

(9.27) 


Using each of these two energies separately in Eqs. (9.18) and (9.19), we have 
the amplitudes for the two stationary states (the states of definite energy). If there 
are no external disturbances, a system initially in one of these states will stay that 
way forever — only its phase changes. 

We can check our results for two special cases. If H l2 — H 2l = 0, we have 
that Ej ~ Hu and Eu = H 22 . This is certainly correct, because then Eqs. 
(9.16) and (9.17) are uncoupled, and each represents a state of energy Hu and 
H 2 2 ’ Next, if we set Hu = H 22 ~ E Q and H 2X = H 12 — —T, we get the 
solution we found before: 


Ei = Eq + A and Eu = E {) ~ A. 


For the genera] case, the two solutions Ej and Eu refer to two states — which 
we can again call the states 


h h) = I and I yPu) = I II)e~ H!h)E n\ 


These states will have C A and C 2 as given in Eqs. (9.18) and (9,19), where a x 
and a 2 are still to be determined. Their ratio is given by either Eq. (9.23) or 
Eq. (9.24). They must also satisfy one more condition. If the system is known to 
be in one of the stationary states, the sum of the probabilities that it will be found 
in | 7) or | 2) must equal one. We must have that 


or, equivalently, 


IC ^ 2 + |C 2 | 2 = 1 , 

Kl 2 + N 2 = i. 


(9.28) 

(9.29) 


These conditions do not uniquely specify a x and a 2 ; they are still undetermined 
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E 



by an arbitrary phase — in other words, by a factor like e tb . Although general 
solutions for the as can be written down,f it is usually more convenient to work 
them out for each special case. 

Let’s go back now to our particular example of the ammonia molecule in an 
electric field. Using the values for Hu, H 2 2 , and H 12 given in (9.14) and (9.15), 
we get for the energies of the two stationary states 

Ej = £0 + VW+1&&, Eji — E 0 — \A4 2 + (9.30) 

These two energies are plotted as a function of the electric field strength & in Fig. 
9-2. When the electric field is zero, the two energies are, of course, just E 0 =»= A. 
When an electric field is applied, the splitting between the two levels increases. 
The splitting increases at first slowly with 8, but eventually becomes proportional 
to 8. (The curve is a hyperbola.) For enormously strong fields, the energies are just 

Ei — E 0 -{- jx S = H 11 , En — Eq /x£ = H 22 - (9.31) 

The fact that there is an amplitude for the nitrogen to flip back and forth has little 
effect when the two positions have very different energies. This is an interesting point 
which we will come back to again later. 

We are at last ready to understand the operation of the ammonia maser. 
The idea is the following. First, we find a way of separating molecules in the 
state | I) from those in the state | II). t Then the molecules in the higher energy state 
| I) are passed through a cavity which has a resonant frequency of 24,000 mega- 
cycles. The molecules can deliver energy to the cavity — in a way we will discuss 
later — and leave the cavity in the state | II). Each molecule that makes such a 
transition will deliver the energy E — E { — E u to the cavity. The energy from 
the molecules will appear as electrical energy in the cavity. 

How can we separate the two molecular states? One method is as follows. 
The ammonia gas is let out of a little jet and passed through a pair of slits to 
give a narrow beam, as shown in Fig. 9-3. The beam is then set through a 


f For example, the following set is one acceptable solution, as you can easily verify: 

H 12 _ E - H n 

611 [(£ — H\\) 2 4- H 1 2 H 2 1 ] 1 2 [(£ — Hu) 2 + H L2tf2i] ir2 

f From now on we will write |/) and [ //) instead of |^/) and \^u)> You must remember 
that the actual states \tr) and | far) are the energy base states multiplied by the appro- 
priate exponential factor. 
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region in which there is a large transverse electric field. The electrodes to produce 
the field are shaped so that the electric field varies rapidly across the beam. Then 
the square of the electric field £ * £ will have a large gradient perpendicular to the 
beam. Now a molecule in state | /) has an energy which increases with S 2 , and 
therefore this part of the beam will be deflected toward the region of lower S 2 . 
A molecule in state | II) will, on the other hand, be deflected toward the region 
of larger S 2 , since its energy decreases as S 2 increases. 

Incidentally, with the electric fields which can be generated in the laboratory, 
the energy jiiS is always much smaller than A. In such cases, the square root in 
Eqs. (9.30) can be approximated by 

A (' + i e w)- <9 ' 32) 

So the energy levels are, for all practical purposes, 

2 2 

Ejt = E 0 + A + (9.33) 

and 

En = E 0 - A - ^ • (9.34) 



Fig. 9-3. The ammonia beam may 
be separated by an electric field in 
which £ 2 has a gradient perpendicular to 
the beam. 


And the energies vary approximately linearly with £ 2 . The force on the molecules 
is then 


F = 



(9.35) 


Many molecules have an energy in an electric field which is proportional to S 2 . 
The coefficient is the polarizability of the molecule. Ammonia has an unusually 
high polarizability because of the small value of A in the denominator. Thus, 
ammonia molecules are unusually sensitive to an electric field. (What would you 
expect for the dielectric coefficient of NH 3 gas?) 



Fig. 9—4. Schematic diagram of the 
ammonia maser. 


9-3 Transitions in a time-dependent field 

In the ammonia maser, the beam with molecules in the state | I) and with the 
energy Ej is sent through a resonant cavity, as shown in Fig. 9-4. The other beam 
is discarded. Inside the cavity, there will be a time-varying electric field, so the 
next problem we must discuss is the behavior of a molecule in an electric field that 
varies with time. We have a completely different kind of a problem — one with a 
time-varying Hamiltonian. Since depends upon £, the H { j vary with time, and 
we must determine the behavior of the system in this circumstance. 

To begin with, we write down the equations to be solved: 


ih *£l= (Eo + M S)C X - ac 2 , 
ih ^ = -AC, + (E 0 - M S )C 2 . 


(9.36) 
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To be definite, let’s suppose that the electric field varies sinusoidally; then we can 
write 

8 = 2S 0 cos cot = & 0 (e io>t + e~ io,t ). (9.37) 

In actual operation the frequency to will be very nearly equal to the resonant fre- 
quency of the molecular transition to 0 = 2 A/h, but for the time being we want 
to keep things general, so we’ll let it have any value at all. The best way to solve 
our equations is to form linear combinations of C i and C 2 as we did before. So 
we add the two equations, divide by the square root of 2, and use the definitions 
of Ci and Cu that we had in Eq. (9.13). We get 

ih ^ = (E 0 - A)C IT + M S Cl (9.38) 

You’ll note that this is the same as Eq. (9.9) with an extra term due to the electric 
field. Similarly, if we subtract the two equations (9.36), we get 

ih ~ (Eq + A)Cj + /x8C//. (9.39) 


Now the question is, how to solve these equations? They are more difficult 
than our earlier set, because 8 depends on t ; and, in fact, for a general 8(/) the 
solution is not expressible in elementary functions. However, we can get a good 
approximation so long as the electric field is small. First we will write 


_ y je ~ i(E 0 +A)tlft _ y ie -i(Ej)t/K> 
Cjx — y __ y n ) tin 


(9.40) 


If there were no electric field, these solutions would be correct with and y u 
just chosen as two complex constants. In fact, since the probability of being in 
state 1 1) is the absolute square of C7 and the probability of being in state | II) is the 
absolute square of Cu, the probability of being in state 1 1) or in state | II) is 
just | 7/| 2 or lT//| 2 . For instance, if the system were to start originally in state | II) 
so that 7 ; was zero and | 7//| 2 was one, this condition would go on forever. There 
would be no chance, if the molecule were originally in state | II), ever to get 
into state | I). 

Now the idea of writing our equations in the form of Eq. (9.40) is that if 
pt8 is small in comparison with A , the solutions can still be written in this way, but 
then 7 1 and 7// become slowly varying functions of time — where by “slowly 
varying” we mean slowly in comparison with the exponential functions. That is 
the trick. We use the fact that 7 7 and In vary slowly to get an approximate 
solution. 

We want now to substitute Cj from (9.40) in the differential equation (9.39), 
but we must remember that 7 / is also a function of /. We have 

ih ^7 = E I 1 I e~ iEim + ih d l’ 
at at 

The differential equation becomes 

( EjJj + ih ^ e ~ iilii)Elt = (9.41) 

Similarly, the equation in dCn/dt becomes 

{E u y n + ih e~ {ilh)Ellt = Euyjje-^*" 1 + nZy^-^"' 1 . (9.42) 

Now you will notice that we have equal terms on both sides of each equation. We 
cancel these terms, and we also multiply the first equation by e + %E i l i ri and the 
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second by e+^u 1 ^, 
finally, 


Remembering that (Ej — En) = 2 A = ho) 0 , we have 
ih ^ 

f (9.43) 

ih ^ = MS(/)e- , '“°'7 / . 


Now we have an apparently simple pair of equations — and they are still exact, 
of course. The derivative of one variable is a function of time multiplied 

by the second variable; the derivative of the second is a similar time function, 
multiplied by the first. Although these simple equations cannot be solved in general, 
we will solve them for some special cases. 

We are, for the moment at least, interested only in the case of an oscillating 
electric field. Taking £(/) as given in Eq. (9.37), we find that the equations for 
7/ and 7 // become 


ih ^ - v£> 0 [e i(0> +“ o)t + //, 

ih^f- = juS 0 [e I< “ - “ t ’ ) * + e“ I( “ + “ 0) ']Y J . 


(9.44) 


Now if £ o is sufficiently small, the rates of change of 7j and 7 n are also small. 
The two 7’s will not vary much with /, especially in comparison with the rapid 
variations due to the exponential terms. These exponential terms have real and 
imaginary parts that oscillate at the frequency a> + gj 0 or co — aj 0 - The terms with 
co + too oscillate very rapidly about an average value of zero and, therefore, do not 
contribute very much on the average to the rate of change of 7. So we can make a 
reasonably good approximation by replacing these terms by their average value, 
namely, zero. We will just leave them out, and take as our approximation: 


ih a -jL = M Soc- <(w ^ 0)fc y/j, 

ih ^ = /xS 0 e £( “ _ ‘ Jo)i 7/. 


(9.45) 


Even the remaining terms, with exponents proportional to (a; — a> 0 ), will also 
vary rapidly unless co is near cj 0 . Only then will the right-hand side vary slowly 
enough that any appreciable amount will accumulate when we integrate the 
equations with respect to i. In other words, with a weak electric field the only 
significant frequencies are those near co 0 . 

With the approximation made in getting Eq. (9.45), the equations can be 
solved exactly, but the work is a little elaborate, so we won’t do that until later when 
we take up another problem of the same type. Now we’ll just solve them ap- 
proximately — or rather, we’ll find an exact solution for the case of perfect reso- 
nance, co — w o, and an approximate solution for frequencies near resonance. 


9-4 Transitions at resonance 


Let’s take the case of perfect resonance first. If we take co = co 0 , the expo- 
nentials are equal to one in both equations of (9.45), and we have just 

dlj W&o v din W&o ^ (Q 

~di = ~-JT T//) ~dt~ h~ Jl ' (9 ' 46) 


If we eliminate first ij and then in from these equations, we find that each satisfies 
the differential equation of simple harmonic motion: 


d 2 1 
dt* 



(9.47) 


The general solutions for these equations can be made up of sines and cosines. 
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As you can easily verify, the following equations are a solution: 

r, -.«(*). + »*(«?)* (9 48) 

7 a = ib cos t — ia sin U 

where a and b are constants to be determined to fit any particular physical situation. 

For instance, suppose that at t = 0 our molecular system was in the upper 
energy state | /), which would require — from Eq. (9.40) — that 7; = land in = 0 
at t = 0. For this situation we would need a = 1 and b = 0. The probability 
that the molecule is in the state | /) at some later / is the absolute square of 7/, or 


P, = |7 7 | 2 = cos 2 (^) t. 


(9.49) 


Similarly, the probability that the molecule will be in the state | //) is given by the 
absolute square of 7/j, 



So long as 8 is small and we are on resonance, the probabilities are given by simple 
oscillating functions. The probability to be in state 1 I) falls from one to zero and 
back again, while the probability to be in the state | II) rises from zero to one and 
back. The time variation of the two probabilities is shown in Fig. 9-5. Needless 
to say, the sum of the two probabilities is always equal to one; the molecule is 
always in some state! 


Fig. 9-5. Probabilities for the two 
states of the ammonia molecule in a 
sinusoidal electric field. 



Let’s suppose that it takes the molecule the time T to go through the cavity. 
If we make the cavity just long enough so that fi& 0 T/h = 7r/2, then a molecule 
which enters in state 1 1) will certainly leave it in state | II). If it enters the cavity 
in the upper state, it will leave the cavity in the lower state. In other words, its 
energy is decreased, and the loss of energy can’t go anywhere else but into the 
machinery which generates the field. The details by which you can see how the 
energy of the molecule is fed into the oscillations of the cavity are not simple, 
however, we don’t need to study these details, because we can use the principle 
of conservation of energy. (We could study them if we had to, but then we would 
have to deal with the quantum mechanics of the field in the cavity in addition to 
the quantum mechanics of the atom.) 

In summary: the molecule enters the cavity, the cavity field oscillating at 
exactly the right frequency — induces transitions from the upper to the lower state, 
and the energy released is fed into the oscillating field. In an operating maser 
the molecules deliver enough energy to maintain the cavity oscillations not only 
providing enough power to make up for the cavity losses but even providing small 
amounts of excess power that can be drawn from the cavity. Thus, the molecular 
energy is converted into the energy of an external electromagnetic field. 
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Remember that before the beam enters the cavity, we have to use a filter 
which separates the beam so that only the upper state enters. It is easy to demon- 
strate that if you were to start with molecules in the lower state, the process will go 
the other way and take energy out of the cavity. If you put the unfiltered beam in, 
as many molecules are taking energy out as are putting energy in, so nothing much 
would happen. In actual operation it isn’t necessary, of course, to make (n&oT/h) 
exactly ir/2. For any other value (except an exact integral multiple of 7 r), there is 
some probability for transitions from state | /) to state | II). For other values, 
however, the device isn’t 100 percent efficient; many of the molecules which leave 
the cavity could have delivered some energy to the cavity but didn’t. 

In actual use, the velocity of all the molecules is not the same; they have some 
kind of Maxwell distribution. This means that the ideal periods of time for 
different molecules will be different, and it is impossible to get 100 percent efficiency 
for all the molecules at once. In addition, there is another complication which is 
easy to take into account, but we don’t want to bother with it at this stage. You 
remember that the electric field in a cavity usually varies from place to place across 
the cavity. Thus, as the molecules drift across the cavity, the electric field at the 
molecule varies in a way that is more complicated than the simple sinusoidal 
oscillation in time that we have assumed. Clearly, one would have to use a more 
complicated integration to do the problem exactly, but the general idea is still the 
same. 

There are other ways of making masers. Instead of separating the atoms in 
state 1 1) from those in state | II) by a Stern-Gerlach apparatus, one can have the 
atoms already in the cavity (as a gas or a solid) and shift atoms from state | II) 
to state 1 1) by some means. One way is one used in the so-called three-state maser. 
For it, atomic systems are used which have three energy levels, as shown in Fig. 
9-6, with the following special properties. The system will absorb radiation 
(say, light) of frequency ftco x and go from the lowest energy level Eu to some 
high-energy level E', and then will quickly emit photons of frequency hco 2 and go 
to the state 1 1) with energy Ej. The state 1 1) has a long lifetime so its population 
can be raised, and the conditions are then appropriate for maser operation between 
states 1 1) and | II). Although such a device is called a “three-state” maser, the 
maser operation really works just as a two-state system such as we are describing. 

A laser (Tight Amplification by Stimulated Emission of Radiation) is just a 
maser working at optical frequencies. The “cavity” for a laser usually consists of 
just two plane mirrors between which standing waves are generated. 



Fig. 9-6. 
“three-state” 


The energy levels of 
maser. 


a 


9-5 Transitions off resonance 

Finally, we would like to find out how the states vary in the circumstance that 
the cavity frequency is nearly, but not exactly, equal to to 0 . We could solve this 
problem exactly, but instead of trying to do that, we’ll take the important case 
that the electric field is small and also the period of time T is small, so that iiZ^T/h 
is much less than one. Then, even in the case of perfect resonance which we have 
just worked out, the probability of making a transition is small. Suppose that we 
start again with 7/ — 1 and 7 // = 0. During the time T we would expect 7 / to 
remain nearly equal to one, and 7// to remain very small compared with unity. 
Then the problem is very easy. We can calculate 7 n from the second equation in 
(9.45), taking 7 / equal to one and integrating from t = 0 to / = T. We get 




h 


] _ e *(w-wo)? n ' 

CO — COo 


(9.51) 


This 7j !, used with Eq. (9.40), gives the amplitude to have made a transition from 
the state | I) to the state | II) during the time interval T. The probability P(I — » II) 
to make the transition is | 7 / i | 2 , or 


P(I^ II) = | 7//| 2 


M& 0 T 2 sin 2 [(to — co 0 )r/2] 

h J [(« - co 0 )T/2] 2 


(9.52) 
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It is interesting to plot this probability for a fixed length of time as a function 
of the frequency of the cavity in order to see how sensitive it is to frequencies near 
the resonant frequency co 0 . We show such a plot of P(I — > II) in Fig. 9-7. (The 
vertical scale has been adjusted to be 1 at the peak by dividing by the value of the 
probability when co = co 0 .) We have seen a curve like this in the diffraction theory, 
so you should already be familiar with it. The curve falls rather abruptly to zero 
for (co — co 0 ) = 27T/T and never regains significant size for large frequency devia- 
tions. In fact, by far the greatest part of the area under the curve lies within the 
range =*= ir/T. It is possible to showf that the area under the curve is just 2i r/T and 
is equal to the area of the shaded rectangle drawn in the figure. 

Let’s examine the implication of our results for a real maser. Suppose that 
the ammonia molecule is in the cavity for a reasonable length of time, say for one 
millisecond. Then for / 0 = 24,000 megacycles, we can calculate that the prob- 
ability for a transition falls to zero for a frequency deviation of (/ — /o)//o = 
1// 0 T, which is five parts in 10 8 . Evidently the frequency must be very close to co 0 
to get a significant transition probability. Such an effect is the basis of the great 
precision that can be obtained with “atomic” clocks, which work on the maser 
principle. 




Fig. 9-7. Transition probability for the ammonia Fig. 9-8. The spectral intensity 4(co) can be approx- 

molecule as a function of frequency. imated by its value at oo 0 * 


9-6 The absorption of light 

Our treatment above applies to a more general situation than the ammonia 
maser. We have treated the behavior of a molecule under the influence of an 
electric field, whether that field was confined in a cavity or not. So we could be 
simply shining a beam of “light” — at microwave frequencies — at the molecule 
and ask for the probability of emission or absorption. Our equations apply equally 
well to this case, but let’s rewrite them in terms of the intensity of the radiation 
rather than the electric field. If we define the intensity 4 to be the average energy 
flow per unit area per second, then from Chapter 27 of Volume II, we can write 


* = «nc 2 |£ X B \ ave = Je 0 c 2 (£ X B)„ 


2eoc6o- 


(The maximum value of S is 2S 0 .) The transition probability now becomes: 


/>(/ ii) = 2 tt 


47TC0 h 2 C 




sin 2 [(co — aj 0 )T/2] 

T(« -aoyr/zjT 


(9.53) 


f Using the formula (sin 2 x/x 2 ) dx = i r. 
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Ordinarily the light shining on such a system is not exactly monochromatic. 
It is, therefore, interesting to solve one more problem — that is, to calculate the 
transition probability when the light has intensity tf(co) per unit frequency interval, 
covering a broad range which includes o> 0 . Then, the probability of going from 
1 1) to | II) will become an integral: 


P(I^ II) 


2t 


2 


M 


4tt eoh' 2 c 



sin 2 [(« — o)q)T/2] 
T(^-^o)772l^ 


doo. 


(9.54) 


In general, £T(oj) will vary much more slowly with co than the sharp resonance term. 
The two functions might appear as shown in Fig. 9-8. In such cases, we can re- 
place tf(aj) by its value 0(to o ) at the center of the sharp resonance curve and take 
it outside of the integral. What remains is just the integral under the curve of 
Fig. 9-7, which is, as we have seen, just equal to 2tt/T. We get the result that 


p « ~ <9 - 55) 

This is an important result, because it is the general theory of the absorption 
of light by any molecular or atomic system. Although we began by considering a 
case in which state 1 1) had a higher energy than state | II), none of our arguments 
depended on that fact. Equation (9.55) still holds if the state 1 1) has a lower 
energy than the state | II ) ; then P(I -> II) represents the probability for a transition 
with the absorption of energy from the incident electromagnetic wave. The 
absorption of light by any atomic system always involves the amplitude for a 
transition in an oscillating electric field between two states separated by an 
energy E = hw 0 . For any particular case, it is always worked out in just the 
way we have done here and gives an expression like Eq. (9.55). We, therefore, 
emphasize the following features of this result. First, the probability is pro- 
portional to T. In other words, there is a constant probability per unit time 
that transitions will occur. Second, this probability is proportional to the intensity 
of the light incident on the system. Finally, the transition probability is propor- 
tional to ju 2 , where, you remember, defined the shift in energy due to the 
electric field 8. Because of this, ju8 also appeared in Eqs. (9.38) and (9.39) as the 
coupling term that is responsible for the transition between the otherwise stationary 
states 1 1) and | II). In other words, for the small 8 we have been considering, 
juS is the so-called “perturbation term” in the Hamiltonian matrix element which 
connects the states 1 1) and | II). In the general case, we would have that /i8 
gets replaced by the matrix element (//|H|7) (see Section 5-6). 

In Volume I (Section 42-5) we talked about the relations among light absorp- 
tion, induced emission, and spontaneous emission in terms of the Einstein A- and 
^-coefficients. Here, we have at last the quantum mechanical procedure for 
computing these coefficients. What we have called P(I — > II) for our two-state 
ammonia molecule corresponds precisely to the absorption coefficient B nm of the 
Einstein radiation theory. For the complicated ammonia molecule — which is too 
difficult for anyone to calculate— we have taken the matrix element (7/|H| I) as 
ja8, saying that p is to be gotten from experiment. For simpler atomic systems, the 
Mmn which belongs to any particular transition can be calculated from the definition 


Vmn& = (m \H\ n) = H mni (9.56) 

where H mn is the matrix element of the Hamiltonian which includes the effects of 
a weak electric field. The p mn calculated in this way is called the electric dipole 
matrix element. The quantum mechanical theory of the absorption and emission 
of light is, therefore, reduced to a calculation of these matrix elements for particular 
atomic systems. 

Our study of a simple two-state system has thus led us to an understanding 
of the general problem of the absorption and emission of light. 
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to 

Other Two-State Systems 


10-1 The hydrogen molecular ion 

In the last chapter we discussed some aspects of the ammonia molecule under 
the approximation that it can be considered as a two-state system. It is, of course, 
not really a two-state system— there are many states of rotation, vibration, transla- 
tion, and so on — but each of these states of motion must be analyzed in terms of 
two internal states because of the flip-flop of the nitrogen atom. Here we are going 
to consider other examples of systems which, to some approximation or other, 
can be considered as two-state systems. Lots of things will be approximate because 
there are always many other states, and in a more accurate analysis they would 
have to be taken into account. But in each of our examples we will be able to 
understand a great deal by just thinking about two states. 

Since we will only be dealing with two-state systems, the Hamiltonian we 
need will look just like the one we used in the last chapter. When the Hamiltonian 
is independent of time, we know that there are two stationary states with definite — 
and usually different — energies. Generally, however, we start our analysis with a 
set of base states which are not these stationary states, but states which may, 
perhaps, have some other simple physical meaning. Then, the stationary states 
of the system will be represented by a linear combination of these base states. 

For convenience, we will summarize the important equations from Chapter 
9. Let the original choice of base states be | 7) and | 2). Then any state | is 
represented by the linear combination 
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I *> = I 0(1 | *> + I 2}{2 I *> = | 1)C\ + i 2>C 2 . (10.1) 

The amplitudes C, (by which we mean either C x or C 2 ) satisfy the two linear differ- 
ential equations 

ih = £ HuCj, ( 10 . 2 ) 

3 

where both / and j take on the values 1 and 2, 

When the terms of the Hamiltonian Hjj do not depend on t 9 the two states of 
definite energy (the stationary states), which we call 

|*/>=| and \tn) = [ 

have the energies 


Ei = 


H „ + H 2 


n — H 2 2> 

VI 2 y 


+ H V2 H 21 > 


Eu = 


H\ i -fi H 22 _ ~ ^22 ^ 2 


h X2 h 21 


The two Cs for each of these states have the same time dependence. The state 
vectors | 1) and | //) which go with the stationary states are related to our original 
base states | 1) and | 2) by 

| /) = | l)a x + | 2)a 2 , (10 4) 

I //) = [ l)a\ + | 2><4 
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The a’$ are complex constants, which satisfy 


M 2 + M 2 = i, 

a l H12 

a 2 ~ E t - H 1X ’ 

|a'i| 2 + |a 2 | 2 = 1, 

«i = H l2 

02 — Hn 


(10.5) 


( 10 . 6 ) 


If Hu and H 2 2 are equal— say both are equal to E 0 —\ and H 12 = /f 2 i = — A, 
then Ei = E 0 + A, E ri = E 0 — A, and the states 1 I) and | II) are particularly 
simple: 


I) = ~ 

V2 


U)~\2) 


II) = 


V2 


\l)+\2) 


00.7) 



l*> © 



Fig. 10-1. A set of base states for 
two protons and an electron. 


Now we will use these results to discuss a number of interesting examples 
taken from the fields of chemistry and physics. The first example is the hydrogen 
molecular ion. A positively ionized hydrogen molecule consists of two protons 
with one electron worming its way around them. If the two protons are very far 
apart, what states would we expect for this system? The answer is pretty clear: 
The electron will stay close to one proton and form a hydrogen atom in its lowest 
state, and the other proton will remain alone as a positive ion. So, if the two 
protons are far apart, we can visualize one physical state in which the electron is 
“attached” to one of the protons. There is, clearly, another state symmetric to 
that one in which the electron is near the other proton, and the first proton is the 
one that is an ion. We will take these two as our base states, and we’ll call them 
| 1) and | 2). They are sketched in Fig. 10-1. Of course, there are really many 
states of an electron near a proton, because the combination can exist as any one 
of the excited states of the hydrogen atom. We are not interested in that variety 
of states now; we will consider only the situation in which the hydrogen atom is in 
the lowest state — its ground state — and we will, for the moment, disregard spin 
of the electron. We can just suppose that for all our states the electron has its 
spin “up” along the z-axis.f 

Now to remove an electron from a hydrogen atom requires 13.6 electron volts 
of energy. So long as the two protons of the hydrogen molecular ion are far apart, 
it still requires about this much energy — which is for our present considerations a 
great deal of energy — to get the electron somewhere near the midpoint between the 
protons. So it is impossible, classically, for the electron to jump from one proton 
to the other. However, in quantum mechanics it is possible — though not very 
likely. There is some small amplitide for the electron to move from one proton 
to the other. As a first approximation, then, each of our base states | 7} and | 2) 
will have the energy E 0 , which is just the energy of one hydrogen atom plus one 
proton. We can take that the Hamiltonian matrix elements Hu and H 2 2 are 
both approximately equal to £ 0 . The other matrix elements H 12 and H 2 \, which 
are the amplitudes for the electron to go back and forth, we will again write as — A. 

You see that this is the same game we played in the last two chapters. If we 
disregard the fact that the electron can flip back and forth, we have two states of 
exactly the same energy. This energy will, however, be split into two energy levels 
by the possibility of the electron going back and forth — the greater the probability 
of the transition, the greater the split. So the two energy levels of the system are 
E q A and E 0 — A, and the states which have these definite energies are given 
by Eqs. (10.7). 


t This is satisfactory so long as there are no important magnetic fields. We will discuss 
the effects of magnetic fields on the electron later in this chapter, and the very small 
effects of spin in the hydrogen atom in Chapter 12. 
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From our solution we see that if a proton and a hydrogen ion are put any- 
where near together, the electron will not stay on one of the protons but will flip 
back and forth between the two protons. If it starts on one of the protons, it will 
oscillate back and forth between the states | 1) and | 2 )— giving a time-varying 
solution. In order to have the lowest energy solution (which does not vary with 
time), it is necessary to start the system with equal amplitudes for the electron to 
be around each proton. Remember, there are not two electrons — we are not saying 
that there is an electron around each proton. There is only one electron, and it 
has the same amplitude — \/\/2 in magnitude — to be in either position. 

Now the amplitude A for an electron which is near one proton to get to the 
other one depends on the separation between the protons. The closer the protons 
are together, the larger the amplitude. You remember that we talked in Chapter 
7 about the amplitude for an electron to “penetrate a barrier,” which it could not 
do classically. We have the same situation here. The amplitude for an electron 
to get across decreases roughly exponentially with the distance — for large distances. 
Since the transition probability, and therefore A , gets larger when the protons are 
closer together, the separation of the energy levels will also get larger. If the system 
is in the state | /), the energy E {] + A increases with decreasing distance, so these 
quantum mechanical effects make a repulsive force tending to keep the protons 
apart. On the other hand, if the system is in the state //), the total energy decreases 
if the protons are brought closer together; there is an attractive force pulling the 
protons together. The variation of the two energies with the distance between the 
two protons should be roughly as shown in Fig. 10-2. We have, then, a quantum- 
mechanical explanation of the binding force that holds the Ut ion together. 




Fig. 10-2. The energies of the two stationary Fig. 10-3. The energy levels of the ion as a 

states of the ion as a function of the distance function of the interproton distance D. (Eh = 1 3.6 ev. 

between the two protons. 


We have, however, forgotten one thing. In addition to the force we have just 
described, there is also an electrostatic repulsive force between the two protons. 
When the two protons are far apart — as in Fig. 10-1 — the “bare” proton sees only 
a neutral atom, so there is a negligible electrostatic force. At very close distances, 
however, the “bare” proton begins to get “inside” the electron distribution — that 
is, it is closer to the proton on the average than to the electron. So there begins 
to be some extra electrostatic energy which is, of course, positive. This energy — 
which also varies with the separation — should be included in E 0 . So for E {) we 
should take something like the broken-line curve in Fig. 10-2 which rises rapidly 
for distances less than the radius of a hydrogen atom.We should add and subtract 
the flip-flop energy A from this E 0 . When we do that, the energies £> and En will 
vary with the interproton distance D as shown in Fig. 10-3. [In this figure, we 
have plotted the results of a more detailed calculation. The interproton distance 
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is given in units of 1 A(10 -8 cm), and the excess energy over a proton plus a hydro- 
gen atom is given in units of the binding energy of the hydrogen atom — the so- 
called “Rydberg” energy, 13.6 ev.] We see that the state | II) has a minimum-en- 
ergy point. This will be the equilibrium configuration — the lowest energy condition 
— for the Ht ion. The energy at this point is lower than the energy of a separated 
proton and hydrogen ion, so the system is bound. A single electron acts to hold 
the two protons together. A chemist would call it a “one-electron bond.” 

This kind of chemical binding is also often called “quantum mechanical 
resonance” (by analogy with the two coupled pendulums we have described 
before). But that really sounds more mysterious than it is, it’s only a “resonance” 
if you start out by making a poor choice for your base states — as we did also! 
If you picked the state | //), you would have the lowest energy state — that’s all. 

We can see in another way why such a state should have a lower energy than 
a proton and a hydrogen atom. Let’s think about an electron near two protons 
with some fixed, but not too large, separation. You remember that with a single 
proton the electron is “spread out” because of the uncertainty principle. It seeks 
a balance between having a low coulomb potential energy and not getting con- 
fined into too small a space, which would make a high kinetic energy (because of 
the uncertainty relation Ap Ax ^ h). Now if there are two protons, there is more 
space where the electron can have a low potential energy. It can spread out — 
lowering its kinetic energy — without increasing its potential energy. The net 
result is a lower energy than a hydrogen atom. Then why does the other state 1 1) 
have a higher energy? Notice that this state is the difference of the states | I) and 
| 2). Because of the symmetry of 1 1) and | 2), the difference must have zero 
amplitude to find the electron half-way between the two protons. This means that 
the electron is somewhat more confined, which leads to a larger energy. 

We should say that our approximate treatment of the ion as a two-state 
system breaks down pretty badly once the protons get as close together as they 
are at the minimum in the curve of Fig. 10-3, and so, will not give a good value 
for the actual binding energy. For small separations, the energies of the two 
“states” we imagined in Fig. 6-1 are not really equal to £ () ; a more refined quan- 
tum mechanical treatment is needed. 

Suppose we ask now what would happen if instead of two protons, we had 
two different objects — as, for example, one proton and one lithium positive ion 
(both particles still with a single positive charge). In such a case, the two terms 
H x i and H 2 2 of the Hamiltonian would no longer be equal; they would, in fact, 
be quite different. If it should happen that the difference (Hu — H 22 ) is, in 
absolute value, much greater than A = ~H i2 , the attractive force gets very weak, 
as we can see in the following way. 

If we put H l2 H 2 1 = A 2 into Eqs. (10.3) we get 

p _ Hu + H 22 ^ Hu - H 22 f~ 4T2 

2 2 \ A ^ (jy„ - H 22 )*‘ 

When Hu — H 2 2 is much greater than A 2 , the square root is very nearly equal to 


1 + 


2 A 2 

(Hi 1 - H 22 )*' 


The two energies are then 

A 2 

El = Hi1 + 

E " = 


( 10 . 8 ) 


They are now very nearly just the energies Hu and H 22 of the isolated atoms, 
pushed apart only slightly by the flip-flop amplitude A . 

The energy difference E T — E IT is 


2 A 2 


KM 


(Hu - H 22 ) + 


H n - H 22 



The additional separation from the flip-flop of the electron is no longer equal to 
2 A; it is smaller by the factor A/ (Hu — H 2 2 )> which we are now taking to be 
much less than one. Also, the dependence of Ei — Ejj on the separation of the 
two nuclei is much smaller than for the ion — it is also reduced by the factor 
A/(Hn — H 2 2 )• We can now see why the binding of unsymmetric diatomic 
molecules is generally very weak. 

In our theory of the ion we have discovered an explanation for the 
mechanism by which an electron shared by two protons provides, in effect, an 
attractive force between the two protons which can be present even when the 
protons are at large distances. The attractive force comes from the reduced energy 
of the system due to the possibility of the electron jumping from one proton to 
the other. In such a jump the system changes from the configuration (hydrogen 
atom, proton) to the configuration (proton, hydrogen atom), or switches back. 
We can write the process symbolically as 

(H, p) - (p, H). 

The energy shift due to this process is proportional to the amplitude A that an 
electron whose energy is — W H (its binding energy in the hydrogen atom) can 
get from one proton to the other. 

For large distances R between the two protons, the electrostatic potential 
energy of the electron is nearly zero over most of the space it must go when it 
makes its jump. In this space, then, the electron moves nearly like a free particle 
in empty space — but with a negative energy! We have seen in Chapter 3 [Eq. 
(3.7)] that the amplitude for a particle of definite energy to get from one place 
to another a distance r away is proportional to 

3 

r 


where p is the momentum corresponding to the definite energy. In the present 
case (using the nonrelativistic formula), p is given by 


£1 

2 m 


-W H . 


(10.9) 


This means that p is an imaginary number, 


p = iy/lmWn 

(the other sign for the radical gives nonsense here). 

We should expect, then, that the amplitude A for the ion will vary as 

A (10.10) 


for large separations R between the two protons. The energy shift due to the 
electron binding is proportional to A , so there is a force pulling the two protons 
together which is proportional — for large R — to the derivative of (10.10) with 
respect to R. 

Finally, to be complete, we should remark that in the two-proton, one-electron 
system there is still one other effect which gives a dependence of the energy on R . 
We have neglected it until now because it is usually rather unimportant — the 
exception is just for those very large distances where the energy of the exchange 
term A has decreased exponentially to very small values. The new effect we are 
thinking of is the electrostatic attraction of the proton for the hydrogen atom, 
which comes about in the same way any charged object attracts a neutral object. 
The bare proton makes an electric field S (varying as 1 /R 2 ) at the neutral hydrogen 
atom. The atom becomes polarized, taking on an induced dipole moment jjl 
proportional to S. The energy of the dipole is juS, which is proportional to S 2 — or 
to \/R\ So there is a term in the energy of the system which decreases with the 
fourth power of the distance, (It is a correction to E 0 .) This energy falls off with 
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distance more slowly than the shift A given by (10.10); at some large separation 
R it becomes the only remaining important term giving a variation of energy with 
R — and, therefore, the only remaining force. Note that the electrostatic term has 
the same sign for both of the base states (the force is attractive, so the energy is 
negative) and so also for the two stationary states, whereas the electron exchange 
term A gives opposite signs for the two stationary states. 


10-2 Nuclear forces 

We have seen that the system of a hydrogen atom and a proton has an energy 
of interaction due to the exchange of the single electron which varies at large 
separations R as 

-clR 

( 10 . 11 ) 

with a = \/2 mW H /h. (One usually says that there is an exchange of a “virtual” 
electron when — as here — the electron has to jump across a space where it would 
have a negative energy. More specifically, a “virtual exchange” means that the 
phenomenon involves a quantum mechanical interference between an exchanged 
state and a nonexchanged state.) 

Now we might ask the following question: Could it be that forces between 
other kinds of particles have an analogous origin? What about, for example, the 
nuclear force between a neutron and a proton, or between two protons? In an 
attempt to explain the nature of nuclear forces, Yukawa proposed that the force 
between two nucleons is due to a similar exchange effect — only, in this case, due 
to the virtual exchange, not of an electron, but of a new particle, which he called 
a “meson. ” Today, we would identify Yukawa’s meson with the 7r-meson (or 
“pion”) produced in high-energy collisions of protons or other particles. 

Let’s see, as an example, what kind of a force we would expect from the ex- 
change of a positive pion (7r + ) of mass m v between a proton and a neutron. Just 
as a hydrogen atom H° can go into a proton p + by giving up an electron e~ 

H°->p + + e" (10.12) 

a proton p + can go into a neutron n° by giving up a 7r + meson: 

p+->n° + 7T+ (10.13) 


So if we have a proton at a and a neutron at b separated by the distance R , the 
proton can become a neutron by emitting a i r + which is then absorbed by the 
neutron at b , turning it into a proton. There is an energy of interaction of the 
two-nucleon (plus pion) system which depends on the amplitude A for the pion 
exchange— just as we found for the electron exchange in the ion. 

In the process (10.12), the energy of the H° atom is less than that of the proton 
by W u (calculating nonrelativistically, and omitting the rest energy me 2 of the 
electron), so the electron has a negative kinetic energy — or imaginary momentum — 
as in Eq. (10.9). In the nuclear process (10.13), the proton and neutron have 
almost equal masses, so the 7 r + will have zero total energy. The relation between 
the total energy E and the momentum p for a pion of mass m * is 

E 2 = p 2 c 2 + mlc 4 

Since E is zero (or at least negligible in comparison with m^), the momentum is 
again imaginary: 

p = im^c. 


Using the same arguments we gave for the amplitude that a bound electron 
would penetrate the barrier in the space between two protons, we get for the nuclear 
case an exchange amplitude A which should — for large R — go as 




(10.14) 
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The interaction energy is proportional to A , and so varies in the same way. We 
get an energy variation in the form of the so-called Yukawa potential between 
two nucleons. Incidentally, we obtained this same formula earlier directly from 
the differential equation for the motion of a pion in free space [see Chapter 28, 
Vol. II, Eq. (28.18)]. 

We can, following the same line of argument, discuss the interaction between 
two protons (or between two neutrons) which results from the exchange of a 
neutral pion ( 7 T°). The basic process is now 

p+->p + + tt 0 . (10.15) 

A proton can emit a virtual i r°, but then it remains still a proton. If we have two 
protons, proton No. 1 can emit a virtual ? r° which is absorbed by proton No. 2. 
At the end, we still have two protons. This is somewhat different from the Hf ion. 
There the H° went into a different condi tion— the proton— after emitting the 
electron. Now we are assuming that a proton can emit a i r° without changing its 
character. Such processes are, in fact, observed in high-energy collisions. The 
process is analogous to the way that an electron emits a photon and ends up still 

an electron: e -> e + photon. (10.16) 

We do not “see” the photons inside the electrons before they are emitted or after 
they are absorbed, and their emission does not change the “nature” of the electron. 

Going back to the two protons, there is an interaction energy which arises 
from the amplitude A that one proton emits a neutral pion which travels across 
(with imaginary momentum) to the other proton and is absorbed there. This 
amplitude is again proportional to (10.14), with m w the mass of the neutral pion. 
All the same arguments give an equal interaction energy for two neutrons. Since 
the nuclear forces (disregarding electrical effects) between neutron and proton, 
between proton and proton, between neutron and neutron are the same, we con- 
clude that the masses of the charged and neutral pions should be the same. Experi- 
mentally, the masses are indeed very nearly equal, and the small difference is about 
what one would expect from electric self-energy corrections (see Chapter 28, 
Vol. II). 

There are other kinds of particles— like K-mesons— which can be exchanged 
between two nucleons. It is also possible for two pions to be exchanged at the 
same time. But all of these other exchanged “objects” have a rest mass m x higher 
than the pion mass m*, and lead to terms in the exchange amplitude which vary as 

(m x cifi) R 

R 

These terms die out faster with increasing R than the one-meson term. No one 
knows, today, how to calculate these higher-mass terms, but for large enough 
values of R only the one-pion term survives. And, indeed, those experiments which 
involve nuclear interactions only at large distances do show that the interaction 
energy is as predicted from the one-pion exchange theory. 

In the classical theory of electricity and magnetism, the coulomb electrostatic 
interaction and the radiation of light by an accelerating charge are closely related — 
both come out of the Maxwell equations. We have seen in the quantum theory that 
light can be represented as the quantum excitations of the harmonic oscillations of 
the classical electromagnetic fields in a box. Alternatively, the quantum theory 
can be set up by describing light in terms of particles — photons — which obey Bose 
statistics. We emphasized in Section 4-5 that the two alternative points of view 
always give identical predictions. Can the second point of view be carried through 
completely to include all electromagnetic effects? In particular, if we want to 
describe the electromagnetic field purely in terms of Bose particles — that is, in 
terms of photons — what is the coulomb force due to? 

From the “particle” point of view the coulomb interaction between two 
electrons comes from the exchange of a virtual photon. One electron emits a photon 
—as in reaction (10.16) — which goes over to the second electron, where it is 
absorbed in the reverse of the same reaction. The interaction energy is again given 
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by a formula like (10.14), but now with m w replaced by the rest mass of the photon 
— which is zero. So the virtual exchange of a photon between two electrons gives 
an interaction energy that varies simply inversely as R , the distance between the 
two electrons — just the normal coulomb potential energy! In the “particle” theory 
of electromagnetism, the process of a virtual photon exchange gives rise to all the 
phenomena of electrostatics. 




10-3 The hydrogen molecule 



Fig. 10-5. The energy levels of the 
H 2 molecule for different interproton 
distances D. (£/, = 1 3.6 ev.) 


As our next two-state system we will look at the neutral hydrogen molecule 
H 2 . It is, naturally, more complicated to understand because it has two electrons. 
Again, we start by thinking of what happens when the two protons are well 
separated. Only now we have two electrons to add. To keep track of them, we’ll 
call one of them “electron 0 ” and the other “electron b” We can again imagine 
two possible states. One possibility is that “electron a” is around the first proton 
and “electron b” is around the second, as shown in Fig. 10-4(a). We have simply 
two hydrogen atoms. We will call this state | 7). There is also another possibility: 
that “electron b” is around the first proton and that “electron a” is around the 
second. We call this state | 2), From the symmetry of the situation, those two 
possibilities should be energetically equivalent, but, as we will see, the energy of 
the system is not just the energy of two hydrogen atoms. We should mention that 
there are many other possibilities. For instance, “electron a” might be near the 
first proton and “electron b” might be in another state around the same proton. 
We’ll disregard such a case, since it will certainly have higher energy (because of 
the large coulomb repulsion between the two electrons). For greater accuracy, we 
would have to include such states, but we can get the essentials of the molecular 
binding by considering just the two states of Fig. 10.4. To this approximation we 
can describe any state by giving the amplitude (7 | <f>) to be in the state | 7) and an 
amplitude (2 | <j>) to be in state | 2). In other words, the state vector | <f>) can be 
written as the linear combination 

! *) = E 1 o</ 1 *>• 

i 

To proceed, we assume — as usual — that there is some amplitude A that the 
electrons can move through the intervening space and exchange places. This 
possibility of exchange means that the energy of the system is split, as we have seen 
for other two-state systems. As for the hydrogen molecular ion, the splitting is 
very small when the distance between the protons is large. As the protons approach 
each other, the amplitude for the electrons to go back and forth increases, so the 
splitting increases. The decrease of the lower energy state means that there is an 
attractive force which pulls the atoms together. Again the energy levels rise when 
the protons get very close together because of the coulomb repulsion. The net 
final result is that the two stationary states have energies which vary with the sep- 
aration as shown in Fig. 10-5. At a separation of about 0.74 A, the lower energy 
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Jevel reaches a minimum; this is the proton-proton distance of the true hydrogen 
molecule. 

Now you have probably been thinking of an objection. What about the fact 
that the two electrons are identical particles? We have been calling them “electron 
a ” and “electron bf but there really is no way to tell which is which. And we have 
said in Chapter 4 that for electrons — which are Fermi particles — if there are two 
ways something can happen by exchanging the electrons, the two amplitudes will 
interfere with a negative sign. This means that if we switch which electron is which, 
the sign of the amplitude must reverse. We have just concluded, however, that 
the bound state of the hydrogen molecule would be (at t = 0) 

|/7> = V5 (|7) + ]2}) - 

However, according to our rules of Chapter 4, this state is not allowed. If we 
reverse which electron is which, we get the state 

^=(l*> + | /», 

and we get the same sign instead of the opposite one. 

These arguments are correct if both electrons have the same spin . It is true that 
if both electrons have spin up (or both have spin down), the only state that is per- 
mitted is 

|/> = ~{\ 1 ) - | 2 ». 

V2 

For this state, an interchange of the two electrons gives 

(I 2 ) - | 1 », 

which is — | 7), as required. So if we bring two hydrogen atoms near to each 
other with their electrons spinning in the same direction, they can go into the 
state | 7) and not state | 77). But notice that state | 7) is the upper energy state. 
Its curve of energy versus separation has no minimum. The two hydrogens will 
always repel and will not form a molecule. So we conclude that the hydrogen 
molecule cannot exist with parallel electron spins. And that is right. 

On the other hand, our state | 77) is perfectly symmetric for the two electrons. 
In fact, if we interchange which electron we call a and which we call b we get back 
exactly the same state. We saw in Section 4-7 that if two Fermi particles are in 
the same state, they must have opposite spins. So, the bound hydrogen molecule 
must have one electron with spin up and one with spin down. 

The whole story of the hydrogen molecule is really somewhat more compli- 
cated if we want to include the proton spins. It is then no longer right to think of 
the molecule as a two- state system. It should really be looked at as an eight- state 
system there are four possible spin arrangements for each of our states | 1) and 
| 2)— so we were cutting things a little short by neglecting the spins. Our final 
conclusions are, however, correct. 

We find that the lowest energy state— the only bound state— of the H 2 mole- 
cule has the two electrons with spins opposite. The total spin angular momentum 
of the electrons is zero. On the other hand, two nearby hydrogen atoms with spins 
parallel and so with a total angular momentum h — must be in a higher (unbound) 
energy state; the atoms repel each other. There is an interesting correlation be- 
tween the spins and the energies. It gives another illustration of something we 
mentioned before, which is that there appears to be an “interaction” energy be- 
tween two spins because the case of parallel spins has a higher energy than the 
opposite case. In a certain sense you could say that the spins try to reach an 
antiparallel condition and, in doing so, have the potential to liberate energy — not 
because there is a large magnetic force, but because of the exclusion principle. 
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We saw in Section 10-1 that the binding of two different ions by a single elec- 
tron is likely to be quite weak. This is not true for binding by two electrons. Sup- 
pose the two protons in Fig. 10-4 were replaced by any two ions (with closed inner 
electron shells and a single ionic charge), and that the binding energies of an 
electron at the two ions are different. The energies of states | 1) and | 2) would 
still be equal because in each of these states we have one electron bound to each 
ion. Therefore, we always have the splitting proportional to A. Two-electron 
binding is ubiquitous it is the most common valence bond. Chemical binding 
usually involves this flip-flop game played by two electrons. Although two atoms 
can be bound together by only one electron, it is relatively rare — because it re- 
quires just the right conditions. 

Finally, we want to mention that if the energy of attraction for an electron to 
one nucleus is much greater than to the other, then what we have said earlier about 
ignoring other possible states is no longer right. Suppose nucleus a (or it may be 
a positive ion) has a much stronger attraction for an electron than does nucleus b. 
It may then happen that the total energy is still fairly low even when both electrons 
are at nucleus a , and no electron is at nucleus b. The strong attraction may more 
than compensate for the mutual repulsion of the two electrons. If it does, the 
lowest energy state may have a large amplitude to find both electrons at a (making 
a negative ion) and a small amplitude to find any electron at b . The state looks like 
a negative ion with a positive ion. This is, in fact, what happens in an “ionic” 
molecule like NaCl. You can see that all the gradations between covalent binding 
and ionic binding are possible. 

You can now begin to see how it is that many of the facts of chemistry can 
be most clearly understood in terms of a quantum mechanical description. 
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10-4 The benzene molecule 

Chemists have invented nice diagrams to represent complicated organic 
molecules. Now we are going to discuss one of the most interesting of them— the 
benzene molecule shown in Fig. 10-6. It contains six carbon and six hydrogen 
atoms in a symmetrical array. Each bar of the diagram represents a pair of elec- 
trons, with spins opposite, doing the covalent bond dance. Each hydrogen atom 
contributes one electron and each carbon atom contributes four electrons to 
make up the total of 30 electrons involved. (There are two more electrons close to 
the nucleus of the carbon which form the first, or K, shell. These are not shown 
since they are so tightly bound that they are not appreciably involved in the cova- 
lent binding.) So each bar in the figure represents a bond, or pair of electrons, 
and the double bonds mean that there are two pairs of electrons between alternate 
pairs of carbon atoms. 


There is a mystery about this benzene molecule. We can calculate what energy 
should be required to form this chemical compound, because the chemists have 
measured the energies of various compounds which involve pieces of the ring — for 
instance, they know the energy of a double bond by studying ethylene, and so on. 
We can, therefore, calculate the total energy we should expect for the benzene 
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Fig. 10-7. Two possibilities of orthodibromobenzene. The two bromines could 
be separated by a single bond or by a double bond. 
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molecule. The actual energy of the benzene ring, however, is much lower than we 
get by such a calculation; it is more tightly bound than we would expect from what 
is called an “unsaturated double bond system/’ Usually a double bond system 
which is not in such a ring is easily attacked chemically because it has a relatively 
high energy — the double bonds can be easily broken by the addition of other 
hydrogens. But in benzene the ring is quite permanent and hard to break up. 
In other words, benzene has a much lower energy than you would calculate from 
the bond picture. 

Then there is another mystery. Suppose we replace two adjacent hydrogens 
by bromine atoms to make ortho-dibromobenzene. There are two ways to do this, 
as shown in Fig. 10-7. The bromines could be on the opposite ends of a double 
bond as shown in part (a) of the figure, or could be on the opposite ends of a single 
bond as in (b). One would think that ortho-dibromobenzene should have two 
different forms, but it doesn’t. There is only one such chemical. f 

Now we want to resolve these mysteries — and perhaps you have already 
guessed how: by noticing, of course, that the “ground state” of the benzene ring 
is really a two-state system. We could imagine that the bonds in benzene could 
be in either of the two arrangements shown in Fig. 10-8. You say, “But they are 
really the same; they should have the same energy.” Indeed, they should. And for 
that reason they must be analyzed as a two-state system. Each state represents a 
different configuration of the whole set of electrons, and there is some amplitude 
A that the whole bunch can switch from one arrangement to the other— there is a 
chance that the electrons can flip from one dance to the other. 

As we have seen, this chance of flipping makes a mixed state whose energy is 
lower than you would calculate by looking separately at either of the two pictures 
in Fig. 10-8. Instead, there are two stationary states — one with an energy above 
and one with an energy below the expected value. So actually, the true normal 
state (lowest energy) of benzene is neither of the possibilities shown in Fig. 10-8, 
but it has the amplitude l/\/2 to be in each of the states shown. It is the only 
state that is involved in the chemistry of benzene at normal temperatures. In- 
cidentally, the upper state also exists ; we can tell it is there because benzene has a 
strong absorption for ultraviolet light at the frequency = (E z — E n )/h. You 
will remember that in ammonia, where the object flipping back and forth was three 
protons, the energy separation was in the microwave region. In benzene, the 
objects are electrons, and because they are much lighter, they find it easier to flip 
back and forth, which makes the coefficient A very much larger. The result is 
that the energy difference is much larger — about 1.5 ev, which is the energy of 
an ultraviolet photon, J 

What happens if we substitute bromine? Again the two “possibilities” (a) 
and (b) in Fig, 10-7 represent the two different electron configurations. The only 
difference is that the two base states we start with would have slightly different 
energies. The lowest energy stationary state will still involve a linear combination 
of the two states, but with unequal amplitudes. The amplitude for state | 1) might 
have a value something like \/2/3, say, whereas state | 2) would have the magnitude 
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Fig. 10-8. A set of base states for 
the benzene molecule. 


t We are oversimplifying a little. Originally, the chemists thought that there should 
be four forms of dibromobenzene : two forms with the bromines on adjacent carbon atoms 
(ortho-dibromobenzene), a third form with the bromines on next-nearest carbons (meta- 
dibromobenzene), and a fourth form with the bromines opposite to each other (para- 
dibromobenzene). However, they found only three forms — there is only one form of 
the ortho-molecule. 

t What we have said is a little misleading. Absorption of ultraviolet light would be 
very weak in the two-state system we have taken for benzene, because the dipole moment 
matrix element between the two states is zero. [The two states are electrically symmetric, 
so in our formula Eq. (9.55) for the probability of a transition, the dipole moment n 
is zero and no light is absorbed.] If these were the only states, the existence of the upper 
state would have to be shown in other ways. A more complete theory of benzene, how- 
ever, which begins with more base states (such as those having adjacent double bonds) 
shows that the true stationary states of benzene are slightly distorted from the ones we 
have found. The resulting dipole moments permit the transition we mentioned in the text 
to occur by the absorption of ultraviolet light. 
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\/l/3. We can’t say for sure without more information, but once the two energies 
Hu and H 2 2 are no longer equal, then the amplitudes Ci and C 2 no longer have 
equal magnitudes. This means, of course, that one of the two possibilities in the 
figure is more likely than the other, but the electrons are mobile enough so that 
there is some amplitude for both. The other state has different amplitudes (like 
a/ 1 /3 and — \/2/3) but lies at a higher energy. There is only one lowest state, 
not two as the naive theory of fixed chemical bonds would suggest. 
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Fig. 10-9. Two base states for the 
molecule of the dye magenta. 


We will give you one more chemical example of the two-state phenomenon — 
this time on a larger molecular scale. It has to do with the theory of dyes. Many 
dyes — in fact, most artificial dyes — have an interesting characteristic; they have a 
kind of symmetry. Figure 10-9 shows an ion of a particular dye called magenta, 
which has a purplish red color. The molecule has three ring structures — two of 
which are benzene rings. The third is not exactly the same as a benzene ring 
because it has only two double bonds inside the ring. The figure shows two equally 
satisfactory pictures, and we would guess that they should have equal energies. 
But there is a certain amplitude that all the electrons can flip from one condition 
to the other, shifting the position of the “unfilled” position to the opposite end. 
With so many electrons involved, the flipping amplitude is somewhat lower than it 
is in the case of benzene, and the difference in energy between the two stationary 
states is smaller. There are, nevertheless, the usual two stationary states 1 1) and | II) 
which are the sum and difference combinations of the two base states shown in the 
figure. The energy separation of 1 1) and | II) comes out to be equal to the energy 
of a photon in the optical region. If one shines light on the molecule, there is a 
very strong absorption at one frequency, and it appears to be brightly colored. 
That’s why it’s a dye! 

Another interesting feature of such a dye molecule is that in the two base 
states shown, the center of electric charge is located at different places. As a result, 
the molecule should be strongly affected by an external electric field. We had a 
similar effect in the ammonia molecule. Evidently we can analyze it by using 
exactly the same mathematics, provided we know the numbers E 0 and A. Gener- 
ally, these are obtained by gathering experimental data. If one makes measure- 
ments with many dyes, it is often possible to guess what will happen with some 
related dye molecule. Because of the large shift in the position of the center of 
electric charge the value of fx in formula (9.55) is large and the material has a high 
probability for absorbing light of the characteristic frequency 2 A/h. Therefore, 
it is not only colored but very strongly so — a small amount of substance absorbs 
a lot of light. 

The rate of flipping — and, therefore, A — is very sensitive to the complete struc- 
ture of the molecule. By changing A, the energy splitting, and with it the color of 
the dye, can be changed. Also, the molecules do not have to be perfectly sym- 
metrical. We have seen that the same basic phenomenon exists with slight modifica- 
tions, even if there is some small asymmetry present. So, one can get some modi- 
fication of the colors by introducing slight asymmetries in the molecules. For 
example, another important dye, malachite green, is very similar to magenta, but 
has two of the hydrogens replaced by CH 3 . It’s a different color because the A is 
shifted and the flip-flop rate is changed. 


10-6 The Hamiltonian of a spin one-half particle in a magnetic field 

Now we would like to discuss a two-state system involving an object of spin 
one-half. Some of what we will say has been covered in earlier chapters, but doing 
it again may help to make some of the puzzling points a little clearer. We can 
think of an electron at rest as a two-state system. Although we will be talking in 
this section about “an electron,” what we find out will be true for any spin one-half 
particle. Suppose we choose for our base states | 1) and | 2) the states in which the 
z-component of the electron spin is +h/2 and ~h/2. 
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These states are, of course, the same ones we have called (+) and (— ) in 
earlier chapters. To keep the notation of this chapter consistent, though, we call 
the “plus” spin state | 7) and the “minus” spin state [ 2) — where “plus” and “minus” 
refer to the angular momentum in the z-direction. 

Any possible state \p for the electron can be described as in Eq. (10.1) by 
giving the amplitude C\ that the electron is in state ( 7), and the amplitude C 2 
that it is in state | 2). To treat this problem, we will need to know the Hamiltonian 
for this two-state system — that is, for an electron in a magnetic field. We begin 
with the special case of a magnetic field in the z-direction. 

Suppose that the vector B has only a z-component B z . From the definition 
of the two base states (that is, spins parallel and antiparallel to B ) we know that 
they are already stationary states with a definite energy in the magnetic field. 
State | 7) corresponds to an energyf equal to — pB z and state | 2) to The 

Hamiltonian must be very simple in this case since C Xi the amplitude to be in state 
| 7), is not affected by C 2 , and vice versa: 


ih = E X C X = —nB z Ci, 
ih ^ = E 2 C = +»B z C 2 . 

For this special case, the Hamiltonian is 

Hu = —p>B z , H\2 — 0 , 

77 2 1 = 0, 7T 2 2 = 


(10.17) 


(10.18) 


So we know what the Hamiltonian is for the magnetic field in the z-direction, and 
we know the energies of the stationary states. 

Now suppose the field is not in the z-direction. What is the Hamiltonian? 
How are the matrix elements changed if the field is not in the z-direction? We 
are going to make an assumption that there is a kind of superposition principle 
for the terms of the Hamiltonian. More specifically, we want to assume that if 
two magnetic fields are superposed, the terms in the Hamiltonian simply add — if 
we know the Hij for a pure B z and we know the H^ for a pure B x , then the Hij 
for both B z and B x together is simply the sum. This is certainly true if we consider 
only fields in the z-direction — if we double B z , then all the Hij are doubled. So 
let’s assume that H is linear in the field B. That’s all we need to be able to find 
the Hij for any magnetic field. 

Suppose we have a constant field B. We could have chosen our z-axis in its 
direction, and we would have found two stationary states with the energies ^pB. 
Just choosing our axes in a different direction won’t change the physics. Our 
description of the stationary states will be different, but their energies will still be 
=fj \xB — that is, 

El = -mV Bl + Bl + B 2 ; 

and (10.19) 

Eii = +mV^| + By + Bl . 

The rest of the game is easy. We have here the formulas for the energies. 
We want a Hamiltonian which is linear in B x , B y , and B z , and which will give these 
energies when used in our general formula of Eq. (10.3). The problem: find the 
Hamiltonian. First, notice that the energy splitting is symmetric, with an average 
value of zero. Looking at Eq. (10.3), we can see directly that that requires 

H 2 2 = ~H xl . 

(Note that this checks with what we already know when B x and B y are both zero; 


| We are taking the rest energy woe 2 as our “zero” of energy and treating the magnetic 
moment p of the electron as a negative number, since it points opposite to the spin, 
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in that case H xx = — ixB z and H 22 = l^B z .) Now if we equate the energies of 
Eq. (10.3) with what we know from Eq. (10.19), we have 


- H 22 y 


+ \H 12 \ 2 = tx\Bl + Bt + Bb 


( 10 . 20 ) 


(We have also made use of the fact that H 2 1 = H* 2) so that H 12 H 2X can also 
be written as \H X2 \ 2 .) Again for the special case of a field in the z-direction, this 
gives 

M 2 ^ 2 + \H 12 \ 2 = 


Clearly, \H X2 \ must be zero in this special case, which means that H l2 cannot 
have any terms in B z . (Remember, we have said that all terms must be linear in 
B x , B y , and B z .) 

So far, then, we have discovered that H X1 and H 22 have terms in B z , while 
H 12 and H 2X do not. We can make a simple guess that will satisfy Eq. (10.20) if 
we say that 

Hu = -fxB z , 

H 22 = fxB e , (10.21) 

aDd I// 12 I 2 = U.\B 2 X + B 2 y). 


And it turns out that that’s the only way it can be done! 

“Wait” — you say — “// 12 is not linear in B; Eq. (10,21) gives H 12 = 
lx\/B£ + Not necessarily. There is another possibility which is linear, 

namely, 

#12 ~ KBx + iBy). 


There are, in fact, several such possibilities — most generally, we could write 

H 12 ~ ^(B x =*= iB v )e ^ , 

where 8 is some arbitrary phase. Which sign and phase should we use? It turns 
out that you can choose either sign, and any phase you want, and the physical 
results will always be the same. So the choice is a matter of convention. People 
ahead of us have chosen to use the minus sign and to take e %b — — 1. We might 
as well follow suit and write 


H 12 = ~ iB y \ H 21 = ~fJi(B x + iB y ). 


(Incidentally, these conventions are related to, and consistent with, some of the 
arbitrary choices we made in Chapter 6.) 

The complete Hamiltonian for an electron in an arbitrary magnetic field is, 

then 


H 1 1 — y<B z , H \ 2 — t^{B x iBy), 

H 2 \ — —fx(B x + iB y ), H 22 — -{~fxB z . 


( 10 . 22 ) 


And the equations for the amplitudes C x and C 2 are 


ih ^ = -A.B& + (B x - iB y )Co], 
ih ^ = -ix[(B z + iB y )C l - B z C 2 ]. 


( 10 . 23 ) 


So we have discovered the “equations of motion for the spin states" of an 
electron in a magnetic field. We guessed at them by making some physical argu- 
ment, but the real test of any Hamiltonian is that it should give predictions in 
agreement with experiment. According to any tests that have been made, these 
equations are right. In fact, although we made our arguments only for constant 
fields, the Hamiltonian we have written is also right for magnetic fields which 
vary with time. So we can now use Eq. (10.23) to look at all kinds of interesting 
problems. 
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10-7 The spinning electron in a magnetic field 


Example number one: We start with a constant field in the z-direction. There 
are just the two stationary states with energies ^pB z . Suppose we add a small 
field in the ^-direction. Then the equations look like our old two-state problem. 

We get the flip-flop business once more, and the energy levels are split a little 

farther apart. Now let’s let the x-component of the field vary with time — say, as 
cos cot. The equations are then the same as we had when we put an oscillating 
electric field on the ammonia molecule in Chapter 9. You can work out the de- 
tails in the same way. You will get the result that the oscillating field causes 

transitions from the + z-state to the — z-state — and vice versa — when the hori- 
zontal field oscillates near the resonant frequency co 0 = 2 pBJh. This gives the 
quantum mechanical theory of the magnetic resonance phenomena we described 
in Chapter 35 of Volume II ( see Appendix ). 

It is also possible to make a maser which uses a spin one-half system. A 
Stern-Gerlach apparatus is used to produce a beam of particles polarized in, say, 
the + z-direction, which are sent into a cavity in a constant magnetic field. The 
oscillating fields in the cavity can couple with the magnetic moment and induce 
transitions which give energy to the cavity. 

Now let’s look at the following question. Suppose we have a magnetic field 
B which points in the direction whose polar angle is 9 and azimuthal angle is 
0, as in Fig. 10-10. Suppose, additionally, that there is an electron which has been 
prepared with its spin pointing along this field. What are the amplitudes C Y and 
C 2 for such an electron? In other words, calling the state of the electron | 0), 
we want to write 

I *> = I 1)C X + | 2)C 2 , 

where C, and C 2 are 

Cl - (I I 0), c 2 = <2 I 0), 


where by | 7) and | 2) we mean the same thing we used to call | + ) and | — \ 
(referred to our chosen z-axis). 

The answer to this question is also in our general equations for two-state 
systems. First, we know that since the electron’s spin is parallel to B it is in a 
stationary state with energy Ej = ~pB. Therefore, both C x and C 2 must vary 
as e lE i tin i> as in (9.18); and their coefficients aj and a 2 are given by (10.5), namely. 


a l _ H\2 

Cl 2 £*/ — 1 


(10.24) 


An additional condition is that a x and a 2 should be normalized so that \a r \ 2 + 
|o 2 | 2 = 1. We can take H lx and H X2 from (10.22) using 



Fig, 10-10. The direction of 8 is 
defined by the polar angle 9 and the 
azimuthal angle 0. 


B z = B cos 9 , B x = B sin 9 cos 0, B y ~ B sin 9 sin <f>. 


So we have 

Hi i = ~pB cos 9 , 

H 12 — —pB sin 9 (cos <j> — i sin 0). 


(10.25) 


The last factor in the second equation is, incidentally, so it is simpler to write 

H 1 2 = —pB sin 9 e~ i<f> . (10.26) 

Using these matrix elements in Eq. (10.16) — and canceling ~pB from numer- 
ator and denominator — we find 


a x __ sin 9e l4> 
a 2 1 — cos 9 


(10.27) 


With this ratio and the normalization condition, we can find both a x and a 2 . 
That’s not hard, but we can make a short cut with a little trick. Notice that 
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1 — cos 0 — 2 sin 2 (0/2), and that sin 0 = 2 sin (0/2) cos (0/2). Then Eq. 
(10.27) is equivalent to 


So one possible answer is 


at 

a 2 


cos ^ e 


. 0 
sm 2 






(10.28) 


(10.29) 


since it fits with (10.28) and also makes 

|«i| 2 + \ a 2\ 2 ~ E 

As you know, multiplying both a x and a 2 by an arbitrary phase factor doesn’t 
change anything. People generally prefer to make Eqs. (10.29) more symmetric 
by multiplying both by e %4>!2 . So the form usually used is 

ai = cos^e-* 12 , a 2 = sin ~ e +i * r2 , (10.30) 

and this is the answer to our question. The numbers a x and a 2 are the amplitudes 
to find an electron with its spin up or down along the z-axis when we know that 
its spin is along the axis at 0 and </>. (The amplitudes C i and C 2 are just a x and 
a 2 times e~ lE r t/n .) 

Now we notice an interesting thing. The strength B of the magnetic field 
does not appear anywhere in (10.30). The result is clearly the same in the limit that 
B goes to zero. This means that we have answered in general the question of how 
to represent a particle whose spin is along an arbitrary axis. The amplitudes of 
(10.30) are the projection amplitudes for spin one-half particles corresponding to 
the projection amplitudes we gave in Chapter 5 [Eqs. (5.38)] for spin-one par- 
ticles. We can now find the amplitudes for filtered beams of spin one-half particles 
to go through any particular Stern-Gerlach filter. 

Let | +z) represent a state with spin up along the z-axis, and | — z> represent 
the spin down state. Tf | +z') represents a state with spin up along a z'-axis which 
makes the polar angles 0 and with the z-axis, then in the notation of Chapter 
5, we have 

<+z I +z'> = cos i e-* 12 , <-z I +z'> = sin ~ e +l>/2 . (10.31) 

These results are equivalent to what we found in Chapter 6, Eq. (6.36), by purely 
geometrical arguments. (So if you decided to skip Chapter 6, you now have the 
essential results anyway.) 

As* our final example lets look again at one which we’ve already mentioned a 
number of times. Suppose that we consider the following problem. We start 
with an electron whose spin is in some given direction, then turn on a magnetic 
field in the z-direction for 25 minutes, and then turn it off. What is the final state? 
Again let’s represent the state by the linear combination | \p) = | l)C x + | 2)C 2 . 
For this problem, however, the states of definite energy are also our base states 
| 1) and | 2). So C i and C 2 only vary in phase. We know that 

Cj(0 = Ci( 0)e~ iE J t}h = Ci( 0)e+ iftBtin 9 

and 

C 2 (0 = C 2 (0)e~ iE Ti tlfi - C 2 ( 0)e~ itlBtin . 

Now initially we said the electron spin was set in a given direction. That means 
that initially C : and C 2 are two numbers given by Eqs. (10.30). After we wait 
for a period of time 7/ the new C i and C 2 are the same two numbers multiplied 
respectively by e ttiBzTlfl and What state is that? That’s easy. It’s 

exactly the same as if the angle <j> had been changed by the subtraction of 2 jiB z T/h 
and the angle 0 had been left unchanged. That means that at the end of the time 
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T, the state | \p) represents an electron lined up in a direction which differs from 
the original direction only by a rotation about the z-axis through the angle A <f> = 

2 fxB z T/h. Since this angle is proportional to T, we can also say the direction of the 
spin processes at the angular velocity 2 pB z /h around the z-axis. This result we 
discussed several times previously in a less complete and rigorous manner. Now 
we have obtained a complete and accurate quantum mechanical description of 
the precession of atomic magnets. 

It is interesting that the mathematical ideas we have just gone over for the 
spinning electron in a magnetic field can be applied to any two-state system. 
That means that by making a mathematical analogy to the spinning electron, 
any problem about two-state systems can be solved by pure geometry. It works 
like this. First you shift the zero of energy so that (Hu + H 22 ) is equal to 
zero so that Hu — — H 22 - Then any two-state problem is formally the same 
as the electron in a magnetic field. All you have to do is identify —pB z with Hu 
and —p(B x — iB y ) with H i2 . No matter what the physics is originally — an 
ammonia molecule, or whatever — you can translate it into a corresponding 
electron problem. So if we can solve the electron problem in general , we have 
solved all two-state problems. 

And we have the general solution for the electron! Suppose you have some 
state to start with that has spin “up” in some direction, and you have a magnetic 
field B that points in some other direction. You just rotate the spin direction around 
the axis of B with the vector angular velocity w(/) equal to a constant times the 
vector B (namely co - 2 pB/h). As B varies with time, you keep moving the axis 
of the rotation to keep it parallel with B , and keep changing the speed of rotation 
so that it is always proportional to the strength of B. See Fig. 10-11. If you keep 
doing this, you will end up with a certain final orientation of the spin axis, and the 
amplitudes CT and C 2 are just given by the projections — using (10.30) — into your 
coordinate frame. You see, it’s just a geometric problem to keep track of where you 
end up after all the rotating. Although it’s easy to see what’s involved, this geo- 
metric problem (of finding the net result of a rotation with a varying angular 
velocity vector) is not easy to solve explicitly in the general case. Anyway, we see, 
in principle , the general solution to any two-state problem. In the next chapter 
we will look iome more into the mathematical techniques for handling the im- 
portant case of a spin one-half particle— and, therefore, for handling two-state 
systems in general. 



Fig. 10-1 1. The spin direction of an 
electron in a varying magnetic field B(f) 
precesses at the frequency w(f) about an 
axis parallel to B. 
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11 


More Two-State Systems 


11-1 The Pauli spin matrices 

We continue our discussion of two-state systems. At the end of the last 
chapter we were talking about a spin one-half particle in a magnetic field. We 
described the spin state by giving the amplitude C x that the z-component of spin 
angular momentum is -\-h/ 2 and the amplitude C 2 that it is —ft/ 2. In earlier 
chapters we have called these base states | +) and | -). We will now go back 
to that notation, although we may occasionally find it convenient to use | + ) or 
| 1), and | — ) or | 2), interchangeably. 

We saw in the last chapter that when a spin one-half particle with a magnetic 
moment pi is in a magnetic field B = (B x , B y , B z ), the amplitudes C+(=Cj) 
and C_(= C 2 ) are connected by the following differential equations: 


ih = -!i[B z C + + (B x - iB v )CJ\, 
ih = -ix[(B x + iB y )C+ - B Z C_}. 


( 11 . 1 ) 


In other words, the Hamiltonian matrix is 


H\\ — — H12 ~ x ^y)i 

H 21 = —p(B x + iBy)* H 22 — +m^2* 


And Eqs. (11.1) are, of course, the same as 

^ ^ = E ( 1L3 ) 

3 

where i and j take on the values + and — (or 1 and 2). 

The two-state system of the electron spin is so important that it is very useful 
to have a neater way of writing things. We will now make a little mathematical 
digression to show you how people usually write the equations of a two-state 
system. It is done this way: First, note that each term in the Hamiltonian is 
proportional to jx and to some component of B ; we can then— purely formally— 
write that 

Hij = + v V ijB y + <r? 3 SJ. (11.4) 

There is no new physics here ; this equation just means that the coefficients <y x ij, 
a v ij, and a i} — there are 4 X 3 = 12 of them— can be figured out so that (11.4) 
is identical with (11.2). 

Let’s see what they have to be. We start with B z . Since B z appears only in 
Hu and H 2 2, everything will be O.K. if 

(Tii = 1 , 0*12 ~ 0 , 

0"21 — 0, 022 = — 1. 

We often write the matrix Hij as a little table like this: 


Hij - 



H22) 


11-1 The Pauli spin matrices 

11-2 The spin matrices as operators 

11-3 The solution of the two-state 
equations 

11-4 The polarization states of the 
photon 

11-5 The neutral K-mesonf 

11-6 Generalization to TV-state 
systems 

Review: Chapter 35, Vol. I, Polariza- 
tion 


1 1-1 


| This section should be omitted on the 
first reading of this book. It is more ad- 
vanced than is appropriate in a first course. 



Table 11-1 


The Pauli spin matrices 



For the Hamiltonian of a spin one-half particle in the magnetic field B zt this is 
the same as 

, i~ 

~^(.B X tBy)\ 


\ — fx(B x + iB y ) 


J yJ +&B; 

In the same way, we can write the coefficients <r - y as the matrix 

r 3 

i 

< 7 * 


"'■t 


(11.5) 


Working with the coefficients of B x , we get that the terms of a x have to be 
0“ li - 0, a 12 = 1, 

<721 = 1 » CT22 = 0. 


Or, in shorthand, 


h::) 


( 11 . 6 ) 


Finally, looking at B y , we get 

^ii = 0, a\ 2 = -i, 

oil = i, 022 = 0; 


or 


4 ■ c ■;) 


(11.7) 


With these three sigma matrices, Eqs. (11.2) and (11.4) are identical. To leave 
room for the subscripts i and /, we have shown which <7 goes with which component 
of B by putting x, y, and z as superscripts. Usually, however, the i and j are omitted 
— it’s easy to imagine they are there — and the x, y, z are written as subscripts. 
Then Eq. (11.4) is written 


H = — j u[a x B x + (j y By + <t z B z \ (11.8) 

Because the sigma matrices are so important — they are used all the time by the 
professionals — we have gathered them together in Table 11-1. (Anyone who is 
going to work in quantum physics really has to memorize them.) They are also 
called the Pauli spin matrices after the physicist who invented them. 

In the table we have included one more two-by-two matrix which is needed if 
we want to be able to take care of a system which has two spin states of the same 
energy, or if we want to choose a different zero energy. For such situations 
we must add £ 0 C+ to the first equation in (11.1) and to the second equation. 
We can include this in the new notation if we define the unit matrix “1” as S*y, 


and rewrite Eq. (11.8) as 


1 = hi = 


1 0 
,0 l 


(11.9) 


H ~ Eo&ij — p(<r x B x + cr y B y + <7 Z B Z ). (11.10) 

Usually, it is understood that any constant like E 0 is automatically to be multiplied 
by the unit matrix ; then one writes simply 

H = E 0 - ii(<r x B x + <r y B y + <x z B z ). (11.11) 

One reason the spin matrices are useful is that any two-by-two matrix at all 
can be written in terms of them. Any matrix you can write has four numbers 
in it, say, 
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It can always be written as a linear combination of four matrices. For example, 



There are many ways of doing it, but one special way is to say that M is a certain 
amount of <r X} plus a certain amount of <r y , and so on, like this: 

M = al + j 3<t x -f- 7(7*2, -f- 80 ’g, 

where the “amounts” a, 0, 7 , and 8 may, in general, be complex numbers. 

Since any two-by-two matrix can be represented in terms of the unit matrix 
and the sigma matrices, we have all that we ever need for any two-state system. 
No matter what the two-state system— the ammonia molecule, the magenta dye, 
anything — the Hamiltonian equation can be written in terms of the sigmas. 
Although the sigmas seem to have a geometrical significance in the physical 
situation of an electron in a magnetic field, they can also be thought of as just 
useful matrices, which can be used for any two-state problem. 

For instance, in one way of looking at things a proton and a neutron can be 
thought of as the same particle in either of two states. We say the nucleon (proton 
or neutron) is a two-state system — in this case, two states with respect to its charge. 
When looked at that way, the | 1) state can represent the proton and the | 2) 
state can represent the neutron. People say that the nucleon has two “isotopic- 
spin” states. 

Since we will be using the sigma matrices as the “arithmetic” of the quantum 
mechanics of two-state systems, let's review quickly the conventions of matrix 
algebra. By the “sum” of any two or more matrices we mean just what was obvious 
in Eq. (1 1.4). In general, if we “add” two matrices A and B , the “sum” C means 
that each term Q, is given by 

Cij = -J- . 

Each term of C is the sum of the terms in the same slots of A and B. 

In Section 5-6 we have already encountered the idea of a matrix “product.” 
The same idea will be useful in dealing with the sigma matrices. In general, the 
“product” of two matrices A and B (in that order) is defined to be a matrix C 
whose elements are 

Cii = Yj a * b u- 01 . 12 ) 

k 

It is the sum of products of terms taken in pairs from the /th row of A and the kth 
column of B. If the matrices are written out in tabular form as in Fig. 11-1, there 
is a good “system” for getting the terms of the product matrix. Suppose you are 
calculating C 2 z . You run your left index finger along the second row of A and your 
right index finger down the third column of B , multiplying each pair and adding 
as you go. We have tried to indicate how to do it in the figure. 



B n 



B l4 





B 21 

B 22 


B 24 

B 

B 


B i 

ol 

32 



B u 

B 42 


b 44 



C 13 

C l 4 

vC XX \ 

V\ 23 \ 

\\W N 

c 24 

C 33 

C 34 

C 43 

C 44 


= 5 a ijV 


Example: ^ ^ + A 22 ^ B J3 + B^ 


Fig. 11-1. Multiplying two matrices. 
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Table 11-2 


Products of the spin matrices 



Gxffy — ~ 1&Z 

(T 2 y 

&z&x = &x&z = i&y 


It is, of course, particularly simple for two-by-two matrices. For instance, 
if we multiply a x times a x , we get 



which is just the unit matrix 1. Or, for another example, let’s work out <t x <t v : 



Referring to Table 11-1, you see that the product is just i times the matrix <r z . 
(Remember that a number times a matrix just multiplies each term of the matrix.) 
Since the products of the sigmas taken two at a time are important — as well as 
rather amusing — we have listed them all in Table 1 1-2. You can work them out as 
we have done for <r x and cr x cr y . 

There’s another very important and interesting point about these a matrices. 
We can imagine, if we wish, that the three matrices a x , a y , and <s z are analogous to 
the three components of a vector — it is sometimes called the “sigma vector” and 
is written a. It is really a “matrix vector” or a “vector matrix.” It is three different 
matrices — one matrix associated with each axis, *, y , and z. With it, we can write 
the Hamiltonian of the system in a nice form which works in any coordinate 
system: 

H — —fjLcr • B. (11.13) 

Although we have written our three matrices in the representation in which 
“up” and “down” are in the z-direction — so that a z has a particular simplicity — 
we could figure out what the matrices would look like in some other representation. 
Although it takes a lot of algebra, you can show that they change among themselves 
like the components of a vector. (We won’t, however, worry about proving it 
right now. You can check it if you want.) You can use <r in different coordinate 
systems as though it is a vector. 

You remember that the H is related to energy in quantum mechanics. It is, 
in fact, just equal to the energy in the simple situation where there is only one state. 
Even for two-state systems of the electron spin, when we write the Hamiltonian 
as in Eq. (11.13), it looks very much like the classical formula for the energy of a 
little magnet with magnetic moment fi in a magnetic field. B Classically, we would 
say 

U — —ft, • B, (11.14) 

where ju is the property of the object and B is an external field. We can imagine 
that Eq. (11.14) can be converted to (11.13) if we replace the classical energy by 
the Hamiltonian and the classical by the matrix ju a. Then, after this purely 
formal substitution, we interpret the result as a matrix equation. It is sometimes 
said that to each quantity in classical physics there corresponds a matrix in quantum 
mechanics. It is really more correct to say that the Hamiltonian matrix corre- 
sponds to the energy, and any quantity that can be defined via energy has a corre- 
sponding matrix. 

For example, the magnetic moment can be defined via energy by saying that 
the energy in an external field B is —fiB. This defines the magnetic moment 
vector Then we look at the formula for the Hamiltonian of a real (quantum) 
object in a magnetic field and try to identify whatever the matrices are that corre- 
spond to the various quantities in the classical formula. That’s the trick by which 
sometimes classical quantities have their quantum counterparts. 

You may try, if you want, to understand how a classical vector is equal to a 
matrix /x<r, and maybe you will discover something — but don’t break your head 
on it. That’s not the idea — they are not equal. Quantum mechanics is a different 
kind of a theory to represent the world. It just happens that there are certain 
correspondences which are hardly more than mnemonic devices — things to re- 
member with. That is, you remember Eq. (11.14) when you learn classical physics ; 
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then if you remember the correspondence & jut, you have a handle for re- 
membering Eq. (11.13). Of course, nature knows the quantum mechanics, and 
the classical mechanics is only an approximation; so there is no mystery in the 
fact that in classical mechanics there is some shadow of quantum mechanical laws — 
which are truly the ones underneath. To reconstruct the original object from the 
shadow is not possible in any direct way, but the shadow does help you to re- 
member what the object looks like. Equation (11.13) is the truth, and Eq. (11.14) 
is the shadow. Because we learn classical mechanics first, we would like to be 
able to get the quantum formula from it, but there is no sure-fire scheme for 
doing that. We must always go back to the real world and discover the correct 
quantum mechanical equations. When they come out looking like something in 
classical physics, we are in luck. 

If the warnings above seem repetitious and appear to you to be belaboring 
self-evident truths about the relation of classical physics to quantum physics, 
please excuse the conditioned reflexes of a professor who has usually taught 
quantum mechanics to students who hadn’t heard about Pauli spin matrices until 
they were in graduate school. Then they always seemed to be hoping that, somehow, 
quantum mechanics could be seen to follow as a logical consequence of classical 
mechanics which they had learned thoroughly years before. (Perhaps they wanted 
to avoid having to learn something new.) You have learned the classical formula, 
Eq. (11.14), only a few months ago — and then with warnings that it was inade- 
quate — so maybe you will not be so unwilling to take the quantum formula, 
Eq. (11.13), as the basic truth. 


11-2 The spin matrices as operators 

While we are on the subject of mathematical notation, we would like to de- 
scribe still another way of writing things — a way which is used very often because 
it is so compact. It follows directly from the notation introduced in Chapter 8. 
If we have a system in a state | which varies with time, we can — as we 
did in Eq. (8.31) — write the amplitude that the system would be in the state | /) 
at / + At as 


</ 1 *(/ + a/» = £ </ 1 u(j, t + ao i j)(j i m). 

3 

The matrix element (i | U(t, t + AO |y) is the amplitude that the base state |y) 
will be converted into the base state | i) in the time interval At, We then defined 
Hij by writing 

(i I U(t, t + AO | j) = - * Hjj(t) At, 

and we showed that the amplitudes Q(t) = (/ 1 ^(0) were related by the differ- 
ential equations 

iVl = Z H » C r (11-15) 

3 

If we write out the amplitudes Q explicitly, the same equation appears as 

= £ BiiUl*)- (11-16) 

3 

Now the matrix elements are also amplitudes which we can write as (i \H\j); 
our differential equation looks like this: 

ihj t H |*> = £ (i\H\ j)(j | $). ( 11 . 17 ) 

3 

We see that —i/h ( i \ H\j ) is the amplitude that — under the physical conditions 
described by H— a state | j) will, during the time dt , “generate” the state [ i), 
(All of this is implicit in the discussion of Section 8-4.) 
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Now following the ideas of Section 8-2, we can drop out the common term 
{/ | in Eq. (1 1.17) — since it is true for any state | i) — and write that equation simply as 

ih j t i*> = E h \m\+)- (1L18) 

3 

Or, going one step further, we can also remove they and write 

ihj t \*) = H\t). (11.19) 

In Chapter 8 we pointed out that when things are written this way, the H in 
H\j) or H | \p) is called an operator. From now on we will put the little hat 
O over an operator to remind you that it is an operator and not just a number. 
We will write H\yp). Although the two equations (11.18) and (11.19) mean 
exactly the same thing as Eq. (11.17) or Eq. (11.15), we can think about them in a 
different way. For instance, we would describe Eq. (11.18) in this way: “The 
time derivative of the state vector [ yp) is equal to what you get by operating with 
the Hamiltonian operator H on each base state, multiplying by the amplitude 
(j | yp) that yp is in the state j, and summing over all j” Or Eq. (1 1.19) is described 
this way. “The time derivative (times ih) of a state | yp) is equal to what you get 
if you operate with the Hamiltonian H on the state vector | yp).** It’s just a short- 
hand way of saying what is in Eq. (11.17), but, as you will see, it can be a great 
convenience. 

If we wish, we can carry the “abstraction” idea one more step. Equation 

(11.19) is true for any state \ \p). Also the left-hand side, ihd/dt , is also an operator 
— it’s the operation “differentiate by t and multiply by ih .* 9 Therefore, Eq. (11.19) 
can also be thought of as an equation between operators — the operator equation 

The Hamiltonian operator (within a constant) produces the same result as does 
d/dt when acting on any state. Remember that this equation — as well as Eq. 

(11.19) — is not a statement that the H operator is just the identical operation as 
d/dt. The equations are the dynamical law of nature — the law of motion — for a 
quantum system. 

Just to get some practice with these ideas, we will show you another way we 
could get to Eq. (11.18). You know that we can write any state | yp) in terms of 
its projections into some base set [see Eq. (8.8)], 

i*> = £ I*'></|*>. ( 1L2 °) 

i 

How does | yp) change with time? Well, just take its derivative: 

i 

Now the base states [ i) do not change with time (at least we are always taking them 
as definite fixed states), but the amplitudes (i | yp) are numbers which may vary. 
So Eq. (11.21) becomes 

j t i*> = £ i *> is </ 1 *>• < n - 22 ) 

% 

Since we know d(i | \p)/dt from Eq. (11.16), we get 

|i*> = 

i 3 

lj 3 

This is Eq. (11.18) all over again. 
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So we have many ways of looking at the Hamiltonian. We can think of the 
set of coefficients Hij as just) a bunch of numbers, or we can think of the “ampli- 
tudes” (i | H\j), or we can think of the “matrix” H ih or we can think of the 
operator” H. It all means the same thing. 

Now let’s go back to our two-state systems. If we write the Hamiltonian in 
terms of the sigma matrices (with suitable numerical coefficients like B x , etc.), 
we can clearly also think of of, as an amplitude (i \a x \ j) or, for short, as the 
operator a x . If we use the operator idea, we can write the equation of motion of a 
state | \p) in a magnetic field as 

ih l | *> = -»(B X & X + Byfry + B s &,) I *). (1 1.23) 

When we want to “use” such an equation we will normally have to express | \p) 
in terms of base vectors (just as we have to find the components of space vectors 
when we want specific numbers). So we will usually want to put Eq. (11.23) in 
the somewhat expanded form: 

ih Jt 1 ^ = E (*«*» + B « & « + I M 1 1)- (1 1.24) 

i 

Now you will see why the operator idea is so neat. To use Eq. (11.24) we 
need to know what happens when the & operators work on each of the base states. 
Let’s find out. Suppose we have & z | +); it is some vector | ?), but what? Well, 
let’s multiply it on the left by (+ | ; we have 

(+ I 3* I + > = 0Ti = 1 

(using Table 11-1). So we know that 

<+!?)=!• (11.25) 

Now let’s multiply & z \ +) on the left by |. We get 

(— | &z | +) = cr 2 i = 0; 
so 

<-!?> = 0. (11.26) 

There is only one state vector that satisfies both (11.25) and (11.26); it is | +). 
We discover then that 

**!+> = I +)• (11.27) 

By this kind of argument you can easily show that all of the properties of the sigma 
matrices can be described in the operator notation by the set of rules given in 
Table 11-3. 

If we have products of sigma matrices, they go over into products of operators. 
When two operators appear together as a product, you carry out first the operation 
with the operator which is farthest to the right. For instance, by & x & y | +> we 
are to understand & x (& y \ +)). From Table 11-3, we get & y | +) = i | — ), so 

+> = &x(i I -)). (11.28) 

Now any number — like 7— just moves through an operator (operators work only 
on state vectors); so Eq. (11.28) is the same as 


Table 11-3 


Properties of the a-operator 


<T Z | 

+)=[+) 

<*z 1 

-)=-!-) 

<Tx | 

+> = !-> 

<*x | 

-> = l+> 

1 

+) = /| -) 

<*v i 

-) = -i\+) 


| +) - i& x I -) - i I +). 


If you do the same thing for & x & v \ — ), you will find that 

| ) = 1 [ )• 

Looking at Table 11-3, you see that & x & y operating on | +) or | — ) gives just 
what you get if you operate with & z and multiply by — We can, therefore, say 
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that the operation & x & y is identical with the operation i& z , and write this statement 
as an operator equation : 

& x & y = ia z . (11.29) 

Notice that this equation is identical with one of our matrix equations of Table 
1 1-2. So again we see the correspondence between the matrix and operator points 
of view. Each of the equations in Table 1 1-2 can, therefore, also be considered 
as equations about the sigma operators. You can check that they do indeed 
follow from Table 11-3. It is best, when working with these things, not to keep 
track of whether a quantity likecr or H is an operator or a matrix. All the equations 
are the same either way, so Table 1 1-2 is for sigma operators, or for sigma matrices, 
as you wish. 


11-3 The solution of the two-state equations 

We can now write our two-state equation in various forms, for example, 
either as 

or 5 (11.30) 


They both mean the same thing. For a spin one-half particle in a magnetic field, 
the Hamiltonian H is given by Eq. (11.8) or by Eq. (11.13). 

If the field is in the 2 -direction, then— as we have seen several times by now— 
the solution is that the state | ^), whatever it is, precesses around the z-axis (just 
as if you were to take the physical object and rotate it bodily around the z-axis) 
at an angular velocity equal to twice the magnetic field times fi/h. The same is 
true, of course, for a magnetic field along any other direction, because the physics 
is independent of the coordinate system. If we have a situation where the magnetic 
field varies from time to time in a complicated way, then we can analyze the situa- 
tion in the following way. Suppose you start with the spin in the -f z-direction 
and you have an x-magnetic field. The spin starts to turn. Then if the x-field is 
turned off, the spin stops turning. Now if a z-field is turned on, the spin precesses 
about z, and so on. So depending on how the fields vary in time, you can figure 
out what the final state is — along what axis it will point. Then you can refer that 
state back to the original | +) and | -) with respect to z by using the projection 
formulas we had in Chapter 10 (or Chapter 6). If the state ends up with its 
spin in the direction (0, <£), it will have an up-amplitude cos (0/ 2)e~ l4>12 and a 
down-amplitude sin (0/2)c +i0/2 . That solves any problem. It is a word description 
of the solution of the differential equations. 

The solution just described is sufficiently general to take care of any mo-state 
system. Let’s take our example of the ammonia molecule — including the effects of 
an electric field. If we describe the system in terms of the states 1 I) and | II), the 
equations look like this: 


ih —jj- — -\-ACi + fi&Cii, 
ih dC* 1 = —ACu + fx&Cj. 


(11.31) 


You say, “No, I remember there was an E 0 in there.” Well, we have shifted the 
origin of energy to make the E 0 zero. (You can always do that by changing both 
amplitudes by the same factor— e iE o T/n ~ and get rid of any constant energy.) 
Now if corresponding equations always have the same solutions, then we really 
don’t have to do it twice. If we look at these equations and look at Eq. (11.1), 
then we can make the following identification. Let’s call 1 1) the state | + } and 
| II) the state | — ). That does not mean that we are lining-up the ammonia in space, 
or that | +) and | — ) has anything to do with the z-axis. It is purely artificial. 
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We have an artificial space that we might “call the ammonia molecule repre- 
sentative space,” or something — a three-dimensional “diagram” in which being 
“up” corresponds to having the molecule in the state 1 1) and being “down” 
along this false z-axis represents having a molecule in the state | II). Then, the 
equations will be identified as follows. First of all, you see that the Hamiltonian 
can be written in terms of the sigma matrices as 

H = +A<t z + ixZ<r x . (11.32) 

Or, putting it another way, p B z in Eq. (11.1) corresponds to —A in Eq. (11.32), 
and (xB x corresponds to — /x8. In our “model” space, then, we have a constant B 
field along the z-direction. If we have an electric field 8 which is changing with 
time, then we have a B field along the x-direction which varies in proportion. 
So the behavior of an electron in a magnetic field with a constant component in the 
z-direction and an oscillating component in the x-direction is mathematically analo- 
gous and corresponds exactly to the behavior of an ammonia molecule in an oscillating 
electric field. Unfortunately, we do not have the time to go any further into the 
details of this correspondence, or to work out any of the technical details. We 
only wished to make the point that all systems of two states can be made analogous 
to a spin one-half object precessing in a magnetic field. 


11-4 The polarization states of the photon 

There are a number of other two-state systems which are interesting to study, 
and the first new one we would like to talk about is the photon. To describe a 
photon we must first give its vector momentum. For a free photon, the frequency 
is determined by the momentum, so we don’t have to say also what the frequency 
is. After that, though, we still have a property called the polarization. Imagine 
that there is a photon coming at you with a definite monochromatic frequency 
(which will be kept the same throughout all this discussion so that we don’t have 
a variety of momentum states). Then there are two directions of polarization. 
In the classical theory, light can be described as having an electric field which 
oscillates horizontally or an electric field which oscillates vertically (for instance) ; 
these two kinds of light are called x-polarized and ^-polarized light. The light can 
also be polarized in some other direction, which can be made up from the super- 
position of a field in the x-direction and one in the jy-direction. Or if you take 
the x- and the ^-components out of phase by 90°, you get an electric field that 
rotates— the light is elliptically polarizejd. (This is just a quick reminder of the 
classical theory of polarized light that we studied in Chapter 35, Vol. I.) 

Now, however, suppose we have a single photon— just one. There is no electric 
field that we can discuss in the same way. All we have is one photon. But a photon 
has to have the analog of the classical phenomena of polarization. There must be 
at least two different kinds of photons. At first, you might think there should be 
an infinite variety— after all, the electric vector can point in all sorts of directions. 
We can, however, describe the polarization of a photon as a two-state system. 
A photon can be in the state | x) or in the state | y). By | x) we mean the polariza- 
tion state of each one of the photons in a beam of light which classically is x-polar- 
ized light. On the other hand, by | y) we mean the polarization state of each of the 
photons in a .y-polarized beam. And we can take | x) and [ y) as our base states 
of a photon of given momentum pointing at you — in what we will call the z-direc- 
tion. So there are two base states | x) and | y ), and they are all that are needed 
to describe any photon at all. 

For example, if we have a piece of polaroid set with its axis to pass light polar- 
ized in what we call the x-direction, and we send in a photon which we know is in 
the state | y), it will be absorbed by the polaroid. If we send in a photon which we 
know is in the state | x), it will come right through as | x). If we take a piece of 
calcite which takes a beam of polarized light and splits it into an | x) beam and a 
| y) beam, that piece of calcite is the complete analog of a Stern-Gerlach apparatus 
which splits a beam of silver atoms into the two states | +) and | — ). So every- 
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Fig. 11-2. Coordinates at right 
angles to the momentum vector of the 
photon. 


thing we did before with particles and Stern-Gerlach apparatuses, we can do 
again with light and pieces of calcite. And what about light filtered through a 
piece of polaroid set at an angle 0? Is that another state? Yes, indeed, it is another 
state. Let’s call the axis of the polaroid x f to distinguish it from the axes of our 
base states. See Fig. 11-2. A photon that comes out will be in the state | x'). 
However, any state can be represented as a linear combination of base states, and 
the formula for the combination is, here, 

| x*) = cos 9 | x) + sin 9 \ y). (11.33) 

That is, if a photon comes through a piece of polaroid set at the angle 9 (with 
respect to x), it can still be resolved into | x) and | y) beams — by a piece of calcite, 
for example. Or you can, if you wish, just analyze it into x- and y-components in 
your imagination. Either way, you will find the amplitude cos 9 to be in the | x) 
state and the amplitude sin 9 to be in the | y) state. 

Now we ask this question: Suppose a photon is polarized in the ^-direction 
by a piece of polaroid set at the angle 9 and arrives at a polaroid at the angle zero — 
as in Fig. 11-3; what will happen? With what probability will it get through? 
The answer is the following. After it gets through the first polaroid, it is definitely 
in the state | x f ). The second polaroid will let the photon through if it is in the 
state | x) (but absorb it if it is the state | j)). So we are asking with what probability 
does the photon appear to be in the state | x)? We get that probability from the 
absolute square of amplitude (x \ x') that a photon in the state | x') is also in 
the state | x). What is (x \ x')? Just multiply Eq. (11.33) by (x | to get 


{x | x f ) = cos 9 (x | x) + sin 9 (x \ y). 


Now (x | y) = 0, from the physics — as they must be if | x) and | y) are base states 
— and (x\ x) = 1 . So we get 

(x | x') = cos 9 , 

and the probability is cos 2 9. For example, if the first polaroid is set at 30°, a 
photon will get through 3/4 of the time, and 1/4 of the time it will heat the polaroid 
by being absorbed therein. 



Now let us see what happens classically in the same situation. We would have 
a beam of light with an electric field which is varying in some way or another — say 
“unpolarized.” After it gets through the first polaroid, the electric field is oscillat- 
ing in the x'-direction with a size 8; we would draw the field as an oscillating 
vector with a peak value 8 0 in a diagram like Fig. 11-4. Now when the light 
arrives at the second polaroid, only the x-component, 8 0 cos 0, of the electric 
field gets through. The intensity is proportional to the square of the field and, 
therefore, to S 2 cos 2 9. So the energy coming through is cos 2 9 weaker than the 
energy which was entering the last polaroid. 
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The classical picture and the quantum picture give similar results. If you 
were to throw 10 billion photons at the second polar oid, and the average prob- 
ability of each one going through is, say, 3/4, you would expect 3/4 of 10 billion 
would get through. Likewise, the energy that they would carry would be 3/4 y 

of the energy that you attempted to put through. The classical theory says nothing 
about the statistics of the thing — it simply says that the energy that comes through 
will be precisely 3/4 of the energy which you were sending in. That is, of course, 

impossible if there is only one photon. There is no such thing as 3/4 of a photon. 

It is either all there, or it isn’t there at all. Quantum mechanics tells us it is all 
there 3/4 of the time. The relation of the two theories is clear. 

What about the other kinds of polarization? For example, right-hand 
circular polarization? In the classical theory, right-hand circular polarization 
has equal components in x and y which are 90° out of phase. In the quantum 
theory, a right-hand circularly polarized (RHC) photon has equal amplitudes to 
be polarized | x) or | y ), and the amplitudes are 90° out of phase. Calling a RHC Fig. 1 1-4. The classical picture of 
photon a state | R) and a LHC photon a state [ L), we can write (see Vol. I, Section the electric vector 6. 
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l*> = -t=(I*> + MJ')). 

a/2 

I L) = l — (| x) - i | .y)). 


(11.34) 


—the 1 /y/2 is put in to get normalized states. With these states you can calculate 
any filtering or interference effects you want, using the laws of quantum theory. 
If you want, you can also choose | R) and | L) as base states and represent every- 
thing in terms of them. You only need to show first that (R \ L) = 0 — which you 
can do by taking the conjugate form of the first equation above [see Eq. (8.13)] and 
multiplying it by the other. You can resolve light into x- and y-polarizations, or 
into x'- and /-polarizations, or into right and left polarizations as a basis. 

Just as an example, let’s try to turn our formulas around. Can we represent 
the state | x) as a linear combination of right and left? Yes, here it is: 


I X) = -L (I *) + I L)\ 
\/2 

\y)= — ‘-(\R)-\ L)). 


(11.35) 


Proof: Add and subtract the two equations in (11.34). It is easy to go from 
one base to the other. 

One curious point has to be made, though. If a photon is right circularly 
polarized, it shouldn’t have anything to do with the x- and y-axes. If we were 
to look at the same thing from a coordinate system turned at some angle about 
the direction of flight, the light would still be right circularly polarized — and simi- 
larly for left. The right and left circularly polarized light are the same for any such 
rotation; the definition is independent of any choice of the x-direction (except 
that the photon direction is given). Isn’t that nice — it doesn’t take any axes to 
define it. Much better than x and y. On the other hand, isn’t it rather a miracle 
that when you add the right and left together you can find out which direction x 
was? If “right” and “left” do not depend on x in any way, how is it that we can 
put them back together again and get x? We can answer that question in part 
by writing out the state | R')> which represents a photon RHC polarized in the 
frame x', /. In that frame, you would write 

I R') = /= (I *'> + i I /»• 
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How does such a state look in the frame x, y? Just substitute x f from Eq.(l 1. 33) 
and the corresponding | /) — we didn’t write it down, but it is (—sin 6) \ x) + 
(cos 0) | y). Then 

| R f ) = — [cos 0 | jc) + sin 0 \ y) — i sin 0 \ x) + / cos 0 \ y)] 

V2 

= — [(cos 0 — i sin 0) \ x) + /(cos 0 — i sin 0 ) | y)] 

V2 

= — (| x) + i | >>))(cos 0 — i sin 0). 

\/2 

The first term is just | JR), and the second is e +ld ; our result is that 

| R') = e~ ie \R ). (11.36) 

The states | R r ) and | R ) are the same except for the phase factor e~ ie . If you work 
out the same thing for | L'), you get thatf 

| U) = e +ie | L). (11.37) 

Now you see what happens. If we add | R) and | T), we get something different 
from what we get when we add | JR') and | Z/). For instance, an x-polarized photon 
is [Eq. (1 1.35)] the sum of | R) and | L), but a j>-polarized photon is the sum with 
the phase of one shifted 90° backward and the other 90° forward. That is just 
what we would get from the sum of | R f ) and | U) for the special angle 9 = 90°, 
and that’s right. An jt-polarization in the prime frame is the same as a ^-polariza- 
tion in the original frame. So it is not exactly true that a circularly polarized 
photon looks the same for any set of axes. Its phase (the phase relation of the 
right and left circularly polarized states) keeps track of the ^-direction. 


11-5 The neutral K-mesonJ 

We will now describe a two-state system in the world of the strange particles— 
a system for which quantum mechanics gives a most remarkable prediction. To 
describe it completely would involve us in a lot of stuff about strange particles, 
so we will, unfortunately, have to cut some corners. We can only give an outline 
of how a certain discovery was made — to show you the kind of reasoning that was 
involved. It begins with the discovery by Gell-Mann and Nishijima of the concept 
of strangeness and of a new law of conservation of strangeness . It was when Gell- 
Mann and Pais were analyzing the consequences of these new ideas that they came 
across the prediction of a most remarkable phenomenon we are going to describe. 
First, though, we have to tell you a little about “ ‘strangeness.” 

We must begin with what are called the strong interactions of nuclear particles. 
These are the interactions which are responsible for the strong nuclear forces— 
as distinct, for instance, from the relatively weaker electromagnetic interactions. 
The interactions are “strong” in the sense that if two particles get close enough 
to interact at all, they interact in a big way and produce other particles very easily. 


f It's similar to what we found (in Chapter 6) for a spin one-half particle when we 
rotated the coordinates about the z-axis — then we got the phase factors It is, in 

fact, exactly what we wrote down in Section 5-7 for the | +) and | — ) states of a spin-one 
particle — which is no coincidence. The photon is a spin-one particle which has, however, 
no “zero” state. 

X We now feel that the material of this section is longer and harder than is appropriate 
at this point in our development. We suggest that you skip it and continue with Section 
11-6. If you are ambitious and have time you may wish to come back to it later. We 
leave it here, because it is a beautiful example— taken from recent work in high-energy 
physics — of what can be done with our formulation of the quantum mechanics of two- 
state systems. 
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The nuclear particles have also what is called a “weak interaction*’ by which cer- 
tain things can happen, such as beta decay, but always very slowly on a nuclear 
time scale the weak interactions are many, many orders of magnitude weaker 
than the strong interactions and even much weaker than electromagnetic inter- 
actions. 

When the strong interactions were being studied with the big accelerators, 
people were surprised to find that certain things that “should” happen— that were 
expected to happen did not occur. For instance, in some interactions a particle 
of a certain type did not appear when it was expected. Gell-Mann and Nishijima 
noticed that many of these peculiar happenings could be explained at once by 
inventing a new conservation law: the conservation of strangeness . They proposed 
that there was a new kind of attribute associated with each particle — which they 
called its strangeness number and that in any strong interaction the “quantity 
of strangeness” is conserved. 

Suppose, for instance, that a high-energy negative K-meson — with, say, an 
energy of many Bev collides with a proton. Out of the interaction may come 
many other particles: 7r-mesons, K-mesons, lambda particles, sigma particles— 
any of the mesons or baryons listed in Table 2-2 of Vol. I. It is observed, however, 
that only certain combinations appear, and never others. Now certain conservation 
laws were already known to apply. First, energy and momentum are always 
conserved. The total energy and momentum after an event must be the same as 
before the event. Second, there is the conservation of electric charge which says 
that the total charge of the outgoing particles must be equal to the total charge 
carried by the original particles. In our example of a K-meson and a proton 
coming together, the following reactions do occur: 


K + p — > p + K +7T + +7T +7 r° 

or 

+ p — > 2— + 7 r + 

We would never get: 


(11.38) 


K + p — » p + K + 7T+ or K- + p — > Ao + tt + . (11.39) 

because of the conservation of charge. It was also known that the number of 
baryons is conserved. The number of baryons out must be equal to the number 
of baryons in. For this law, an antiparticle of a baryon is counted as minus one 
baryon. This means that we can— and do — see 


or 


K. + P — ► A 0 + 7T° 

K — + P — » p + K~ + p + p 


(11.40) 


(where p is the antiproton, which carries a negative charge). But we never see 


K- + p -» K- + TT+ + 7T° 

or (11.41) 

K- + p-> p + K- + n 

(even when there is plenty of energy), because baryons would not be conserved. 

These laws, however, do not explain the strange fact that the following re- 
actions— which do not immediately appear to be especially different from some of 
those in (1 1 .38) or (1 1 .40) — are also never observed : 


+ p -» p + K~ + K° 
or 

K “ + P -» P + TT~ (11.42) 

or 

K~ + p — ♦ A 0 + K°. 

The explanation is the conservation of strangeness. With each particle goes a 
number its strangeness S and there is a law that in any strong interaction, the 
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Table 11-4 

The strangeness numbers of the strongly interacting particles 



5 

-2 -1 0 +1 

Baryons 

2 + P 

E° A 0 , 2° n 

3~ 2“ 

Mesons 

ir+ K+ 

K" 7T° K° 

K~ TT~ 


Note : The 7 r~ is the antiparticle of the 7 r + (or vice versa). 


total strangeness out must equal the total strangeness that went in. The proton and 
antiproton (p, p), the neutron and antineutron (n, n), and the 7r-mesons (7r + , 7 r°, 
7 r ) all have the strangeness number zero ; the K + and K° mesons have strangeness 
+ 1; the K“ and K° (the anti-K°),t the A 0 and the 2-particles (+, 0, — ) have 
strangeness —1. There is also a particle with strangeness —2 — the E-particle 
(capital “ksi”) — and perhaps others as yet unknown. We have made a list of these 
strangenesses in Table 11-4. 

Let’s see how the strangeness conservation works in some of the reactions we 
have written down. If we start with a K — and a proton, we have a total strangeness 
of (— 1 + 0) = — 1. The conservation of strangeness says that the strangeness 
of products after the reaction must also add up to — 1 . You see that that is so for 
the reactions of (1 1 .38) and (11 .40). But in the reactions of (1 1 .42) the strangeness 
of the right-hand side is zero in each case. Such reactions do not conserve strange- 
ness, and do not occur. Why? Nobody knows. Nobody knows any more than 
what we have just told you about this. Nature just works that way. 

Now let’s look at the following reaction: a 7r“ hits a proton. You might, 
for instance, get a A 0 particle plus a neutral K-particle— two neutral particles. 
Now which neutral K do you get? Since the A-particle has a strangeness — 1 and 
the 7 r and p + have a strangeness zero, and since this is a fast production reaction, 
the strangeness must not change. The K-particle must have strangeness +1 — it 
must therefore be the K°. The reaction is 

tt“ + p -► A° + K°, 

with 

5 = 0 + 0= — 1 + +1 (conserved). 

If the K° were there instead of the K°, the strangeness on the right would be —2 
— which nature does not permit, since the strangeness on the left side is zero. 
On the other hand, a K° can be produced in other reactions, such as 

n + n— >-n + p + K 0 + K + , 

5=0 + 0 = 0 + 0++l + -l 
or _ 

K" + p -> n + K°, 

S = —1+0 = 0+ —1. 

You may be thinking, “That’s all a lot of stuff, because how do you know 
whether it is a K° or a K°? They look exactly the same. They are antiparticles of 
each other, so they have exactly the same mass, and both have zero electric charge. 


t Read as: “K-naught-bar,” or “K-zero-bar.” 
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Fig. 1 1-5. High-energy events as seen in a hydrogen bubble chamber, (a) A it meson interacts 
with a hydrogen nucleus (proton) producing a A 0 particle and a K° meson. Both particles decay in 
the chamber, (b) A K° meson interacts with a proton producing a 7 r + meson and a A 0 particle 
which then decays. (The neutral particles leave no tracks. Their inferred trajectories are indicated 
here by light dashed lines.) 

How do you distinguish them?” By the reactions they produce. For example, 
a K° can interact with matter to produce a A-particle, like this: 

K° + p A° + 7T+ 

but a K° cannot. There is no way a K° can produce a A-particle when it interacts 
with ordinary matter (protons and neutrons).! So the experimental distinction 
between the K° and the K° would be that one of them will and one of them will 
not produce A’s. 

One of the predictions of the strangeness theory is then this — if, in an experi- 
ment with high-energy pions, a A-particle is produced with a neutral K-meson, 
then that neutral K-meson going into other pieces of matter will never produce a A. 

The experiment might run something like this. You send a beam of 7r — -mesons 
into a large hydrogen bubble chamber. A 7r” track disappears, but somewhere 
else a pair of tracks appear (a proton and a 7r”) indicating that a A-particle has 
disintegrated^ — see Fig. 1 1-5. Then you know that there is a K° somewhere which 
you cannot see. 

You can, however, figure out where it is going by using the conservation 
of momentum and energy. [It could reveal itself later by disintegrating into two 
charged particles, as shown in Fig. 1 1— 5(a).] As the K° goes flying along, it may 
interact with one of the hydrogen nuclei (protons), producing perhaps some other 
particles. The prediction of the strangeness theory is that it will never produce a 
A-particle in a simple reaction like, say, 

K° + p A 0 + 7T°, 

although a K° can do just that. That is, in a bubble chamber a K° might produce 
the event sketched in Fig. 1 1— 5(b) — in which the A 0 is seen because it decays — but 
a K° will not. That’s the first part of our story. That’s the conservation of strange- 
ness. 

The conservation of strangeness is, however, not perfect . There are very slow 
disintegrations of the strange particles — decays taking a longlf time like 10” 10 
second in which the strangeness is not conserved. These are called the “weak” 
decays. For example, the K° disintegrates into a pair of 7r-mesons (+ and — ) 


t Except, of course, if it also produces two K +, s or other particles with a total strange- 
ness of +2. We can think here of reactions in which there is insufficient energy to produce 
these additional strange particles. 

I The free A-particle decays slowly via a weak interaction (so strangeness need not be 
conserved). The decay products are either a p and a tt“ or an n and a tt°. The lifetime 
is 2.2 X 10~ 10 sec. 

If A typical time for strong interactions is more like 10 -23 sec. 
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with a lifetime of lO"* 10 second. That was, in fact, the way K-particles were 
first seen. Notice that the decay reaction 

K° -* 7T + + 7T~ 

does not conserve strangeness, so it cannot go “fast” by the strong interaction; 
it can only go through the weak decay process. 

Now the K° also disintegrates in the same way — into a 7 r + and a 7r“ — and 
also with the same lifetime 

K° — » 7T” + 7T + . 


Again we have a weak decay because it does not conserve strangeness. There is a 
principle that for any reaction there is the corresponding reaction with “matter” 
replaced by “antimatter” and vice versa. Since the K° is the antiparticle of the 
K°, it should decay into the antiparticles of the x + and 7r~, but the antiparticle 
of a 7 r + is the 7 r _ . (Or, if you prefer, vice versa. It turns out that for the 7r-mesons 
it doesn’t matter which one you call “matter.”) So as a consequence of the weak 
decays, the K° and K° can go into the same final products. When “seen” through 
their decays — as in a bubble chamber— they look like the same particle. Only 
their strong interactions are different. 

At last we are ready to describe the work of Gell-Mann and Pais. They 
first noticed that since the K° and the K° can both turn intestates of two 7 r-mesons 
there must be some amplitude that a K° can turn into a K°, and also that a K° 
can turn into a K°. Writing the reactions as one does in chemistry, we would have 

K 0 ±^tt- + tt + ±+K 0 . (11.43) 

These reactions imply that there is some amplitude per unit time, say —i/h times 
(K° | W | K°), that a K° will turn into a K° through the weak interaction re- 
sponsible for the decay into two 7r-mesons. And there is the corresponding 
amplitude (K° | W | K°) for the reverse process. Because matter and antimatter 
behave in exactly the same way, these two amplitudes are numerically equal; 
we’ll call them both A : 


<K°| W|K°) = <K°| W|K°) = A. (11.44) 

Now — said Gell-Mann and Pais — here is an interesting situation. What 
people have been calling two distinct states of the world — the K° and the K° — 
should really be considered as one two-state system, because there is an amplitude 
to go from one state to the other. For a complete treatment, one would, of course, 
have to deal with more than two states, because there are also the states of 27r’s, 
and so on; but since they were mainly interested in the relation of K° and K°, 
they did not have to complicate things and could make the approximation of a 
two-state system. The other states were taken into account to the extent that their 
effects appeared implicitly in the amplitudes of Eq. (11.44). 

Accordingly, Gell-Mann and Pais analyzed the neutral particle as a two- 
state system. They began by choosing as their two base states the states | K°) and 
| K°). (From here on, the story goes very much as it did for the ammonia mole- 
cule.) Any state | \p) of the neutral K-particle could then be described by giving 
the amplitudes that it was in either base state. We’ll call these amplitudes 

C + = <K° I i), C_ = <K° I *>. (11.45) 


The next step was to write the Hamiltonian equations for this two-state 
system. If there were no coupling between the K° and the K°, the equations 
would be simply 


ih^=E 0 C + , 

dC„ j., „ 
th ^ — EqC — . 


(11.46) 
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But since there is the amplitude (K° | W | K°) for the K° to turn into a K° there 
should be the additional term 

<K° | W | K°)C_ = AC_ 

added to the right-hand side of the first equation. And similarly, the term AC + 
should be inserted in the equation for the rate of change of C_. 

But that’s not all. When the two-pion effect is taken into account there is an 
additional amplitude for the K° to turn into itself through the process 

K° — > W~ + 7T + — > K°. 

The additional amplitude, which we would write (K° | W | K°), is just equal to 
the amplitude <K° | W | K°), since the amplitudes to go to and from a pair of 
ir-mesons are identical for the K° and the K°. If you wish, the argument can be 
written out in detail like this. First writer 

<K° | W | K°> = <K° | W | 2ir><2ir | W | K°) 
and 

<K° | W | K°> = (K° | W | 2x)(2tt | W | K°>. 

Because of the symmetry of matter and antimatter 

<2tt | W | K°> = (2tt | W | K°), 

and also 

<K° | W | 2tt) = <K° | W | 2tt). 


It then follows that <K° | W | K°> = <K° | W | K°), and also that <K° j W | K°) = 
(K° | W | K°), as we said earlier. Anyway, there are the two additional ampli- 
tudes <K° | W | K°> and <K° | W | K°>, both equal to A, which should be included 
in the Hamiltonian equations. The first gives a term AC + on the right-hand side 
of the equation for dC+/dt , and the second gives a new term AC_ in the equation 
for dC_/dt . Reasoning this way, Gell-Mann and Pais concluded that the Hamil- 
tonian equations for the K° K° system should be 


ih = E 0 C+ + AC_ + AC + , 
dt 

ih = £ 0 C_ + AC + + AC_. 


(11.47) 


We must now correct something we have said in earlier chapters: that two 
amplitudes like (K° | W | K°) and <K° | W | K°) which are the reverse of each 
other, are always complex conjugates. That was true when we were talking about 
particles that did not decay. But if particles can decay — and can, therefore, 
become “lost”— the two amplitudes are not necessarily complex conjugates. So 
the equality of (11.44) does not mean that the amplitudes are real numbers; they 
are in fact complex numbers. The coefficient A is, therefore, complex; and we 
can’t just incorporate it into the energy E 0 . 

Having played often with electron spins and such, our heroes knew that the 
Hamiltonian equations of (11.47) meant that there was another pair of base states 
which could also be used to represent the K-particle system and which would have 
especially simple behaviors. They said, “Let’s take the sum and difference of these 
two equations. Also, let’s measure all our energies from E 0 , and use units for 


t We are making a simplification here. The 2ir-system can have many states corre- 
sponding to various momenta of the ir-mesons, and we should make the right-hand side 
of this equation into a sum over the various base states of the it’s. The complete treatment 
still leads to the same conclusions. 
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energy and time that make h — 1.” (That’s what modern theoretical physicists 
always do. It doesn’t change the physics but makes the equations take on a 
simple form.) Their result: 

i J t (C + + C_) = 2A(C + + C_). / J t (C+ - C_) = 0. (11.48) 

It is apparent that the combinations of amplitudes (C + + C_) and 
(C + — C_) act independently from each other (corresponding, of course, to 
the stationary states we have been studying earlier). So they concluded that it 
would be more convenient to use a different representation for the K-p article. 
They defined the two states 

I Kj) = -L (I K°> + | K°», I K 2 > = 4= (I K°> - I K°». (11.49) 

s/2 s/2 

They said that instead of thinking of the K° and K° mesons, we can equally well 
think in terms of the two “particles” (that is, “states”) AT X and K 2 . (These corre- 
spond, of course, to the states we have usually called 1 1) and | II). We are not 
using our old notation because we want now to follow the notation of the original 
authors — and the one you will see in physics seminars.) 

Now Gell-Mann and Pais didn’t do all this just to get different names for 
the particles— there is also some strange new physics in it. Suppose that C i and 
C 2 are the amplitudes that some state | will be either a K x or a K 2 meson: 


C x = <K X |*>, C 2 - <K a |*>. 

From the equations of (11.49), 

Cl = 4= (C+ + C_), c 2 = -±- (C+ - c_). 

s/2 s/2 


Then the Eqs. (1 1.48) become 


The solutions are 


dC i dCo . 

i ~i = 1AC - i ~dT = 0 - 


Ci(i) = C l (0)e~ i2At , C 2 (t ) = C 2 (0), 


(11.50) 


(11.51) 

(11.52) 


where, of course, C x (0) and C 2 (0) are the amplitudes at t = 0. 

These equations say that if a neutral K-particle starts out in the state | K x ) 
at t = 0 [then C x (0) = 1 and C 2 (0) — 0], the amplitudes at the time t are 

Ci(i) = e~ i2A \ C 2 (0 = 0. 

Remembering that A is a complex number, it is convenient to take A = 
a — i&. (Since the imaginary part of 2 A turns out to be negative, we write it as 
minus $.) With this substitution, C x (0 reads 

C x (0 = C x ( O)e~ 0t e~ iat . (11.53) 

The probability of finding a K x particle at t is the absolute square of this ampli- 
tude, which is e~ 2&t . And, from Eqs. (1 1.52), the probability of finding the K 2 state 
at any time is zero. That means that if you make a K-particle in the state | K x ), 
the probability of finding it in the same state decreases exponentially with time — 
but you will never find it in state | K 2 ). Where does it go? It disintegrates into two 
7 r-mesons with the mean life r = 1/20 which is, experimentally, 10“ 10 sec. We 
made provisions for that when we said that A was complex. 

On the other hand, Eq. (11.52) says that if we make a K-particle completely 
in the K 2 state, it stays that way forever. Well, that’s not really true. It is observed 
experimentally to disintegrate into three 7r-mesons, but 600 times slower than the 
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two-pi on decay we have described. So there are some other small terms we 
have left out in our approximation. But so long as we are considering only the 
two-pi on decay, the K 2 lasts “forever.” 

Now to finish the story of Gell-Mann and Pais. They went on to consider what 
happens when a K-particle is produced with a A 0 particle in a strong interaction. 
Since it must then have a strangeness of + 1 , it must be produced in the K° state. 
So at / = 0 it is neither a K 1 nor a K 2 but a mixture. The initial conditions are 

C+( 0) = 1, C_( 0) = 0. 

But that means — from Eq. (11.50) — that 

cm = -A , c 2 (0) = -T . 

V2 V 2 

and — from Eq. (11.51) — that 


/'“'f — icti 

Ci(t) = — e e 

V2 


c 2 (0 = 


V2 


(11.54) 


Now remember that Ki and K 2 are each linear combinations of K° and K°. 
In Eqs. (11.54) the amplitudes have been chosen so that at / = 0 the K° parts 
cancel each other out by interference, leaving only a K° state. But the | K i ) state 
changes with time , and the | K 2 ) state does not . After^ = 0 the interference of 
Ci and C 2 will give finite amplitudes for both K° and K°. 

What does all this mean? Let’s go back and think of the experiment we 
sketched in Fig. 11-5. A i r“ meson has produced a A 0 particle and a K° meson 
which is tooting along through the hydrogen in the chamber. As it goes along, 
there is some small but uniform chance that it will collide with a hydrogen nucleus. 
At first, we thought that strangeness conservation would prevent the K-particle 
from making a A 0 in such an interaction. Now, however, we see that that is not 
right. For although our K-particle starts out as a K° — which cannot make a 
A 0 — it does not stay this way. After a while, there is some amplitude that it will 
have flipped to the K° state. We can, therefore, sometimes expect to see a A 0 
produced along the K-particle track. The chance of this happening is given by 
the amplitude C_, which we can [by using Eq. (11.50) backwards] relate to C i 
and C 2 . The relation is 


C_ = -L (C, - c 2 ) = Ue-*‘e- iat ~ 1). (11 55) 

\/2 

As our K-particle goes along, the probability that it will “act like” a K° is equal 
to |C_| 2 , which is 

|C_| 2 = £(1 + e~ 2 ^ 1 — cos at). (11.56) 

A complicated and strange result! 

This, then, is the remarkable prediction of Gell-Mann and Pais: when a K° 
is produced, the chance that it will turn into a K°— as it can demonstrate by being 
able to produce a A 0 — varies with time according to Eq. (11.56). This prediction 
came from using only sheer logic and the basic principles of the quantum me- 
chanics— with no knowledge at all of the inner workings of the K-particle. Since 
nobody knows anything about the inner machinery, that is as far as Gell-Mann 
and Pais could go. They could not give any theoretical values for a and p. And 
nobody has been able to do so to this date. They were able to give a value of p 
obtained from the experimentally observed rate of decay into two 7r’s (2 p = 
10 10 sec), but they could say nothing about a. 

We have plotted the function of Eq. (11.56) for two values of a in Fig. 11-6. 
You can see that the form depends very much on the ratio of a to p. There is no 
K° probability at first; then it builds up. If a is large, the probability would have 
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large oscillations. If a is small, there will be little or no oscillation — the prob- 
ability will just rise smoothly to 1/4. 

Now, typically, the K-particle will be travelling at a constant speed near the 
speed of light. The curves of Fig. 1 1-6 then also represent the probability along 
the track of observing a K° — with typical distances of several centimeters. You 
can see why this prediction is so remarkably peculiar. You produce a single 
particle and instead of just disintegrating, it does something else. Sometimes it 
disintegrates, and other times it turns into a different kind of a particle. Its char- 
acteristic probability of producing an effect varies in a strange way as it goes 
along. There is nothing else quite like it in nature. And this most remarkable 
prediction was made solely by arguments about the interference of amplitudes. 



If there is any place where we have a chance to test the main principles of 
quantum mechanics in the purest way — does the superposition of amplitudes 
work or doesn’t it? — this is it. In spite of the fact that this effect has been pre- 
dicted now for several years, there is no experimental determination that is very 
clear. There are some rough results which indicate that the a is not zero, and that 
the effect really occurs — they indicate that a is between 2/3 and 4/3. That’s all there 
is, experimentally. It would be very beautiful to check out the curve exactly to see 
if the principle of superposition really still works in such a mysterious world as 
that of the strange particles — with unknown reasons for the decays, and unknown 
reasons for the strangeness. 

The analysis we have just described is very characteristic of the way quantum 
mechanics is being used today in the search for an understanding of the strange 
particles. All the complicated theories that you may hear about are no more and 
no less than this kind of elementary hocus-pocus using the principles of super- 
position and other principles of quantum mechanics of that level. Some people 
claim that they have theories by which it is possible to calculate the /3 and a, or 
at least the a given the 0, but these theories are completely useless. For instance, 
the theory that predicts the value of a, given the /3, tells us that the value of a 
should be infinite. The set of equations with which they originally start involves 
two 7 r-mesons and then goes from the two 7r’s back to a K°, and so on. When it’s 
all worked out, it does indeed produce a pair of equations like the ones we have 
here; but because there are an infinite number of states of two 7r’s, depending on 
their momenta, integrating over all the possibilities gives an a which is infinite. 
But nature’s a is not infinite. So the dynamical theories are wrong. It is really 
quite remarkable that the phenomena which can be predicted at all in the world 
of the strange particles come from the principles of quantum mechanics at the 
level at which you are learning them now. 
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11-6 Generalization to TV-state systems 


We have finished with all the two-state systems we wanted to talk about. 
In the following chapters we will go on to study systems with more states. The 
extension to TV-state systems of the ideas we have worked out for two states is 
pretty straightforward. It goes like this. 

If a system has TV distinct states, we can represent any state | ^(0) as a linear 
combination of any set of base states | /), where i = 1, 2, 3, . . . , TV; 

I V'(O) = £ IOCi(0. (11.57) 

all i 

The coefficients C t -(0 are the amplitudes (i | ^(0). The behavior of the amplitudes 
Ci with time is governed by the equations 

ih dC^t) = ^ H ^ (11.58) 

i 

where the energy matrix Hij describes the physics of the problem. It looks the 
same as for two states. Only now, both i and j must range over all TV base states, 
and the energy matrix — or, if you prefer, the Hamiltonian — is an TV by TV 
matrix with TV 2 numbers. As before, //*■ = H jt — so long as particles are conserved 
— and the diagonal elements Ha are real numbers. 

We have found a general solution for the C’s of a two-state system when the 
energy matrix is constant (doesn’t depend on /). It is also not difficult to solve 
Eq. (1 1.58) for an TV-state system when H is not time dependent. Again, we begin 
by looking for a possible solution in which the amplitudes all have the same time 
dependence. We try 


Q = 




(11.59) 


When these C t ’s are substituted into (11.58), the derivatives dCi(t)/dt become just 
( — i/h)ECi. Canceling the common exponential factor from all terms, we get 

Edi = H n a i- (11.60) 

3 

This is a set of TV linear algebraic equations for the TV unknowns a x , a 2 , . . . , a n , 
and there is a solution only if you are lucky — only if the determinant of the co- 
efficients of all the «’s is zero. But it’s not necessary to be that sophisticated; you 
can just start to solve the equations any way you want, and you will find that they 
can be solved only for certain values of E. (Remember that E is the only adjustable 
thing we have in the equations.) 

If you want to be formal, however, you can write Eq. (11.60) as 

E = 0- 01.61) 

3 

Then you can use the rule — if you know it — that these equations will have a solu- 
tion only for those values of E for which 

Det {H^ - 5 {j E) = 0. (11.62) 

Each term of the determinant is just except that E is subtracted from every 
diagonal element. That is, (11.62) means just 


Det 


H 


- E 

#12 

#13 

. \ 

21 

#22 ~ E 

#23 

. . 

31 

E 32 

fcl 

1 

CO 

CO 



= 0 . 


(11.63) 
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This is, of course, just a special way of writing an algebraic equation for E which 
is the sum of a bunch of products of all the terms taken a certain way. These 
products will give all the powers of E up to E N . 

So we have an Mh order polynomial equal to zero, and there are, in general, 
N roots. (We must remember, however, that some of them may be multiple 
roots — meaning that two or more roots are equal.) Let’s call the N roots 

Ely E Ui Em , . . . , E n , . . . , Z? N . (1 1.64) 

(We will use n to represent the nth Roman numeral, so that n takes on the values 
I, II, ... , N.) It may be that some of these energies are equal— say E n = Em — 
but we will still choose to call them by different names. 

The equations (1 1.60) — or (1 1.61) — have one solution for each value of E. If 
you put any one of the £”s — say E n — into (1 1.60) and solve for the a l} you get a 
set which belongs to the energy E n . We will call this set ^(n). 

Using these 0 ;(n) in Eq. (1 1.59), we have the amplitudes Q(n) that the definite 
energy states are in the base state | /). Letting | n) stand for the state vector of the 
definite energy state at t = 0, we can write 

C f ( n) = </ 1 n )e (ilh)E ^, 

with 

</|n> = a { ( n). (11.65) 

The complete definite energy state | ypjj)) can then be written as 

llM0> = J2 I *>i(n)e- (,7 * )£ "', 

i 

or 

I^ n (0) = \n)e m)E * 1 . (11.66) 

The state vectors | n) describe the configuration of the definite energy states, but 
have the time dependence factored out. Then they are constant vectors which 
can be used as a new base set if we wish. 

Each of the states | n) has the property — as you can easily show — that when 
operated on by the Hamiltonian operator H it gives just E n times the same state: 

H\n) = En | #i>. (11.67) 

The energy E n is, then, a number which is a characteristic of the Hamiltonian 
operator H. As we have seen, a Hamiltonian will, in general, have several char- 
acteristic energies. In the mathematician’s world these would be called the “char- 
acteristic values” of the matrix H i3 . Physicists usually call them the “eigenvalues” 
of H. (“Eigen” is the German word for “characteristic” or “proper.”) With 
each eigenvalue of ft— in other words, for each energy— there is the state of 
definite energy, which we have called the “stationary state.” Physicists usually 
call the states | n) “the eigenstates of Hr Each eigenstate corresponds to a par- 
ticular eigenvalue E n . 

Now, generally, the states | n>— of which there are A— can also be used as a 
base set. For this to be true, all of the states must be orthogonal, meaning that 
for any two of them, say | n) and | m), 

<n|m> = 0. (11.68) 

This will be true automatically if all the energies are different. Also, we can 
multiply all the ^(n) by a suitable factor so that all the states are normalized — by 
which we mean that 

(n|n) = 1 (11.69) 

for all n. 

When it happens that Eq. (11.63) accidentally has two (or more) roots with 
the same energy, there are some minor complications. First, there are still two 
different sets of cr/s which go with the two equal energies, but the states they give 
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may not be orthogonal. Suppose you go through the normal procedure and find 
two stationary states with equal energies — let’s call them | n) and | v). Then it 
will not necessarily be so that they are orthogonal — if you are unlucky, 

<M I v) * 0. 

It is, however, always true that you can cook up two new states, which we will 
call i n') and j v'), that have the same energies and are also orthogonal, so that 

(n’ | v') = 0. (11.70) 

You can do this by making | //} and | v') a suitable linear combination of | \i) 
and | v ), with the coefficients chosen to make it come out so that Eq. (11.70) is 
true. It is always convenient to do this. We will generally assume that this has 
been done so that we can always assume that our proper energy states | n) are 
all orthogonal. 

We would like, for fun, to prove that when two of the stationary states have 
different energies they are indeed orthogonal. For the state | n) with the energy 
E„, we have that 

| n> = £ n | n>. (11-71) 

This operator equation really means that there is an equation between numbers. 
Filling the missing parts, it means the same as 

E <;|£|7>o» = < 1L72 > 

i 

If we take the complex conjugate of this equation, we get 

E (1L73) 

j 

Remember now that the complex conjugate of an amplitude is the reverse ampli- 
tude, so (11.73) can be rewritten as 

E ( n I j){j ! 7? | /> = £>|i). (11.74) 

3 

Since this equation is valid for any i, its “short form” is 

<n|tf=^(n|> ( 1L75 ) 

which is called the adjoint to Eq. (11.71). 

Now we can easily prove that E„ is a real number. We multiply Eq. (11.71) 
by (n | to get 

<n |#| n> = £ n , (11.76) 

since (n | n) = 1. Then we multiply Eq. (11.75) on the left by | n) to get 

(n |/f| n) = E*. (11.77) 

Comparing (11.76) with (11.77) it is clear that 

E n = Et, (11.78) 

which means that E„ is real. We can erase the star on E n in Eq. (11.75). 

Finally we are ready to show that the different energy states are orthogonal. 
Let | n) and | m) be any two of the definite energy base states. Using Eq. (11.75) 
for the state m, and multiplying it by | n), we get that 

(m |7?| n) = £ m (m | n). 

11-23 



But if we multiply (11.71) by (m |, we get 

(m \6\ n> = E n (m | n). 

Since the left sides of these two equations are equal, the right sides are, also: 

£m<m | n> = E n (m | n>. (11.79) 

If E m = E n the equation does not tell us anything. But if the energies of the two 
states | m) and | n) are different (E m ^ EJ, Eq. (11.79) says that <m | n> must 
be zero, as we wanted to prove. The two states are necessarily orthogonal so long 
as E n and E m are numerically different. 
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12 


The Hyper fine Splitting in Hydrogen 


12-1 Base states for a system with two spin one-half particles 

In this chapter we take up the “hyperfine splitting” of hydrogen, because 
it is a physically interesting example of what we can already do with quantum 
mechanics. It’s an example with more than two states, and it will be illustrative of 
the methods of quantum mechanics as applied to slightly more complicated prob- 
lems. It is enough more complicated that once you see how this one is handled 
you can get immediately the generalization to all kinds of problems. 

As you know, the hydrogen atom consists of an electron sitting in the neigh- 
borhood of the proton, where it can exist in any one of a number of discrete 
energy states in each one of which the pattern of motion of the electron is different. 
The first excited state, for example, lies 3/4 of a Rydberg, or about 10 electron 
volts, above the ground state. But even the so-called ground state of hydrogen 
is not really a single, definite-energy state, because of the spins of the electron and 
the proton. These spins are responsible for the “hyperfine structure” in the energy 
levels, which splits all the energy levels into several nearly equal levels. 

The electron can have its spin either “up” or “down” and, the proton can 
also have its spin either “up” or “down.” There are, therefore, four possible spin 
states for every dynamical condition of the atom. That is, when people say “the 
ground state” of hydrogen, they really mean the “four ground- states,” and not 
just the very lowest state. The four spin states do not all have exactly the same 
energy; there are slight shifts from the energies we would expect with no spins. 
The shifts are, however, much, much smaller than the 10 volts or so from the 
ground state to the next state above. As a consequence, each dynamical state has 
its energy split into a set of very close energy levels — the so-called hyperfine splitting . 

The energy differences among the four spin states is what we want to calculate 
in this chapter. The hyperfine splitting is due to the interaction of the magnetic 
moments of the electron and proton, which gives a slightly different magnetic 
energy for each spin state. These energy shifts are only about ten-millionths 
of an electron volt — really very small compared with 10 volts! It is because of 
this large gap that we can think about the ground state of hydrogen as a “four- 
state” system, without worrying about the fact that there are really many more 
states at higher energies. We are going to limit ourselves here to a study of the 
hyperfine structure of the ground state of the hydrogen atom. 

For our purposes we are not interested in any of the details about the positions 
of the electron and proton because that has all been worked out by the atom so to 
speak — it has worked itself out by getting into the ground state. We need know 
only that we have an electron and proton in the neighborhood of each other with 
some definite spatial relationship. In addition, they can have various different 
relative orientations of their spins. It is only the effect of the spins that we want to 
look into. 

The first question we have to answer is : What are the base states for the system? 
Now the question has been put incorrectly. There is no such thing as “/Ac” base 
states, because, of course, the set of base states you may choose is not unique. 
New sets can always be made out of linear combinations of the old. There are 
always many choices for the base states, and among them, any choice is equally 
legitimate. So the question is not what is the base set, but what could a base set 
be? We can choose any one we wish for our own convenience. It is usually best 
to start with a base set which is physically the clearest. It may not be the solution 
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to any problem, or may not have any direct importance, but it will generally 
make it easier to understand what is going on. 

We choose the following four base states: 


State 1 : The electron and proton are both spin “up.” 
State 2\ The electron is “up” and the proton is “down.” 
State 3: The electron is “down” and the proton is “up.” 
State 4: The electron and proton are both “down.” 






Fig. 12-1. A set of base states for 
the ground state of the hydrogen atom. 


We need a handy notation for these four states, so we’ll represent them this way: 

State 1: | + +); electron up, proton up. 

State 2\ | H ); electron up, proton down . 

State 3: | — +); electron down, proton up. (12.1) 

State 4: | — — ); electron down, proton down. 

You will have to remember that the first plus or minus sign refers to the electron 
and the second, to the proton. For handy reference, we’ve also summarized the 
notation in Fig. 12-1. Sometimes it will also be convenient to call these states 
| 1), | 2), | 5), and | 4). 

You may say, “But the particles interact, and maybe these aren’t the right 
base states. It sounds as though you are considering the two particles indepen- 
dently. Yes, indeed! The interaction raises the problem : what is the Hamiltonian 
for the system, but the interaction is not involved in the question of how to describe 
the system. What we choose for the base states has nothing to do with what 
happens next. It may be that the atom cannot ever stay in one of these base states, 
even if it is started that way. That’s another question. That’s the question: 
How do the amplitudes change with time in a particular (fixed) base? In choosing 
the base states, we are just choosing the “unit vectors” for our description. 

While we’re on the subject, let’s look at the general problem of finding a set 
of base states when there is more than one particle. You know the base states for 
a single particle. An electron, for example, is completely described in real life— not 
in our simplified cases, but in real life — by giving the amplitudes to be in each of 
the following states: 

| electron “up” with momentum p) 
or 

| electron “down” with momentum p ). 

There are really two infinite sets of states, one state for each value of p. That is 
to say that an electron state | p) is completely described if you know all the ampli- 
tudes 

(+>P\'P) and <-,/> \P), 


where the + and — represent the components of angular momentum along some 
axis usually the z-axis — and p is the vector momentum. There must, therefore, 
be two amplitudes for every possible momentum (a multi-infinite set of base 
states). That is all there is to describing a single particle. 

When there is more than one particle, the base states can be written in a 
similar way. For instance, if there were an electron and a proton in a more com- 
plicated situation than we are considering, the base states could be of the following 
kind: 

| an electron with spin “up,” moving with momentum pi and 

a proton with spin “down,” moving with momentum /> 2 ). 


And so on for other spin combinations. If there are more than two particles — 
same idea. So you see that to write down the possible base states is really very easy. 
The only problem is, what is the Hamiltonian? 

For our study of the ground state of hydrogen we don’t need to use the full 
sets of base states for the various momenta. We are specifying particular mo- 
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mentum states for the proton and electron when we say “the ground state.” The 
details of the configuration— the amplitudes for all the momentum base states— 
can be calculated, but that is another problem. Now we are concerned only with 
the effects of the spin, so we can take only the four base states of (12.1). Our 
next problem is: What is the Hamiltonian for this set of states? 


12-2 The Hamiltonian for the ground state of hydrogen 

We’ll tell you in a moment what it is. But first, we should remind you of one 
thing: any state can always be written as a linear combination of the base states. 
For any state | P) we can write 

| *> = I + +)(+ + I *> + I + -)<+ - I +) + I - +>(- + I *> 

+ I X If)- (12.2) 

Remember that the complete brackets are just complex numbers, so we can also 
write them in the usual fashion as C t -, where i = 1, 2, 3, or 4, and write Eq. (12.2) as 

| yp) = | + +)Ci + | + — )C 2 + | ~ +)C3 + | — “)C4. (12.3) 

By giving the four amplitudes C t we completely describe the spin state | \p). If 
these four amplitudes change with time, as they will, the rate of change in time is 
given by the operator fi. The problem is to find the H. 

There is no general rule for writing down the Hamiltonian of an atomic 
system, and finding the right formula is much more of an art than finding a set of 
base states. We were able to tell you a general rule for writing a set of base states 
for any problem of a proton and an electron, but to describe the general Hamilton- 
ian of such a combination is too hard at this level. Instead, we will lead you to a 
Hamiltonian by some heuristic argument — and you will have to accept it as the 
correct one because the results will agree with the test of experimental observation. 

You will remember that in the last chapter we were able to describe the 
Hamiltonian of a single, spin one-half particle by using the sigma matrices — or the 
exactly equivalent sigma operators. The properties of the operators are sum- 
marized in Table 12-1. These operators— which are just a convenient, shorthand 
way of keeping track of the matrix elements of the type (+ | cr z \ + ) — were 
useful for describing the behavior of a single particle of spin one-half. The question 
is: Can we find an analogous device to describe a system with two spins? The 
answer is yes, very simply, as follows. We invent a thing which we will call “sigma 
electron,” which we represent by the vector operator a e , and which has the 
x-, y-, and z-components, a®, <r®, < 7 % We now make the convention that when one 
of these things operates on any one of our four base states of the hydrogen atom, 
it acts only on the electron spin, and in exactly the same way as if the electron were 
all by itself. Example: What is cr® | — +)? Since <r y on an electron “down” 
is — / times the corresponding state with the electron “up”, 

*;i -+>«-/ 1 + +>■ 

(When acts on the combined state it flips over the electron, but does nothing to 
the proton and multiplies the result by — /.) Operating on the other states, cr® 
would give 

a v I + +) = * I 

Vy H ) = z * I )> 

a t\ ) = — i I H )• 

Just remember that the operators a e work only on the first spin symbol — that is, 
on the electron spin. 

Next we define the corresponding operator “sigma proton” for the proton 
spin. Its three components <r£, <rj, <r? act in the same way as a e , only on the 
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Table 12-1 


<Tz 

+ ) = + 

+ ) 


-) = - 

-) 

Ox 

+ ) = + 

-) 

<r x I 

-) = + 

1+) 

(7 y 

1+) = + 

* 1 “) 

<7y\ 

1 -) = -I' 

l+> 



proton spin. For example, if we have acting on each of the four base states, we 
get — always using Table 12-1 — 

o£l + +>H + ->, 

ol I > “ | + +), 

1 - +> = I — X 

Ox | ) “ I b>- 

As you can see, it’s not very hard. 

Now in the most general case we could have more complex things. For 
instance, we could have products of the two operators like . When we have 
such a product we do first what the operator on the right says, and then do what 
the other one says.f For example, we would have that 

OxOz | H ) = ot{<r v z | -| » = ot(— | H » = —Ox H ) = - I )• 

Note that these operators don’t do anything on pure numbers— we have used 
this fact when we wrote or® (— 1) = (-1 )o e x . We say that the operators “commute” 
with pure numbers, or that a number “can be moved through” the operator. 
You can practice by showing that the product cr e x a l gives the following results 
for the four states: 

OxO P z | + +) — + 1 b), 

o Wl | H — ) = “ I )> 

o%o\ I b) = + I + +), 

trSrf I )=-! + “)• 

If we take all the possible operators, using each kind of operator only once, 
there are sixteen possibilities. Yes, sixteen— provided we include also the “unit 
operator” 1. First, there are the three: <r e x , a% 9 a%. Then the three <r p , <r p of— that 
makes six. In addition, there are the nine possible products of the form ofof, 
which makes a total of 15. And there’s the unit operator which just leaves any 
state unchanged. Sixteen in all. 

Now note that for a four-state system, the Hamiltonian matrix has to be 
a four-by-four matrix of coefficients— it will have sixteen entries. It is easily 
demonstrated that any four-by-four matrix — and, therefore, the Hamiltonian 
matrix in particular — can be written as a linear combination of the sixteen double- 
spin matrices corresponding to the set of operators we have just made up. There- 
fore, for the interaction between a proton and an electron that involves only their 
spins, we can expect that the Hamiltonian operator can be written as a linear 
combination of the same 16 operators. The only question is, how? 

Well, first, we know that the interaction doesn’t depend on our choice of 
axes for a coordinate system. If there is no external disturbance — like a magnetic 
field — to determine a unique direction in space, the Hamiltonian can’t depend on 
our choice of the direction of the x>, y-, and z-axes. That means that the 
Hamiltonian can’t have a term like <r% all by itself. It would be ridiculous, because 
then somebody with a different coordinate system would get different results. 

The only possibilities are a term with the unit matrix, say a constant a (times 
1), and some combination of the sigmas that doesn’t depend on the coordinates— 
some “invariant” combination. The only scalar invariant combination of two 
vectors is the dot product, which for our <r’s is 

<r e * a p = <TxOx +oy y + o\(X V z . (12.4) 

This operator is invariant with respect to any rotation of the coordinate system. 


f For these particular operators, you will notice it turns out that the sequence of the 
operators doesn’t matter. 
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So the only possibility for a Hamiltonian with the proper symmetry in space is a 
constant times the unit matrix plus a constant times this dot product, say, 

ft = E 0 + A a* • a p . (12.5) 

That’s our Hamiltonian. It’s the only thing that it can be, by the symmetry of 
space, so long as there is no external field. The constant term doesn’t tell us much; 
it just depends on the level we choose to measure energies from. We may just 
as well take E 0 = 0. The second term tells us all we need to know to find the 
level splitting of the hydrogen. 

If you want to, you can think of the Hamiltonian in a different way. If there 
are two magnets near each other with magnetic moments Me and m p , the mutual 
energy will depend on Me ‘ M P — among other things. And, you remember, we 
found that the classical thing we call Me appears in quantum mechanics as ix c a c . 
Similarly, what appears classically asM P will usually turn out in quantum mechanics 
to be MpOp (where /x P is the magnetic moment of the proton, which is about 1000 
times smaller than Me, and has the opposite sign). So Eq. (12.5) says that the 
interaction energy is like the interaction between two magnets — only not quite, 
because the interaction of the two magnets depends on the radial distance between 
them. But Eq. (12.5) could be — and, in fact, is — some kind of an average inter- 
action. The electron is moving all around inside the atom, and our Hamiltonian 
gives only the average interaction energy. All it says is that for a prescribed ar- 
rangement in space for the electron and proton there is an energy proportional 
to the cosine of the angle between the two magnetic moments, speaking classically. 
Such a classical qualitative picture may help you to understand where it comes 
from, but the important thing is that Eq. (12.5) is the correct quantum mechanical 
formula. 

The order of magnitude of the classical interaction between two magnets 
would be the product of the two magnetic moments divided by the cube of the 
distance between them. The distance between the electron and the proton in the 
hydrogen atom is, speaking roughly, one half an atomic radius, or 0.5 angstrom. 
It is, therefore, possible to make a crude estimate that the constant A should be 
about equal to the product of the two magnetic moments n e and m p divided by 
the cube of 1 /2 angstrom. Such an estimate gives a number in the right ball park. 
It turns out that A can be calculated accurately once you understand the complete 
quantum theory of the hydrogen atom — which we so far do not. It has, in fact, 
been calculated to an accuracy of about 30 parts in one million. So, unlike the 
flip-flop constant A of the ammonia molecule, which couldn’t be calculated at 
all well by a theory, our constant A for the hydrogen can be calculated from a more 
detailed theory. But never mind, we will for our present purposes think of the A 
as a number which could be determined by experiment, and analyze the physics 
of the situation. 

Taking the Hamiltonian of Eq. (12.5), we can use it with the equation 

ihCi = J2 HijCj ( 12 . 6 ) 

3 

to find out what the spin interactions do to the energy levels. To do that, we need 
to work out the sixteen matrix elements — ( i\H |y) corresponding to each 
pair of the four base states in (12.1). 

We begin by working out what ft\ j) is for each of the four base states. 
For example, 

6 | + + > = A<J V * a p I + +> = A{<rZal + <r£cr£ + *3*5} | + +>. (12.7) 

Using the method we described a little earlier — it’s easy if you have memorized 
Table 12-1 — we find what each pair of(r’s does on | H — h)* The answer is 

I + +) = + I )> 

* 3 * 51 + +>=-!-->. < 12 * 8) 

*3*S | + +)=+I + +)• 
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So (12.7) becomes 


Table 12-2 

Spin operators for the hydrogen atom 


| + +) = + | “ “) 
<*£ |+-)=+|-+> 
ay 9 \- + )=+\ + _> 

<*5 | — — ) = + | H — b) 

<*Sl + +> = - I — > 
<*Sl + -) = +1- +> 
*y, i -+) = + i +-> 

^Wv I — — ) " " | + +) 

<7 z^z | + +) = +1+ +) 

<** I + — ) = “ I + “ ) 

<*z I — +) = - I - +> 

<*51- -> = +!--> 


fi\ + +) = A{\ > - | — ) + !++>} = A\ + +). (12.9) 

Since our four base states are all orthogonal, that gives us immediately that 


(H — b | 7/| H — b) — A(-\ — b | H — b) — A, 

(+ - \H\ + +> = A(+ - |+ + > - 0, 

<- + 1*1 + + > = A(- +| + +> = 0, 

<- - \H\ + +) = A{- - |+ +) = 0. 


( 12 . 10 ) 


Remembering that (j \ H \ i) = (i \ H\j )*, we can already write down the differ- 
ential equation for the amplitudes C : : 

ihC\ — H\\C\ + H 1 2 C 2 + 7/13C3 + 7/14C4 
or 

ihC 1 = ACl (12.11) 


That’s all! We get only the one term. 

Now to get the rest of the Hamiltonian equations we have to crank through 
the same procedure for ff operating on the other states. First, we will let you 
practice by checking out all of the sigma products we have written down in Table 
12-2. Then we can use them to get: 

# 1 + -) = A{2\ - +) - |+ -)}, 

fi\ - +> = A{2\+ -) - | - + >}, (12.12) 

ft\- -) = M - -)■ 


Then, multiplying each one in turn on the left by all the other state vectors, we 
get the following Hamiltonian matrix, Hif 


y 

*1 /a 0 0 o\ 

0 -A 2A 0 

02 A -A 0 

\0 0 0 A! 


(12.13) 


It means, of course, nothing more than that our differential equations for the four 
amplitudes C* are 

ihC 1 - AC U 

ihC 2 = - AC 2 + 2AC 3 , (12.14) 

ihC 3 — 2AC 2 — AC 3) 
ihC ± = AC 4. 


Before solving these equations we can’t resist telling you about a clever 
rule due to Dirac — it will make you feel that you are really advanced — although 
we don’t need it for our work. We have — from the equations (12.9) and (12.12) — 
that 

<r e * a p | + + ) = | + +), 

<r e * <jP j + — ) = 2| — + ) | + }» (12.15) 

a e -<r p | - +> = 2| + | 

or e -<r p | ) - | ). 
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Look, said Dirac, I can also write the first and last equations as 


a**a p | + +> = 2 | + +>-| + +>, 

<r e * ffP | ) ~ 2 | > I >; 

then they are all quite similar. Now I invent a new operator, which I will call 
P S pin exch and which I define to have the following properties :f 


^spin exch | "i” H" ) — 
P spin exch | “f" } “ 


spiti exch 


spin exch 


-+>“ 
I ) = 


+ +), 

h), 

_| ) } 


All the operator does is interchange the spin directions of the two particles. Then 
I can write the whole set of equations in (12.15) as a simple operator equation: 

= 2P Bpinexch - 1. (12.16) 


That’s the formula of Dirac. His 4 ‘spin exchange operator” gives a handy 
rule for figuring out a Q • ct p . (You see, you can do everything now. The gates 
are open.) 


12-3 The energy levels 

Now we are ready to work out the energy levels of the ground state of hydro- 
gen by solving the Hamiltonian equations (12.14). We want to find the energies 
of the stationary states. This means that we want to find those special states 
| $) for which each amplitude Q = (i | in the set belonging to | \p) has the 
same time dependence — namely, e~ lut . Then the state will have the energy E = hw. 
So we want a set for which 

Ci = (12.17) 

where the four coefficients ai are independent of time. To see whether we can 
get such amplitudes, we substitute (12.17) into Eq. (12.14) and see what happens. 
Each ih dC/dt in Eq. (12.14) turns into £C, and — after cancelling out the common 
exponential factor — each C becomes an a; we get 


Ea i = Aa lt 
Ea 2 — — Aa 2 ~b 2 Aa 3, 
Ea 3 = 2Aa 2 — Aa 3s 
Ea 4 = Aa 4i 


(12.18) 


which we have to solve for a 1 , o 2 , a 3 , and a 4 . Isn’t it nice that the first equation is 
independent of the rest — that means we can see one solution right away. If we 
choose E = A, 

ci 1 — 1 , ci 2 = ci 3 — (i 4 ~ 0 , 


gives a solution. (Of course, taking all the a’s equal to zero also gives a solution, 
but that’s no state at all!) Let’s call our first solution the state 1 1):% 


Its energy is 


I) = 1 1) = | + +). 
Et - A . 


(12.19) 


t This operator is now called the “Pauli spin exchange operator.” 
j The state is really | t)e^ i,K)Elt \ but, as usual we will identify the states by the con- 
stant vectors which are equal to the complete vectors at t — 0. 
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With that clue you can immediately see another solution from the last equation 
in (12.18): 

ai = a 2 = a 3 = 0 , ci 4 — 1 , 

E = A, 

We’ll call that solution state | II): 

\H) ~ \4) ~ | >, (12.20) 

Eh = A. 

Now it gets a little harder; the two equations left in (12.18) are mixed up. 
But we’ve done it all before. Adding the two, we get 

E(a 2 + a 3 ) = A(a 2 + a 3 ). (12.21) 

Subtracting, we have 

E(a 2 - a z ) = —3A(a 2 ~ a z ). (12.22) 


By inspection — and remembering ammonia — we see that there are two solutions: 


and 


a 2 = a 3 , E — A 
a 2 = —a 3 , E = —3 A. 


(12.23) 


They are mixtures of | 2) and | 3). Calling these states | III) and | IV), and putting 
in a factor l/\/2 to make the states properly normalized, we have 


E 0 + A 

E 0 


E 0 - 3A 


x, n.nt 



Fig. 12-2. Energy-level diagram for 
the ground state of atomic hydrogen. 


and 


| III) = /= (| 2) + | 5» = (| + -> + 

y /2 \/2 

Em = A 


+ ))» 


I IV) = -^(|2)- |5» = -L(| + -)- 
\/2 V2 

Ejv — — 3 A. 


b)), 


(12.24) 


(12.25) 


We have found four stationary states and their energies. Notice, incidentally, 
that our four states are orthogonal, so they also can be used for base states if 
desired. Our problem is completely solved. 

Three of the states have the energy A , and the last has the energy —3 A. 
The average is zero — which means that when we took £0 = 0 in Eq. (12.5), we 
were choosing to measure all the energies from the average energy. We can draw 
the energy-level diagram for the ground state of hydrogen as shown in Fig. 12-2. 

Now the difference in energy between state | IV) and any one of the others 
is 4 A. An atom which happens to have gotten into state 1 1) could fall from there 
to state | IV) and emit light. Not optical light, because the energy is so tiny— it 
would emit a microwave quantum. Or, if we shine microwaves on hydrogen gas, 
we will find an absorption of energy as the atoms in state | IV) pick up energy and 
go into one of the upper states— but only at the frequency co = 4 A/h. This 
frequency has been measured experimentally; the best result, obtained very 
recently,! is 


/ = co/ 27 r = (1,420,405,751.800 ± 0.028) cycles per second. (12.26) 


The error is only two parts in 100 billion! Probably no basic physical quantity is 
measured better than that — it’s one of the most remarkably accurate measurements 
in physics. The theorists were very happy that they could compute the energy to 
an accuracy of 3 parts in 10 5 , but in the meantime it has been measured to 2 parts in 
10 11 — a million times more accurate than the theory. So the experimenters are 


t Crampton, Kleppner, and Ramsey; Physical Review Letters, Vol. 11, page 338 (1963). 
12-8 



way ahead of the theorists. In the theory of the ground state of the hydrogen atom 
you are as good as anybody. You, too, can just take your value of A from experi- 
ment — that’s what everybody has to do in the end. 

You have probably heard before about the “ ‘21 -centimeter line” of hydrogen. 
That’s the wavelength of the 1420 megacycle spectral line between the hyperfine 
states. Radiation of this wavelength is emitted or absorbed by the atomic hydrogen 
gas in the galaxies. So with radio telescopes tuned in to 21 -cm waves (or 1420 
megacycles approximately) we can observe the velocities and the location of con- 
centrations of atomic hydrogen gas. By measuring the intensity, we can estimate 
the amount of hydrogen. By measuring the frequency shift due to the Doppler 
effect, we can find out about the motion of the gas in the galaxy. That is one of 
the big programs of radio astronomy. So now we are talking about something 
that’s very real — it is not an artificial problem. 


12-4 The Zeeman splitting 

Although we have finished the problem of finding the energy levels of the 
hydrogen ground state, we would like to study this interesting system some more. 
In order to say anything more about it — for instance, in order to calculate the 
rate at which the hydrogen atom absorbs or emits radio waves at 21 centimeters — 
we have to know what happens when the atom is disturbed. We have to do as we 
did for the ammonia molecule — after we found the energy levels we went on and 
studied what happened when the molecule was in an electric field. We were then 
able to figure out the effects from the electric field in a radio wave. For the hydro- 
gen atom, the electric field does nothing to the levels, except to move them all by 
some constant amount proportional to the square of the field — which is not of 
any interest because that won’t change the energy differences. It is now the 
magnetic field which is important. So the next step is to write the Hamiltonian 
for a more complicated situation in which the atom sits in an external magnetic 
field. 

What, then, is the Hamiltonian? We’ll just tell you the answer, because we 
can’t give you any “proof” except to say that this is the way the atom works. 

The Hamiltonian is 

A = A(<r K ■ ffP) - fi e a e ■ B - /v* p • B. (12.27) 

It now consists of three parts. The first term Aa c • a p represents the magnetic 
interaction between the electron and the proton — it is the same one that would 
be there if there were no magnetic field. This is the term we have already had; 
and the influence of the magnetic field on the constant A is negligible. The effect 
of the external magnetic field shows up in the last two terms. The second term, 
— ju e a e • B , is the energy the electron would have in the magnetic field if it were 
there alone.t In the same way, the last term — /x p ct p ■ B, would have been the 
energy of a proton alone. Classically, the energy of the two of them together would 
be the sum of the two, and that works also quantum mechanically. In a magnetic 
field, the energy of interaction due to the magnetic field is just the sum of the energy 
of interaction of the electron with the external field, and of the proton with the 
field — both expressed in terms of the sigma operators. In quantum mechanics 
these terms are not really the energies, but thinking of the classical formulas for 
the energy is a way of remembering the rules for writing down the Hamiltonian. 
Anyway, the correct Hamiltonian is Eq. (12.27). 

Now we have to go back to the beginning and do the problem all over again. 
Much of the work is, however, done — we need only to add the effects of the new 
terms. Let’s take a constant magnetic field B in the z-direction. Then we have to 


t Remember that classically U = ~fi • B, so the energy is lowest when the moment 
is along the field. For positive particles, the magnetic moment is parallel to the spin and 
for negative particles it is opposite. So in Eq. (12.27), /z p is a positive number, but /x 0 is 
*a negative number. 
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add to our Hamiltonian operator H the two new pieces— which we can call ft * : 

ft' = -( Me <r? + MpOA 
Using Table 12-1, we get right away that 

ft’ I + +) = “(Me + Mp)^ | + + ), 

#'! + -> = “(m, - M P )*| + ->, 

- , , (12.28) 
#'l“+> = -(-Mc + Mp)*|-+>, 

#'l “ “> = (Me + Mp)*| “ “). 

How very convenient! The H f operating on each state just gives a number times 
that state. The matrix {/ 1 H f \j) has, therefore, only diagonal elements — we can 
just add the coefficients in (12.28) to the corresponding diagonal terms of (12.13), 
and the Hamiltonian equations of (12.14) become 


ihdCJdt = {A - (ju e + )B}C l3 

ihdC 2 /dt ~ {A *T (m c — /x p )5}C2 + 2AC 3 , 

. , , (12.29) 

ihdC 3 /dt = 2AC 2 - {A - (m c - m p )*}C 3 , 

ihdC^/dt = {A + (Me + M P )^}U 4 . 

The form of the equations is not different — only the coefficients. So long 
as B doesn’t vary with time, we can continue as we did before. Substituting 
Q = aie~ illh)E \ we get— as a modification of (12.18)— 


Ea^ = A {- ^ + V P )B}a u 

Ea 2 - — (^ + (M e — Mp )^}<^2 + 2^r/ 3 , 

Ea s = 2^a 2 — - Qx e - M P )5}a 3 , 

~ {A A- (Me + Mj))^}^4- 


Fortunately, the first and fourth equations are still independent of the rest, so the 
same technique works again. 

One solution is the state 1 1) for which a x = 1, a 2 = a 3 = a 4 = 0, or 


U> - I 0 — I + +)» 

with (12.31) 

E r = A — (Me + M P )^- 

Another is 

1 #> - 1 4) = 1 - ->, 

With (12.32) 

= A + (Me + Mp)^- 


A little more work is involved for the remaining two equations, because the 
coefficients of a 2 and a 3 are no longer equal. But they are just like the pair we had 
for the ammonia molecule. Looking back at Eq. (9.20), we can make the following 
analogy (remembering that the labels 1 and 2 there correspond to 2 and 3 here): 


#11 -A - (Me - Mp)£, 
H 12 -*2A, 

#2i-* 2 A, 

#22 ~A + (Me — Mp)^* 


The energies are then given by (9.25), which was 


E = 


#11 + #22 
2 


4 


(tfn - #2 


+ # 12#2 


(12.33) 


(12.34) 
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Making the substitutions from (12.33), the energy formula becomes 
E = —A =t VG*e - M P ) 2 * 2 + 4A 2 . 

Although in Chapter 9 we used to call these energies E r and E IU and we are in 
this problem calling them Em and Eiv, 


Em = A{~ 1 + Vl + (Me - M P ) 2 fi 2 /4^2}, 

„ , . 1. (12.35) 

Eiv — ~A{ 1 + 2\/l + (/*„ — Hp) 2 B 2 /4A 2 } . 

So we have found the energies of the four stationary states of a hydrogen 
atom in a constant magnetic field. Let’s check our results by letting B go to zero 
and seeing whether we get the same energies we had in the preceding section. You 
see that we do. For B = 0, the energies E u E IU and E Ilr go to +A, and E IV 
goes to —3/f. Even our labeling of the states agrees with what we called them be- 
fore. When we turn on the magnetic field though, all of the energies change in a 
different way. Let’s see how they go. 

First, we have to remember that for the electron, ix Q is negative, and about 
1000 times larger than m p which is positive. So ju e + /r p and n e — /r p are both 
negative numbers, and nearly equal. Let’s call them — p and — p': 

M = “(Me + Mp), m' = “(Me - Mp). (12.36) 

(Both m and m' are positive numbers, nearly equal to magnitude of Me — which is 
about one Bohr magneton.) Then our four energies are 

Ei = A + nB, 


Eu = A — fiB, 

Em = A{— 1 + 2VT + fi' 2 B 2 /4A 2 }, 
Eiv = -A{ 1 + 2Vl + fi' 2 B 2 /4A 2 }. 


(12.37) 


The energy E, starts at A and increases linearly with B — with the slope m- The 



Fig. 12-3. The energy levels of the ground state Fig. 12-4. Transitions between the levels of 

of hydrogen in a magnetic field B. ground state energy levels of hydrogen in some 

particular magnetic field 6. 
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energy E u also starts at A but decreases linearly with increasing — its slope is 
— p. These two levels vary with B as shown in Fig. 12-3. We show also in the 
figure the energies E ni and E IV - They have a different ^-dependence. For small 
B , they depend quadratically on B, so they start out with horizontal slopes. Then 
they begin to curve, and for large B they approach straight lines with slopes 
±ju' s which are nearly the same as the slopes of Ej and Eu, 

The shift of the energy levels of an atom due to a magnetic field is called the 
Zeeman effect. We say that the curves in Fig. 12-3 show the Zeeman splitting of 
the ground state of hydrogen. When there is no magnetic field, we get just one 
spectral line from the hyperfine structure of hydrogen. The transitions between 
state | IV) and any one of the others occurs with the absorption or emission of a 
photon whose frequency 1420 megacycles is \/h times the energy difference 4 A. 
When the atom is in a magnetic field B , however, there are many more lines. 
There can be transitions between any two of the four states. So if we have atoms 
in all four states, energy can be absorbed— or emitted— in any one of the six 
transitions shown by the vertical arrows in Fig. 12-4. Many of these transitions 
can be observed by the Rabi molecular beam technique we described in Volume 11, 
Section 35-3 (see Appendix). 

What makes the transitions go? The transitions will occur if you apply a small 
disturbing magnetic field that varies with time (in addition to the steady strong 
field B). It’s just as we saw for a varying electric field on the ammonia molecule. 
Only here, it is the magnetic field which couples with the magnetic moments and 
does the trick. But the theory follows through in the same way that we worked 
it out for the ammonia. The theory is the simplest if you take a perturbing mag- 
netic field that rotates in the ;ty-plane — although any horizontal oscillating field 
will do. When you put in this perturbing field as an additional term in the Ham- 
iltonian, you get solutions in which the amplitudes vary with time — as we found 
for the ammonia molecule. So you can calculate easily and accurately the prob- 
ability of a transition from one state to another. And you find that it all agrees 
with experiment. 


12-5 The states in a magnetic field 

We would like now to discuss the shapes of the curves in Fig. 12-3. In the 
first place, the energies for large fields are easy to understand, and rather interesting. 
For B large enough (namely for pB/A » 1) we can neglect the 1 in the formulas 
of (12.37). The four energies become 


E x = A + pB, E a = A ~ pB, 

Eui s ~ —A + p f B, Ejv ~ ~~ A — p'B. 


(12.38) 


These are the equations of the four straight lines in Fig. 12-3. We can understand 
these energies physically in the following way. The nature of the stationary states 
in a zero field is determined completely by the interaction of the two magnetic 

moments. The mixtures of the base states | H } and | b) in the stationary 

states | III) and | IV) are due to this interaction. In large external fields, however, 
the proton and electron will be influenced hardly at all by the field of the other; 
each will act as if it were alone in the external field. Then— as we have seen many 
times— the electron spin will be either parallel to or opposite to the external 
magnetic field. 

Suppose the electron spin is “up”— that is, along the field; its energy will be 
—p c B. The proton can still be either way. If the proton spin is also “up,” its 
energy is ~p p B, The sum of the two is -(Me + /x p )B = pB. That is just what 
we find for £>— which is fine, because we are describing the state | + +> = | /). 
There is still the small additional term A (now pB » A) which represents the 
interaction energy of the proton and electron when their spins are parallel. (We 
originally took A as positive because the theory we spoke of says it should be, 
and experimentally it is indeed so.) On the other hand, the proton can have its 
spin down. Then its energy in the external field goes to —p p B, so it and the electron 
have the energy — (p c — p p )B = p f B. And the interaction energy becomes —A. 
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The sum is just the energy Em in (12.38). So the state | III ) must for large fields 
become the state | H ). 

Suppose now the electron spin is “down.” Its energy in the external field is 
fi e B. If the proton is also “down,” the two together have the energy (p e + m p )^ = 
fxB, plus the interaction energy A — since their spins are parallel. That makes just 

the energy E u in (12.38) and corresponds to the state | > = | II) — which is 

nice. Finally if the electron is “down” and the proton is “up,” we get the energy 
(Me — m V )B — A (minus A for the interaction because the spins are opposite) 
which is just Eiv And the state corresponds to | b). 

“But, wait a moment!”, you are probably saying, “The states | III) and 

| IV) are not the states | H ) and | b); they are mixtures of the two.” Well, 

only slightly. They are indeed mixtures for B = 0, but we have not yet figured 
out what they are for large B. When we used the analogies of (12.33) in our formu- 
las of Chapter 9 to get the energies of the stationary states, we could also have 
taken the amplitudes that go with them. They come from Eq. (9.23), which is 

a% _ E — # 22 

a Z #11 


The ratio a 2 /a 3 is, of course, just C 2 /C 3 . Plugging in the analogous quantities 
from (12.33), we get 

C 2 = E + A - (ju e - M P )B 
C 3 2 A 

or 

= E + A + >x ' B , (12.39) 

C 3 2 A 

where for E we are to use the appropriate energy — either E in or E IV . For instance, 
for state | III) we have 

So for large B the state | III) has C 2 » C 3 ; the state becomes almost completely 

the state | 2) — | -J ). Similarly, if we put E 1V into (12.39) we get (C 2 /C 3 )jf 

«1 ; for high fields state [ IV) becomes just the state | 3) — | b). You see that 

the coefficients in the linear combinations of our base states which make up the 
stationary states depend on B . The state we call | III) is a 50-50 mixture of | + — ) 

and | b) at very low fields, but shifts completely over to | H ) at high fields. 

Similarly, the state | IV), which at low fields is also a 50-50 mixture (with opposite 

signs) of | H ) and | b), goes over into the state [ b) when the spins are 

uncoupled by a strong external field. 

We would also like to call your attention particularly to what happens at 
very low magnetic fields. There is one energy — at —3 A — which does not change 
when you turn on a small magnetic field. And there is another energy — at +A — 
which splits into three different energy levels when you turn on a small magnetic 
field. For weak fields the energies vary with B as shown in Fig. 12-5. Suppose 
that we have somehow selected a bunch of hydrogen atoms which all have the 
energy —3 A. If we put them through a Stern-Gerlach experiment — with fields 
that are not too strong — we would find that they just go straight through. (Since 
their energy doesn’t depend on B, there is — according to the principle of virtual 
work — no force on them in a magnetic field gradient.) Suppose, on the other hand, 
we were to select a bunch of atoms with the energy -j-A, and put them through 
a Stern-Gerlach apparatus, say an S apparatus. (Again the fields in the apparatus 
should not be so great that they disrupt the insides of the atom, by which we mean 
a field small enough that the energies vary linearly with B .) We would find three 
beams. The states 1 1) and | II) get opposite forces — their energies vary linearly 
with B with the slopes ±/x so the forces are like those on a dipole with p z = =fm; 
but the state [ III) goes straight through. So we are right back in Chapter 5. 
A hydrogen atom with the energy +A is a spin-one particle . This energy state is a 
“particle” for which j = 1 , and it can be described— with respect to some set of 
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Fig. 1 2-5. The states of the hydrogen 
atom for small magnetic fields. 



axes in space — in terms of the base states | +S), | 0 S), and | —S) we used in Chap- 
ter 5. On the other hand, when a hydrogen atom has the energy —3 A, it is a spin- 
zero particle. (Remember, what we are saying is only strictly true for infinitesimal 
magnetic fields.) So we can group the states of hydrogen in zero magnetic field 
this way: 


1 D = 

in) = 

1+ +) 

1+ -) + 1 

- +) 

> spin 1 < 

1+5) 

|0S> 

(12.41) 

V2 


1 ir> = 

1- -) 



,1-5) 


IV) = 

1+ -) - 1 

- +> 

spin 0. 


(12.42) 

V2 




Table 12-3 


Zero field states of the hydrogen atom 


State | j, m) 

j 

m 

Our notation 

U.+i) 

1 

+i 

1 I) = 1 +5) 

1 1,0) 

1 

0 

|///) = | OS) 

11,-1) 

1 

-1 

| //) = | -S) 

1 0 , 0) 

0 

0 

1 IV) 


We have said in Chapter 35 of Volume II (Appendix) that for any particle its 
component of angular momentum along any axis can have only certain values 
always h apart. The z-component of angular momentum J z can be jh , (j — 1)/?, 
(j — 2)h, . . . , (-j)h, where j is the spin of the particle (which can be an integer or 
half-integer). Although we neglected to say so at the time, people usually write 

J z — mh , (12.43) 

where m stands for one of the numbers j,j — 1 ,j — 2, . . . , —j. You will, there- 
fore, see people in books label the four ground states of hydrogen by the so-called 
quantum numbers j and m [often called the “total angular momentum quantum 
number” (j), and “magnetic quantum number” (ra)]. Then, instead of our state 
symbols | /), | II), and so on, they will write a state as | j, m). So they would write 
our little table of states for zero field in (12.41) and (12.42) as shown in Table 12-3. 
It’s not new physics, it’s all just a matter of notation. 


12-6 The projection matrix for spin one | 

We would like now to use our knowledge of the hydrogen atom to do some- 
thing special. We discussed in Chapter 5 that a particle of spin one which was in 
one of the base states (+, 0, or — ) with respect to a Stern-Gerlach apparatus of a 
particular orientation — say an S apparatus — would have a certain amplitude to 
be in each of the three states with respect to a T apparatus with a different orienta- 
tion in space. There are nine such amplitudes (jT \ iS) which make up the pro- 
jection matrix. In Section 5-7 we gave without proof the terms of this matrix 
for various orientations of T with respect to S . Now we will show you one way 
they can be derived. 

In the hydrogen atom we have found a spin-one system which is made up 
of two spin one-half particles. We have already worked out in Chapter 6 how 
to transform the spin one-half amplitudes. We can use this information to calculate 
the transformation for spin one. This is the way it works: We have a system— a 
hydrogen atom with the energy + /4 — which has spin one. Suppose we run it 
through a Stern-Gerlach filter S, so that we know it is in one of the base states 
with respect to S, say | +5). What is the amplitude that it will be in one of the 
base states, say | + 7), with respect to the T apparatus? If we call the coordinate 
system of the S apparatus the x, y, z system, the | +S) state is what we have been 
calling the state | + +). But suppose another guy took his z-axis along the axis 
of T. He will be referring his states to what we will call the x', y', z f frame. His 
“up” and “down” states for the electron and proton would be different from ours. 
His “plus-plus” state — which we can write | + ' +'), referring to the “prime” 
frame — is the | +7 1 ) state of the spin-one particle. What we want is {+T | +S) 
which is just another way of writing the amplitude {+' +' | H — h). 

f Those who chose to jump over Chapter 6 should skip this section also. 
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We can find the amplitude (+' + ' | + + ) in the following way. In our frame 
the electron in the [ + +) state has its spin “up”. That means that it has some 
amplitude (+' | +) e of being “up” in his frame, and some amplitude ' | +) e 
of being “down” in that frame. Similarly, the proton in the | + +) state has 
spin “up” in our frame and the amplitudes {+' | + ) p and ' | + ) p of having 
spin “up” or spin “down” in the “prime” frame. Since we are talking about two 
distinct particles, the amplitude that both particles will be “up” together in his 
frame is the product of the two amplitudes, 

<+' +' | + +> = <+' | +>e<+' I +> P . (12.44) 

We have put the subscripts e and p on the amplitudes (+' | + ) to make it clear 
what we were doing. But they are both just the transformation amplitudes for a 
spin one-half particle, so they are really identical numbers. They are, in fact, just 
the amplitude we have called (+71 +5) in Chapter 6, and which we listed in 
the tables at the end of that chapter. 

Now, however, we are about to get into trouble with notation. We have to 
be able to distinguish the amplitude (+7 j + S) for a spin one-half particle from 
what we have also called (+7 | + S) for a spin-one particle — yet they are completely 
different! We hope it won’t be too confusing, but for the moment at least, we will 
have to use some different symbols for the spin one-half amplitudes. To help 
you keep things straight, we summarize the new notation in Table 12-4. We will 
continue to use the notation | +5), | 0 S), and | — S) for the states of a spin-one 
particle. 

With our new notation, Eq. (12.44) becomes simply 

{+' +'|++) = a 2 , 

and this is just the spin -one amplitude (+7 | +5). Now, let’s suppose, for in- 
stance, that the other guy’s coordinate frame — that is, the 7, or “primed,” appara- 
tus — is just rotated with respect to our z- axis by the angle <f>; then from Table 6-2, 

a = (+' I + > = e W2 . 

So from (12.44) we have that the spin-one amplitude is 

(+T I +S> = (+' +'|++)= (e W2 ) 2 = e«. (12.45) 

You can see how it goes. 

Now we will work through the general case for all the states. If the proton 
and electron are both “up” in our frame — the S-frame — the amplitudes that it 
will be in any one of the four possible states in the other guy’s frame — the 7-frame — 
are 

<+' +'|++)= <+' |+>e<+'|+>P = ^ 

<+' | + +> = <+' I +>e<~ / i +>„ - ab, 

(12.46) 

(— ' +' I + +) = ' I +) e (+' I +)p = ba, 

<-' -'!++>= (“ / |+>e(- / |+>p = b\ 

We can, then, write the state | + +) as the following linear combination: 

| + +) = A 2 | + ; +0 + ab{ | +' + | +0} + b 2 | -'}• (12.47) 

Now we notice that | +' +') is the state | +7), that {| +' —') + | — ' +')} is 
just v 7 2 times the state [ 0 7) — see (12.41) — and that | | — 7). In other 

words, Eq. (12.47) can be rewritten as 

| + 5 ) = a 2 | +7) + \/2 ab | 0 7) + b 2 \ -7). (12.48) 

In a similar way you can easily show that 

| -5) = c 2 | +7) + V2 cd | 0 7) + d 2 | -7). (12.49) 
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Table 12-4 
Spin one-half amplitudes 


This chapter 

«=(+'! +) 
b = (-' 1 +) 
c = (+'!-) 


Chapter 6 

<+7’| +5) 
(~T\ +S) 
(+T | -5) 
(~T | -S) 



For | 0 S) it’s a little more complicated, because 

|0S> = -4= {I + -) + !-+)}• 

V2 

But we can express each of the states | -t ) and | h) in terms of the “prime” 

states and take the sum. That is, 


| + -> = ac| +' +') + ad \+* -'} + be | -' +') + bd\-> -') (12.50) 
and 


| - +) = ac\ +* + '> + be | +' -0 + ad\ + ') + bd\-> -'}• (12.51) 
Taking 1 /\/2 times the sum, we get 

I 0 S) = ac | + +') + +'-')+ I -'+'>} +-~bd \-‘ 

V2 \/2 V 2 

It follows that 

|0S> = y/2 ac | +7) + (orf + Z>c) | 0 7) + s/2 bd | - 7 ). (12.52) 


We have now all of the amplitudes we wanted. The coefficients of Eqs. 
(12.48), (12.49), and (12.52) are the matrix elements (jT ( iS). Let’s pull them all 
together: 

iS 


jT ]/ a 2 \/2ac 

(jT | iS) = \/2 ab ad + be 

\ b 2 V2 bd 


c 2 ^ 
x/2 c</ 
</ 2 ) 


(12.53) 


We have expressed the spin-one transformation in terms of the spin one-half 
amplitudes a , b , t% and d. 

For instance, if the 7-frame is rotated with respect to S by the angle a about 
the y-axis — as in Fig. 5-6 — the amplitudes in Table 12-4 are just the matrix 
elements of R y (a) in Table 6-2. 


a 


a 

a = COS-, 

b = —sin 

2‘ 

. OL 

* a 


c - sin 

d = cos- 



(12.54) 


Using these in (12.53), we get the formulas of (5.38), which we gave there without 
proof. 

What ever happened to the state | IV)V. Well, it is a spin-zero system, so it 
has only one state — it is the same in all coordinate systems . We can check that 
everything works out by taking the difference of Eq. (12.50) and (12.51); we get 
that 

1+ -) - I - +) = (ad - bc){\ +• - | - +'}}. 

But (ad — be) is the determinant of the spin one-half matrix, and so is equal to 1. 
We get that 

I IV') = | IV) 

for any relative orientation of the two coordinate frames. 
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13 


Propagation in a Crystal Lattice 


13-1 States for an electron in a one-dimensional lattice 

You would, at first sight, think that a low-energy electron would have great 
difficulty passing through a solid crystal. The atoms are packed together with 
their centers only a few angstroms apart, and the etfective diameter of the atom 
for electron scattering is roughly an angstrom or so. That is, the atoms are large, 
relative to their spacing, so that you would expect the mean free path between 
collisions to be of the order of a few angstroms — which is practically nothing. 
You would expect the electron to bump into one atom or another almost imme- 
diately. Nevertheless, it is a ubiquitous phenomenon of nature that if the lattice 
is perfect, the electrons are able to travel through the crystal smoothly and easily — 
almost as if they were in a vacuum. This strange fact is what lets metals conduct 
electricity so easily; it has also permitted the development of many practical 
devices. It is, for instance, what makes it possible for a transistor to imitate the 
radio tube. In a radio tube electrons move freely through a vacuum, while in the 
transistor they move freely through a crystal lattice. The machinery behind the 
behavior of a transistor will be described in this chapter; the next one will describe 
the application of these principles in various practical devices. 

The conduction of electrons in a crystal is one example of a very common 
phenomenon. Not only can electrons travel through crystals, but other “things” like 
atomic excitations can also travel in a similar manner. So the phenomenon which 
we want to discuss appears in many ways in the study of the physics of the solid 
state. 

You will remember that we have discussed many examples of two-state sys- 
tems. Let’s now think of an electron which can be in either one of two positions, 
in each of which it is in the same kind of environment. Let’s also suppose that 
there is a certain amplitude to go from one position to the other, and, of course, 
the same amplitude to go back, just as we have discussed for the hydrogen molec- 
ular ion in Section 10-1. The laws of quantum mechanics then give the following 
results. There are two possible states of definite energy for the electron. Each 
state can be described by the amplitude for the electron to be in each of the two 
basic positions. In either of the definite-energy states, the magnitudes of these 
two amplitudes are constant in time, and the phases vary in time with the same 
frequency. On the other hand, if we start the electron in one position, it will later 
have moved to the other, and still later will swing back again to the first position. 
The amplitude is analogous to the motions of two coupled pendulums. 

Now consider a perfect crystal lattice in which we imagine that an electron 
can be situated in a kind of “pit” at one particular atom and with some particular 
energy. Suppose also that the electron has some amplitude to move into a different 
pit at one of the nearby atoms. It is something like the two-state system — but with 
an additional complication. When the electron arrives at the neighboring atom, 
it can afterward move on to still another position as well as return to its starting 
point. Now we have a situation analogous not to two coupled pendulums, but to 
an infinite number of pendulums all coupled together. It is something like what 
you see in one of those machines — made with a long row of bars mounted on a 
torsion wire — that is used in first-year physics to demonstrate wave propagation. 

If you have a harmonic oscillator which is coupled to another harmonic 
oscillator, and that one to another, and so on ... , and if you start an irregularity 
in one place, the irregularity will propagate as a wave along the line. The same 
situation exists if you place an electron at one atom of a long chain of atoms. 


13-1 States for an electron in a 
one-dimensional lattice 

13-2 States of definite energy 

13-3 Time-dependent states 

13-4 An electron in a three- 
dimensional lattice 
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imperfection 

13-8 Scattering amplitudes and 
bound states 
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The 

base states 

of 

an 


electron in a one-dimensional crystal. 


Usually, the simplest way of analyzing the mechanical problem is not to think 
in terms of what happens if a pulse is started at a definite place, but rather in 
terms of steady-wave solutions. There exist certain patterns of displacements 
which propagate through the crystal as a wave of a single, fixed frequency. Now 
the same thing happens with the electron — and for the same reason., because It’s 
described in quantum mechanics by similar equations. 

You must appreciate one thing, however; the amplitude for the electron to 
be at a place is an amplitude , not a probability. If the electron were simply leaking 
from one place to another, like water going through a hole, the behavior would 
be completely different. For example, if we had two tanks of water connected 
by a tube to permit some leakage from one to the other, then the levels would 
approach each other exponentially. But for the electron, what happens is amplitude 
leakage and not just a plain probability leakage. And it’s a characteristic of the 
imaginary term — the / in the differential equations of quantum mechanics — which 
changes the exponential solution to an oscillatory solution. What happens then 
is quite different from the leakage between interconnected tanks. 

We want now to analyze quantitatively the quantum mechanical situation. 
Imagine a one-dimensional system made of a long line of atoms as shown in 
Fig. 13-1 (a). (A crystal is, of course, three-dimensional but the physics is very 
much the same; once you understand the one-dimensional case you will be able 
to understand what happens in three dimensions.) Next, we want to see what 
happens if we put a single electron on this line of atoms. Of course, in a real crystal 
there are already millions of electrons. But most of them (nearly all for an in- 
sulating crystal) take up positions in some pattern of motion each around its own 
atom — and everything is quite stationary. However, we now want to think about 
what happens if we put an extra electron in. We will not consider what the other 
ones are doing because we suppose that to change their motion involves a lot of 
excitation energy. We are going to add an electron as if to produce one slightly 
bound negative ion. In watching what the one extra electron'does we are making 
an approximation which disregards the mechanics of the inside workings of the 
atoms. 

Of course the electron could then move to another atom, transferring the 
negative ion to another place. We will suppose that just as in the case of an 
electron jumping between two protons, the electron can jump from one atom to 
the neighbor on either side with a certain amplitude. 

Now how do we describe such a system? What will be reasonable base states? 
If you remember what we did when we had only two possible positions, you can 
guess how it will go. Suppose that in our line of atoms the spacings are all equal; 
and that we number the atoms in sequence, as shown in Fig. 13-1 (a). One of the 
base states is that the electron is at atom number 6, another base state is that the 
electron is at atom number 7, or at atom number 8, and so on. We can describe 
the nth base state by saying that the electron is at atom number n. Let’s say that 
this is the base state | n). Figure 13-1 shows what we mean by the three base 
states 


| n — 1), | n), and | n + 1). 


Using these base states, any state | <f>) of our one-dimensional crystal can be de- 
scribed by giving all the amplitudes (n \ <f > ) that the state | <f>) is in one of the 
base states — which means the amplitude that it is located at one particular atom. 
Then we can write the state [ <j>) as a superposition of the base states 


i < t >) = 2 1 n )(” i tf>- 03.1) 

Next, we are going to suppose that when the electron is at one atom, there is a 
certain amplitude that it will leak to the atom on either side. And we’ll take the 
simplest case for which it can only leak to the nearest neighbors — to get to the 
next-nearest neighbor, it has to go in two steps. We’ll take that the amplitudes for 
the electron jump from one atom to the next is iA/h (per unit time). 
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For the moment we would like to write the amplitude (n | <£) to be on the 
nth atom as C n . Then Eq. (13.1) will be written 

I *> = £ I n)C n . (13.2) 

n 

If we knew each of the amplitudes C n at a given moment, we could take their 
absolute squares and get the probability that you would find the electron if you 
looked at atom n at that time. 

What will the situation be at some later time? By analogy with the two-state 
systems we have studied, we would propose that the Hamiltonian equations for 
this system should be made up of equations like this : 

ih dCM = Eo c n (t) - AC n+1 (t ) - ACn^it). (13.3) 

The first coefficient on the right, E 0 , is, physically, the energy the electron 
would have if it couldn’t leak away from one of the atoms. (It doesn’t matter 
what we call E 0 ; as we have seen many times, it represents really nothing but our 
choice of the zero of energy.) The next term represents the amplitude per unit 
time that the electron is leaking into the nth pit from the (n + l)st pit; and the 
last term is the amplitude for leakage from the (n — l)st pit. As usual, we’ll 
assume that A is a constant (independent of t). 

For a full description of the behavior of any state | </>), we would have one 
equation like (13.3) for every one of the amplitudes C n . Since we want to consider 
a crystal with a very large number of atoms, w.e’11 assume that there are an in- 
definitely large number of states — that the atoms go on forever in both directions. 
(To do the finite case, we will have to pay special attention to what happens at the 
ends.) If the number N of our base states is indefinitely large, then also our full 
Hamiltonian equations are infinite in number! We’ll write down just a sample: 


ih = £oC n _! - AC n — 2 - AC n , 

ih ^ = E 0 C n - AC - AC n+ i, (13.4) 

ih ^±1 = EoCn+1 - AC n - AC n+2 , 


13-2 States of definite energy 

We could study many things about an electron in a lattice, but first let’s try 
to find the states of definite energy. As we have seen in earlier chapters this means 
that we have to find a situation in which the amplitudes all change at the same 
frequency if they change with time at all. We look for solutions of the form 

Cn = a n e - lEtl \ (13.5) 

The complex number a n tell us about the non-time-varying part of the amplitude 
to find the electron at the nth atom. If we put this trial solution into the equations 
of (13.4) to test them out, we get the result 


Ea n = E 0 a n - Aa n+1 - Aa n _ x . (13.6) 

We have an infinite number of such equations for the infinite number of unknowns 
a n — which is rather petrifying. 

All we have to do is take the determinant . . . but wait! Determinants are 
fine when there are 2, 3, or 4 equations. But if there are a large number — or an 
infinite number — of equations, the determinants are not very convenient. We’d 
better just try to solve the equations directl^. First, let’s label the atoms by their 
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positions; we'll say that the atom n is at x n and the atom (n + 1) is at x n +i. If 
the atomic spacing is b — as in Fig. 13-1 — we will have that •*n+i = x n + b. 
By choosing our origin at atom zero, we can even have it that x n = nb. We can 
rewrite Eq. (13.5) as 

C n = a(x n )e-' Etl \ (13.7) 


and Eq. (13.6) would become 

Ea(x n ) = E 0 a(x n +\) — Aa(x n+1 ) - Aa(x n _ x ). (13.8) 

Or, using the fact that = x n + b, we could also write 

Ea(x n ) = E 0 a(x n ) - Aa(x n + b) - Aa(x n — b). (13.9) 

This equation is somewhat similar to a differential equation. It tells us that a 
quantity, a(x ), at one point, (x„), is related to the same physical quantity at some 
neighboring points, (x n ± b). (A differential equation relates the value of a func- 
tion at a point to the values at infinitesimally nearby points.) Perhaps the methods 
we usually use for solving differential equations will also work here, let’s try. 

Linear differential equations with constant coefficients can always be solved 
in terms of exponential functions. We can try the same thing here; let’s take as a 
trial solution 

a(x n ) = e tk *\ (13.10) 


Then Eq. (13.9) becomes 


Ee lkx » = E 0 e lkx " - Ae xk{Xn+h ' - Ae lk(Xn ~ b) . 

(13.11) 

We can now divide out the common factor e lkx ^\ we get 


T 

1 

% 

1 

11 

(13.12) 

The last two terms are just equal to (1A cos kb), so 


E = E 0 — 2A cos kb. 

(13.13) 

We have found that for any choice at all for the constant k there is a solution 
whose energy is given by this equation. There are various possible energies 
depending on k , and each k corresponds to a different solution. There are an 
infinite number of solutions — which is not surprising, since we started out with 
an infinite number of base states. 

Let’s see what these solutions mean. For each k , the a’s are given by Eq. 
(13.10). The amplitudes C n are then given by 

C — o %kXn o~^ l I^ Et 

— c c , 

(13.14) 


where you should remember that the energy E also depends on k as given in Eq. 
(13.13). The space dependence of the amplitudes is e xhx «. The amplitudes 
oscillate as we go along from one atom to the next. 

We mean that, in space, the amplitude goes as a complex oscillation — the 
magnitude is the same at every atom, but the phase at a given time advances by the 
amount {ikb) from one atom to the next. We can visualize what is going on by 
plotting a vertical line to show just the real part at each atom as we have done in 
Fig. 13-2. The envelope of these vertical lines (as shown by the broken-line curve) 


Fig. 13-2. Variation of the real part 
of C n with x„. 
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is, of course, a cosine curve. The imaginary part of C n is also an oscillating function, 
but is shifted 90° in phase so that the absolute square (which is the sum of the 
squares of the real and imaginary parts) is the same for all the C s. 

Thus if we pick a /c, we get a stationary state of a particular energy E . And 
for any such state, the electron is equally likely to be found at every atom — there 
is no preference for one atom or the other. Only the phase is different for different 
atoms. Also, as time goes on the phases vary. From Eq. (13.14) the real and 
imaginary parts propagate along the crystal as waves — namely as the real or 
imaginary parts of 

e i[kx n -{EW)t] ' (13.15) 


The wave can travel toward positive or negative * depending on the sign we have 
picked for k. 

Notice that we have been assuming that the number k that we put in our 
trial solution, Eq. (13.10), was a real number. We can see now why that must be 
so if we have an infinite line of atoms. Suppose that k were an imaginary number, 
say ik\ Then the amplitudes a n would go as which means that the amplitude 
would get larger and larger as we go toward large x’s — or toward large negative 
x’s if k ' is a negative number. This kind of solution would be O.K. if we were 
dealing with line of atoms that ended, but cannot be a physical solution for an 
infinite chain of atoms. It would give infinite amplitudes — and, therefore, infinite 
probabilities — which can’t represent a real situation. Later on we will see an ex- 
ample in which an imaginary k does make sense. 

The relation between the energy E and the wave number k as given in Eq. 
(13.13) is plotted in Fig. 13-3. As you can see from the figure, the energy can go 
from (Eq — 2 A) at k = 0 to (E 0 + 2 A) at k = =*= 7 r/b. The graph is plotted 
for positive A; if A were negative, the curve would simply be inverted, but the 
range would be the same. The significant result is that any energy is possible 
within a certain range or “band” of energies, but no others. According to our 
assumptions, if an electron in a crystal is in a stationary state, it can have no 
energy other than values in this band. 

According to Eq. (13.10), the smallest k 3 s correspond to low-energy states — 
E ^ (E 0 — 2 A). As k increases in magnitude (toward either positive or negative 
values) the energy at first increases, but then reaches a maximum at k — =*=7r/6, 
as shown in Fig. 13-3. For k 3 s larger than ir/b, the energy would start to decrease 
again. But we do not really need to consider such values of k , because they do 
not give new states — they just repeat states we already have for smaller k. We 
can see that in the following way. Consider the lowest energy state for which 
k = 0. The coefficient a(x n ) is the same for all x n . Now we would get the same 
energy for k = 2w/b. But then, using Eq. (13.10), we have that 

a(x n ) = e i(2rlb)x \ 



However, taking x 0 to be at the origin, we can set x n = nb ; then a(x„) becomes Fig. 1 3 3. The energy of the station- 

ary states as a function of the param- 

a(x„) = e i2rn = 1. e,er k - 

The state described by these a(x n ) is physically the same state we got for k = 0. 

It does not represent a different solution. 

As another example, suppose that k were ir/4b. The real part of a(x n ) would 
vary as shown by curve 1 in Fig. 13-4. If k were seven times larger (k = 77r/4), 
the real part of a(x n ) would vary as shown by curve 2 in the figure. (The complete 



Fig. 13-4. Two values of k which 
represent the same physical situation; 
curve 1 is for k = ir/4, curve 2 is for 
k = 7tt/4. 
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cosine curves don’t mean anything, of course; all that matters is their values at 
the points x n . The curves are just to help you see how things are going.) You see 
that both values of k give the same amplitudes at all of the x n ’s. 

The upshot is that we have all the possible solutions of our problem if we take 
only k's in a certain limited range. We’ll pick the range between — 7 r/b and 
+7 r/b — the one shown in Fig. 13-3. In this range, the energy of the stationary 
states increases uniformly with an increase in the magnitude of k. 

One side remark about something you can play with. Suppose that the elec- 
tron cannot only jump to the nearest neighbor with amplitude iA/h, but also has 
the possibility to jump in one direct leap to the next nearest neighbor with some 
other amplitude iB/h. You will find that the solution can again be written in the 
form a n ~ e lkXn — this type of solution is universal You will also find that the 
stationary states with wave number k have an energy equal to {E 0 — 2 A cos kb — 
2B cos 2kb). This shows that the shape of the curve of E against k is not universal, 
but depends upon the particular assumptions of the problem. It is not always a 
cosine wave — it’s not even necessarily symmetrical about some horizontal line. 
It is true, however, that the curve always repeats itself outside of the interval from 
— 7 r/b to 7 r/b, so you never need to worry about other values of k. 

Let’s look a little more closely at what happens for small k— that is, when 
the variations of the amplitudes from one x n to the next are quite slow. Suppose 
we choose our zero of energy by defining £0 = 2 A; then the minimum of the 
curve in Fig. 13-3 is at the zero of energy. For small enough k, we can write that 

cos kb « 1 - k 2 b 2 / 2, 

and the energy of Eq. (13.13) becomes 

E = Ak 2 b 2 . (13.16) 

We have that the energy of the state is proportional to the square of the wave 
number which describes the spatial variations of the amplitudes C n . 


13-3 Time-dependent states 

In this section we would like to discuss the behavior of states in the one- 
dimensional lattice in more detail. If the amplitude for an electron to be at 
is C n , the probability of finding it there is \C n \ 2 . For the stationary states described 
by Eq. (13.12), this probability is the same for all x n and does not change with time. 
How can we represent a situation which we would describe roughly by saying an 
electron of a certain energy is localized in a certain region — so that it is more likely 
to be found at one place than at some other place? We can do that by making 
a superposition of several solutions like Eq. (13.12) with slightly different values 
of k — and, therefore, slightly different energies. Then at t = 0, at least, the ampli- 
tude C n will vary with position because of the interference between the various 
terms, just as one gets beats when there is a mixture of waves of different wave- 
lengths (as we discussed in Chapter 48, Vol. I). So we can make up a “wave packet” 
with a predominant wave number k 0i but with various other wave numbers near k 0 . f 
In our superposition of stationary states, the amplitudes with different k J s 
will represent states of slightly different energies, and, therefore, of slightly different 
frequencies; the interference pattern of the total C n will, therefore, also vary with 
time — there will be a pattern of “beats.” As we have seen in Chapter 48 of Volume 
I, the peaks of the beats [the place where |C(x rt )| 2 is large] will move along in x 
as time goes on; they move with the speed we have called the “group velocity.” 
We found that this group velocity was related to the variation of k with frequency by 

_ did 

v * xo " v ~ Hk ’ 


t Provided we do not try to make the packet too narrow. 
13-6 


(13.17) 



the same derivation would apply equally well here. An electron state which is a 
“clump" — namely one for which the C n vary in space like the wave packet of 
Fig. 13-5 — will move along our one-dimensional “crystal” with the speed v equal 
to dco/dk , where c a — E/h. Using (13.16) for E, we get that 

(13.18) 

In other words, the electrons move along with a speed proportional to the typical 
k. Equation (13.16) then says that the energy of such an electron is proportional 
to the square of its velocity — it acts like a classical particle. So long as we look 
at things on a scale gross enough that we don’t see the fine structure, our quantum 
mechanical picture begins to give results like classical physics. In fact, if we solve 
Eq. (13.18) for k and substitute into (13.16), we can write 

E=im et{ v 2 , (13.19) 

where m c a is a constant. The extra “energy of motion” of the electron in a packet 
depends on the velocity just as for a classical particle. The constant m e a — called 
the “effective mass” — is given by 


h 2 

2Ab2 


Also notice that we can write 


m e a v ~ hk . 


(13.20) 


(13.21) 


If we choose to call m c u v the “momentum,” it is related to the wave number k 
in the way we have described earlier for a free particle. 

Don’t forget that m e ff has nothing to do with the real mass of an electron. 
It may be quite different — although in real crystals it often happens to turn out to be 
the same general order of magnitude, about 2 to 20 times the free-space mass of 
the electron. 

We have now explained a remarkable mystery — how an electron in a crystal 
(like an extra electron put into germanium) can ride right through the crystal and 
flow perfectly freely even though it has to hit all the atoms. It does so by having 
its amplitudes going pip-pip-pip from one atom to the next, working its way through 
the crystal. That is how a solid can conduct electricity. 


13-4 An electron in a three-dimensional lattice 

Let’s look for a moment at how we could apply the same ideas to see what 
happens to an electron in three dimensions. The results turn out to be very similar. 
Suppose we have a rectangular lattice of atoms with lattice spacings of a, 6, c in 
the three directions (If you want a cubic lattice, take the three spacings all equal.) 
Also suppose that the amplitude to leap in the ^-direction to a neighbor is ( iA x /h ), 
to leap in the jr-direction is ( iA y /h ), and to leap in the z-direction is (iA z /h). Now 
how should we describe the base states? As in the one-dimensional case, one 
base state is that the electron is at the atom whose locations are x, y, z, where 
(x, y , z) is one of the lattice points. Choosing our origin at one atom, these points 
are all at 

x — n r a, y = n y b , and z = n z c , 

where n x , n y , n z are any three integers. Instead of using subscripts to indicate such 
points, we will now just use x, y, and z, understanding that they take on only their 
values at the lattice points. Thus the base state is represented by the symbol 
| electron at x, y, z), and the amplitude for an electron in some state | ip) to be in 
this base state is C(x, y, z) = (electron at x, y, z | p). 


Re CfXn) 


ft 



I 


Fig. 1 3-5. The real part of C(x n ) as 
a function of x for a superposition of 
several states of similar energy. (The 
spacing b is very small on the scale of 
x shown.) 
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As before, the amplitudes C(x, y, z) may vary with time. With our assump- 
tions, the Hamiltonian equations should be like this: 

ifi = E 0 C(x, y, z) - A x C(x + a, y, z) - A x C(x - a, y, z) 

— A u C(x, y + b,z) — AyC(x, y — b,z) 

— A z C(x,y,z + c) — A z C(x,y,z — c ). (13.22) 

It looks rather long, but you can see where each term comes from. 

Again we can try to find a stationary state in which all the C’s vary with time 
in the same way. Again the solution is an exponential: 

C(x,y,z) = e -*£‘/y<M+M+M> (13.23) 

If you substitute this into (13.22) you see that it works, provided that the energy 
E is related to k x , k y , and k z in the following way: 

E = E 0 — 2A x cos k x a — 2A V cos k y b — 2A z co$k z c. (13.24) 

The energy now depends on the three wave numbers k x , k y , k z , which, incidentally, 
are the components of a three-dimensional vector k. In fact, we can write Eq. 
(13.23) in vector notation as 

C(x,y,z) = e - t * tl *e~' k ‘ r (13.25) 

The amplitude varies as a a mplex plane wave in three dimensions, moving in the 
direction of k, and with the wave number k = (k 2 x 4- k% 4- k 2 z ) 112 . 

The energy associated with these stationary states depends on the three com- 
ponents of k in the complicated way given in Eq. (13.24). The nature, of the 
variation of E with k depends on relative signs and magnitudes of A x , A y , and A z . 
If these three numbers are all positive, and if we are interested in small values of 
k , the dependence is relatively simple. 

Expanding the cosines as we did before to get Eq. (13.16), we can now get that 

E = E min + A x a 2 kl + Aybkl + A z ck\. (13.26) 

For a simple cubic lattice with lattice spacing a we expect that A x and A v 
and A z would be equal — say all are just A — and we would have just 

E = E m ^ + Aa 2 (k x + k 2 + kl), 
or 

E = E min + Aa 2 k 2 . (13.27) 

This is just like Eq. (13.16). Following the arguments used there, we would con- 
clude that an electron packet in three dimensions (made up by superposing many 
states with nearly equal energies) also moves like a classical particle with some 
effective mass. 

In a crystal with a lower symmetry than cubic (or even in a cubic crystal in 
which the state of the electron at each atom is not symmetrical) the three coefficients 
A Xi A yy and A z are different. Then the “effective mass” of an electron localized 
in a small region depends on its direction of motion . It could, for instance, have a 
different inertia for motion in the x-direction than for motion in the ^-direction. 
(The details of such a situation are sometimes described in terms of an “effective 
mass tensor.”) 


13-5 Other states in a lattice 

According to Eq. (13.24) the electron states we have been talking about can 
have energies only m a certain “band” of energies which covers the energy range 
from the minimum energy 
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E o — 2(A X + A y + A z ) 



to the maximum energy 


E o + 2 (A x + A y + A z ). 

Other energies are possible, but they belong to a different class of electron states. 
For the states we have described, we imagined base states in which an electron is 
placed on an atom of the crystal in some particular state, say the lowest energy 
state. 

If you have an atom in empty space, and add an electron to make an ion, the 
ion can be formed in many ways. The electron can go on in such a way as to make 
the state of lowest energy, or it can go on to make one or another of many possible 
“excited states” of the ion each with a definite energy above the lowest energy. The 
same thing can happen in a crystal. Let’s suppose that the energy E 0 we picked 
above corresponds to base states which are ions of the lowest possible energy. 
We could also imagine a new set of base states in which the electron sits near the 
nth atom in a different way — in one of the excited states of the ion — so that the 
energy E 0 is now quite a bit higher. As before there is some amplitude A (different 
from before) that the electron will jump from its excited state at one atom to the 
same excited state at a neighboring atom. The whole analysis goes as before, we 
find a band of possible energies centered at a higher energy. There can, in general, 
be many such bands each corresponding to a different level of excitation. 

There are also other possibilities. There may be some amplitude that the 
electron jumps from an excited condition at one atom to an unexcited condition 
at the next atom. (This is called an interaction between bands.) The mathematical 
theory gets more and more complicated as you take into account more and more 
bands and add more and more coefficients for leakage between the possible states. 
No new ideas are involved, however; the equations are set up much as we have 
done in our simple example. 

We should remark also that there is not much more to be said about the vari- 
ous coefficients, such as the amplitude A, which appear in the theory. Generally 
they are very hard to calculate, so in practical cases very little is known theoretically 
about these parameters and for any particular real situation we can only take 
values determined experimentally. 

There are other situations where the physics and mathematics are almost 
exactly like what we have found for an electron moving in a crystal, but in which 
the “object” that moves is quite different. For instance, suppose that our original 
crystal — or rather linear lattice — was a line of neutral atoms, each with a loosely 
bound outer electron. Then imagine that we were to remove one electron. Which 
atom has lost its electron? Let C n now represent the amplitude that the electron 
is missing from the atom at x n . There will, in general, be some amplitude iA/h 
that the electron at a neighboring atom — say the (n — l)st atom — will jump to 
the nth leaving the (n — l)st atom without its electron. This is the same as saying 
that there is an amplitude A for the “missing electron” to jump from the nth 
atom to the (n — l)st atom. You can see that the equations will be exactly the 
same — of course, the value of A need not be the same as we had before. Again 
we will get the same formulas for the energy levels, for the “waves” of probability 
which move through the crystal with the group velocity of Eq. (13.18), for the 
effective mass, and so on. Only now the waves describe the behavior of the missing 
electron — or “hole” as it is called. So a “hole” acts just like a particle with a 
certain mass m c ff. You can see that this particle will appear to have a positive 
charge. We’ll have some more to say about such holes in the next chapter. 

As another example, we can think of a line of identical neutral atoms one of 
which has been put into an excited state — that is, with more than its normal 
ground state energy. Let C n be the amplitude that the nth atom has the excitation. 
It can interact with a neighboring atom by handing over to it the extra energy and 
returning to the ground state. Call the amplitude for this process iA/h. You 
can see that it’s the same mathematics all over again. Now the object which moves 
is called an exciton . It behaves like a neutral “particle” moving through the crystal, 
carrying the excitation energy. Such motion may be involved in certain biological 
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processes such as vision, or photosynthesis. It has been guessed that the absorption 
of light in the retina produces an “exciton” which moves through some periodic 
structure (such as the layers in the rods we described in Chapter 36, VoL 1 ; see 
Fig. 36-5) to be accumulated at some special station where the energy is used to 
induce a chemical reaction. 


13-6 Scattering from imperfections in the lattice 

We want now to consider the case of a single electron in a crystal which is 
not perfect. Our earlier analysis says that perfect crystals have perfect conductivity 
— that electrons can go slipping through the crystal, as in a vacuum, without friction. 
One of the most important things that can stop an electron from going on forever 
is an imperfection or irregularity in the crystal. As an example, suppose that 
somewhere in the crystal there is a missing atom; or suppose that someone put 
one wrong atom at one of the atomic sites so that things there are different than 
at the other atomic sites. Say the energy, E 0 or the amplitude A could be different. 
How would we describe what happens then? 

To be specific, we will return to the one-dimensional case and we will assume 
that atom number “zero” is an “impurity” atom and has a different value of E 0 
than any of the other atoms. Let’s call this energy (E 0 + F). What happens? 
When an electron arrives at atom “zero” there is some probability that the electron 
is scattered backwards. If a wave packet is moving along and it reaches a place 
where things are a little bit different, some of it will continue onward and some of 
it will bounce back. It’s quite difficult to analyze such a situation using a wave 
packet, because everything varies in time. It is much easier to work with steady- 
state solutions. So we will work with stationary states, which we will find can be 
made up of continuous waves which have transmitted and reflected parts. In 
three dimensions we would call the reflected part the scattered wave, since it 
would spread out in various directions. 

We start out with a set of equations which are just like the ones in Eq. (13.6) 
except that the equation for n = 0 is different from all the rest. The five equations 
for n = — 2 , — 1 , 0 , + 1 , and +2 look like this: 


Ea- 2 = E 0 a^ 2 ~ Aa_i — Ad- 3 , 

Ed— i = E Qdi_ — Aa 0 — Ad_ 2 , 

Ea 0 = (E 0 - 1 - F)d 0 - Adi — (13.28) 

Ed i = E 0 d i — Ad 2 ~ Ad 0 , 

Ed 2 — EqQz — Ad 3 — Ad i, 


There are, of course, all the other equations for \n\ is greater than 2. They will 
look just like Eq. (13.16). 

For the general case, we really ought to use a different A for the amplitude 
that the electron jumps to or from atom “zero,” but the main features of what 
goes on will come out of a simplified example in which all the A’s are equal. 

Equation (13.10) would still work as a solution for all of the equations except 
the one for atom “zero” — it isn’t right for that one equation. We need a different 
solution which we can cook up in the following way. Equation (13.10) represents 
a wave going in the positive x-direction. A wave going in the negative x-direction 
would have been an equally good solution. It would be written 

d(x n ) = 

The most general solution we could have taken for Eq. (13.6) would be a com- 
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bination of a forward and a backward wave, namely 


a n = ae tkXn + 0e~' kXn . (13.29) 

This solution represents a complex wave of amplitude a moving in the +x-direction 
and a wave of amplitude 0 moving in the — x-direction. 

Now take a look at the set of equations for our new problem — the ones in 
(13.28) together with those for all the other atoms. The equations involving 
a n ’s with n < 1 are all satisfied by Eq. (13.29), with the condition that k is related 
to E and the lattice spacing b by 

E = E 0 - 2A cos kb. (13.30) 

The physical meaning is an “incident” wave of amplitude a approaching atom 
“zero” (the “scatterer”) from the left, and a “scattered” or “reflected” wave of 
amplitude ft going back toward the left. We do not loose any generality if we set 
the amplitude a of the incident wave equal to 1. Then the amplitude /3 is, in 
general, a complex number. 

We can say all the same things about the solutions of a n for n > 1. The 
coefficients could be different, so we would have for them 


a n = + he~ lkx \ for n > 1. (13.31) 


Here, 7 is the amplitude of a wave going to the right and 5 a wave coming from 
the right. We want to consider the physical situation in which a wave is originally 
started only from the left, and there is only a “transmitted” wave that comes out 
beyond the scatterer — or impurity atom. We will try for a solution in which 
6 = 0. We can, certainly, satisfy all of the equations for the a n except for the 
middle three in Eq. (13.28) by the following trial solutions. 


SCATTERED WAVE 


p- 

! TRANSMITTED WAvE 

INCIDENT WAVE 

1 

• • • • 

| V 

1 

1 

■ • • • • 


a n (forn < 0) = e lkXn + 0e~' kZn , 32) 

a n (for n > 0) = 7e ,fcr ". 

The situation we are talking about is illustrated in Fig. 13-6. 

By using the formulas in Eq. (13.32) for and a +1 , the three middle equa- 
tions of Eq. (13.28) will allow us to solve for a 0 and also for the two coefficients 
/? and 7. So we have found a complete solution. Setting x n = nb , we have to solve 
the three equations 


n — -4 -3 -2 -I 0 I 2 3 4 

Fig. 13-6. Waves in a one-dimen- 
sional lattice with one “impurity” atom 
at n = 0. 


(E - E 0 ){e' ki ~ b) + 0e- tki ~ b) } = -A{a 0 + e‘ k{ ~ 2b) + ^e~ lki - 2b) } , 

(E - E 0 - F)a 0 = -A{ ye lkb + e lk{ ~ b) + Ue~ lH - b) } , (13.33) 

(E - E 0 ) ye M = -A{ye lH2b) + a 0 ). 

Remember that E is given in terms of k by Eq. (13.30). If you substitute this 
value for E into the equations, and remember that cos x = \{e lx + e~ iJr % you 
get from the first equation that 


a 0 = 

i + A; 

(13.34) 

and from the third equation that 



an 

= T. 

(13.35) 

These are consistent only if 



y = 

1 + P 

(13.36) 


This equation says that the transmitted wave (7) is just the original incident wave 
(1) with an added wave (£) equal to the reflected wave. This is not always true, 
but happens to be so for a scattering at one atom only. If there were a clump of 
impurity atoms, the amount added to the forward wave would not necessarily 
be the same as the reflected wave. 
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We can get the amplitude /3 of the reflected wave from the middle equation 
of Eq. (13.33); we find that 


0 = 


— F 

F — 2iA sin kb 


(13.37) 


We have the complete solution for the lattice with one unusual atom. 

You may be wondering how the transmitted wave can be “more” than the 
incident wave as it appears in Eq. (13.34). Remember, though, that and 7 are 
complex numbers and that the number of particles (or rather, the probability of 
finding a particle) in a wave is proportional to the absolute square of the amplitude. 
In fact, there will be “conservation of electrons” only if 

|/3| 2 + (t| 2 = 1. (13.38) 

You can show that this is true for our solution. 


13-7 Trapping by a lattice imperfection 

There is another interesting situation that can arise if F is a negative number. 
If ihe energy of the electron is lower at the impurity atom (at n = 0) than it is 
anywhere else, then the electron can get caught on this atom. That is, if (E 0 + F) 
is below the bottom of the band at ( E 0 — 2A), then the electron can get “trapped” 
in a state with E < E 0 — 2A. Such a solution cannot come out of what we have 
done so far. We can get this solution, however, if we permit the trial solution we 
took in Eq. (13.15) to have an imaginary number for k. Let’s set k = Ik Again, we 
can have different solutions for n < 0 and for n > 0. A possible solution for 
n < 0 might be 

a n (for n < 0) = ce +K:c “. (13.39) 

We have to take a plus sign in the exponent; otherwise the amplitude would get 
indefinitely large for large negative values of n. Similarly, a possible solution for 
n > 0 would be 

a n (for n > 0) = c'e”*** (13.40) 

If we put these trial solutions into Eq. (13.28) all but the middle three are 
satisfied provided that 

E = E 0 - A(e* b + e~ Kb ). (13.41) 

Since the sum of the two exponential terms is always greater than 2, this energy 
is below the regular band, and is what we are looking for. The remaining three 
equations in Eq. (13.28) are satisfied if c = c r and if k is chosen so that 

A(e* b - e ~ Kb ) = —F. (13.42) 

Combining this equation with Eq. (13.41) we can find the energy of the trapped 
electron ; we get 

E = E 0 - V 4A 2 + F 2 . (13.43) 

The trapped electron has a unique energy — located somewhat below the con- 
duction band. 

Notice that the amplitudes we have in Eq. (13.39) and (13.40) do not say that 
the trapped electron sits right on the impurity atom. The probability of finding 
the electron at nearby atoms is given by the square of these amplitudes. For one 
particular choice of the parameters it might vary as shown in the bar graph of 
Fig. 13-7. The probability is greatest for finding the electron on the impurity 
atom. For nearby atoms the probability drops off exponentially with the distance 
from the impurity atom. This is another example of “barrier penetration.” From 
the point-of-view of classical physics the electron doesn’t have enough energy to 
get away from the energy “hole” at the trapping center. But quantum mechanically 
it can leak out a little way. 
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Fig. 13-7. The relative probabilities 
of finding a trapped electron at atomic 
sites near the trapping impurity atom. 



13-8 Scattering amplitudes and bound states 

Finally, our example can be used to illustrate a point which is very useful 
these days in the physics of high-energy particles. It has to do with a relationship 
between scattering amplitudes and bound states. Suppose we have discovered — 
through experiment and theoretical analysis — the way that pions scatter from 
protons. Then a new particle is discovered and someone wonders whether maybe 
it is just a combination of a pion and a proton held together in some bound state 
(in an analogy to the way an electron is bound to a proton to make a hydrogen 
atom). By a bound state we mean a combination which has a lower energy than 
the two free-particles. 

There is a general theory which says that a bound state will exist at that 
energy at which the scattering amplitude becomes infinite if extrapolated alge- 
braically (the mathematical term is “analytically continued”) to energy regions 
outside of the permitted band. 

The physical reason for this is as follows. A bound state is a situation in 
which there are only waves tied on to a point and there’s no wave coming in to get 
it started, it just exists there by itself. The relative proportion between the so-called 
“scattered” or created wave and the wave being “sent in” is infinite. We can test 
this idea in our example. Let’s write our expression Eq. (13.37) for the scattered 
amplitude directly in terms of the energy E of the particle being scattered (instead 
of in terms of A:). Since Equation (13.30) can be rewritten as 

2 A sin kb = \J 4A 2 - (E - F 0 ) 2 , 
the scattered amplitude is 

~F 

^ F - is/AA'i - (E - E 0 )* (13.44) 

From our derivation, this equation should be used only for real states — those with 
energies in the energy band, E = E 0 ± 2 A. But suppose we forget that fact and 
extend the formula into the "unphysical” energy regions where | E — E 0 \ > 2 A, 
For these unphysical regions we can writef 

V4A 2 - (E - £oF = iV(E ~ E 0 ) 2 — 4 A 2 . 

Then the “scattering amplitude,” whatever it may mean, is 

P = ^ ~ F — • (13.45) 

F + V(E - E 0 y - 4 A* 

Now we ask: Is there any energy E for which f$ becomes infinite (i.e., for which the 
expression for ft has a “pole”)? Yes, so long as F is negative, the denominator of 
Eq (13.45) wilt be zero when 

(E - E 0 ) 2 - 4 A 2 = F 2 , 

or when 

E = Eq =fc \/4A*~+ F 2 . 

The minus sign gives just the energy we found in Eq. (13.43) for the trapped energy. 

What about the plus sign? This gives an energy above the allowed energy 
band. And indeed there is another bound state there which we missed when we 
solved the equations of Eq. (13.28). We leave it as a puzzle for you to find the 
energy and amplitudes a n for this bound state. 

The relation between scattering and bound states provides one of the most 
useful clues in the current search for an understanding of the experimental ob- 
servations about the new strange particles. 


f The sign of the root to be chosen here is a technical point related to the allowed 
signs of k in Eqs. (13.39) and (13 40). We won’t go into it here. 
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14 


Semiconductors 


14-1 Electrons and holes in semiconductors 

One of the remarkable and dramatic developments in recent years has been 
the application of solid state science to technical developments in electrical devices 
such as transistors. The study of semiconductors led to the discovery of their 
useful properties and to a large number of practical applications. The field is 
changing so rapidly that what we tell you today may be incorrect next year. It 
will certainly be incomplete. And it is perfectly clear that with the continuing 
study of these materials many new and more wonderful things will be possible 
as time goes on. You will not need to understand this chapter for what comes 
later in this volume, but you may find it interesting to see that at least something 
of what you are learning has some relation to the practical world. 

There are large numbers of semiconductors known, but we’ll concentrate 
on those which now have the greatest technical application. They are also the 
ones that are best understood, and in understanding them we will obtain a degree 
of understanding of many of the others. The semiconductor substances in most 
common use today are silicon and germanium. These elements crystallize in the 
diamond lattice, a kind of cubic structure in which the atoms have tetrahedral 
bonding with their four nearest neighbors. They are insulators at very low tempera- 
tures — near absolute zero — although they do conduct electricity somewhat at 
room temperature. They are not metals; they are called semiconductors . 

If we somehow put an extra electron into a crystal of silicon or germanium 
which is at a low temperature, we will have just the situation we described in the 
last chapter. The electron will be able to wander around in the crystal jumping 
from one atomic site to the next. Actually, we have looked only at the behavior 
of electrons in a rectangular lattice, and the equations would be somewhat different 
for the real lattice of silicon or germanium. All of the essential points are, however, 
illustrated by the results for the rectangular lattice. 

As we saw in Chapter 13, these electrons can have energies only in a certain 
energy band — called the conduction band . Within this band the energy is related 
to the wave-number k of the probability amplitude C (see Eq. 13.24) by 

E = E 0 — 2k x coskra — 2A ?y cos k,,b — 2A cos k z c\ (14.1) 

The T’s are the amplitudes for jumping in the x>, j;-, and z-directions, and a , A. 
and c are the lattice spacings in these directions. 

For energies near the bottom of the band, we can approximate Eq. (14.1) by 

E = E, + Asrk'i + A,,b 2 k 2 + A 2 c 2 ki (14.2) 


(see Section 13-4) 

If we think of electron motion in some particular direction, so that the com- 
ponents of k are always in the same ratio, the energy is a quadratic function of 
the wave number — and as we have seen of the momentum of the electron. We 
can write 

E = E lnm + ak\ (14.3) 

where a is some constant, and we can make a graph of E versus k as in Fig. 14-1. 
We’ll call such a graph an ‘‘energy diagram.” An electron in a particular state of 
energy and momentum can be indicated by a point such as S m the figure 
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Fig. 14-1. The energy diagram for 
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As we also mentioned in Chapter 13, we can have a similar situation if we 
remove an electron from a neutral insulator. Then, an electron can jump over 
from a nearby atom and fill the “hole,” but leaving another “hole” at the atom it 
started from. We can describe this behavior by writing an amplitude to find the 
hole at any particular atom, and by saying that the hole can jump from one atom to 
the next, (Clearly, the amplitudes A that the hole jumps from atom a to atom b 
is just the same as the amplitude that an electron on atom b jumps into the hole 
at atom a.) The mathematics is just the same for the hole as it was for the extra 
electron, and we get again that the energy of the hole is related to its wave number 
by an equation just like Eq. (14.1) or (14.2), except, of course, with different nu- 
merical values for the amplitudes A x , A y , and A z . The hole has an energy related 
to the wave number of its probability amplitudes. Its energy lies in a restricted 
band, and near the bottom of the band its energy varies quadratically with the 
wave number — or momentum— just as in Fig. 14-1. Following the arguments of 
Section 13-3, we would find that the hole also behaves like a classical particle 
with a certain effective mass — except that in noncubic crystals the mass depends 
on the direction of motion. So the hole behaves like a positive particle moving 
through the crystal. The charge of the hole-particle is positive, because it is located 
at the site of a missing electron; and when it moves in one direction there are ac- 
tually electrons moving in the opposite direction. 

If we put several electrons into a neutral crystal, they will move around much 
like the atoms of a low-pressure gas. If there are not too many, their interactions 
will not be very important. If we then put an electric field across the crystal, the 
electrons will start to move and an electric current will flow. Eventually they would 
all be drawn to one edge of the crystal, and, if there is a metal electrode there, 
they would be collected, leaving the crystal neutral. 

Similarly we could put many holes into a crystal. They would roam around 
at random unless there is an electric field. With a field they would flow toward 
the negative terminal, and would be “collected” — what actually happens is that 
they are neutralized by electrons from the metal terminal. 

One can also have both holes and electrons together. If there are not too 
many, they will all go their way independently. With an electric field, they will 
all contribute to the current. For obvious reasons, electrons are called the negative 
carriers and the holes are called the positive carriers . 

We have so far considered that electrons are put into the crystal from the 
outside, or are removed to make a hole. It is also possible to “create” an electron- 
hole pair by taking a bound electron away from one neutral atom and putting it 
some distance away in the same crystal. We then have a free electron and a free 
hole, and the two can move about as we have described. 

The energy required to put an electron into a state S — we say to “create” 
the state S — is the energy E~ shown in Fig. 14-2. It is some energy above £' n . 
The energy required to “create” a hole in some state S' is the energy of Fig. 
14-3, which is some energy greater than Emm- Now if we create a pair in the states 
S and S' , the energy required is just E~ + 




Fig. 14-2. The energy E is required Fig, 14-3. The energy E + is required 

to “create" a free electron. to “create" a hole in the state S'. 
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The creation of pairs is a common process (as we will see later), so many 
people lik^ to put Fig. 14-2 and Fig. 14-3 together on the same graph — with the 
hole energy plotted downward , although it is, of course a positive energy. We have 
combined our two graphs in this way in Fig. 14-4. The advantage of such a 
graph is that the energy f pair = E~ + E + required to create a pair with the 
electron in S and the hole in S' is just the vertical distance between S and S' as 
shown in Fig. 14-4. The minimum energy required to create a pair is called the 
“gap” energy and is equal to £“ n + £„,„. 

Sometimes you will see a simpler diagram called an energy level diagram which 
is drawn when people are not interested in the k variable. Such a diagram — shown 
in Fig. 14-5 — just shows the possible energies for the electrons and holes. f 

How can electron-hole pairs be created 7 There are several ways. For ex- 
ample, photons of light (or x-rays) can be absorbed and create a pair if the photon 
energy is above the energy of the gap. The rate at which pairs are produced is 
proportional to the light intensity. If two electrodes are plated on a wafer of the 
crystal and a “bias” voltage is applied, the electrons and holes will be drawn to 
the electrodes. The circuit current will be proportional to the intensity of the light. 
This mechanism is responsible for the phenomenon of photoconductivity and the 
operation of photoconductive cells. 

Electron hole pairs can also be produced by high-energy particles. When a 
fast-moving charged particle — for instance, a proton or a pion with an energy of 
tens or hundreds of Mev — goes through a crystal, its electric field will knock elec- 
trons out of their bound states creating electron-hole pairs. Such events occur 
hundreds of thousands of times per millimeter of track. After the passage of the 
particle, the carriers can be collected and in doing so will give an electrical pulse. 
This is the mechanism at play in the semiconductor counters recently put to use 
for experiments in nuclear physics. Such counters do not require semiconductors; 
they can also be made with crystalline insulators. In fact, the first of such counters 
was made using a diamond crystal which is an insulator at room temperature. 
Very pure crystals are required if the holes and electrons are to be able to move 
freely to the electrodes without being trapped. The semiconductors silicon and 
germanium are used because they can be produced with high purity in reasonable 
large sizes (centimeter dimensions). 

So far we have been concerned with semiconductor crystals at temperatures 
near absolute zero. At any finite temperature there is still another mechanism by 
which electron-hole pairs can be created. The pair energy can be provided from 
the thermal energy of the crystal. The thermal vibrations of the crystal can transfer 
their energy to a pair — giving rise to “spontaneous” creation. 

The probability per unit time that the energy as large as the gap energy E gap 
will be concentrated at one atomic site is proportional to e~ E ^ !>lT , where T is the 
temperature and #c is Boltzmann’s constant (see Chapter 40, Vol. I). Near absolute 
zero there is no appreciable probability, but as the temperature rises there is 
an increasing probability of producing such pairs. At any finite temperature the 
production should continue forever at a constant rate giving more and more 
negative and positive carriers. Of course that does not happen because after 
awhile the electrons and holes accidentally find each other — the electron drops 
into the hole and the excess energy is given to the lattice. We say that the electron 
and hole “annihilate.” There is a certain probability per second that a hole meets 
an electron and the two things annihilate each other. 

If the number of electrons per unit volume is N n ( n for negative carriers) 
and the density of positive carriers is N pt the chance per unit time that an electron 
and a hole will find each other and annihilate is proportional to the product N n N p . 
In equilibrium this rate must equal the rate that pairs are created. You see that in 


t In many books this same energy diagram is interpreted in a different way. The energy 
scale refers only to electrons. Instead of thinking of the energy of the hole, they think of 
the energy an electron would have if it filled the hole. This energy is lower than the free- 
electron energy — in fact, just the amount lower that you see in Fig. 14-5. With this 
interpretation of the energy scale, the gap energy is the minimum energy which must be 
given to an electron to move it from its bound state to the conduction band. 



(Positive energy downward) 

Fig. 14-4. Energy diagrams for an 
electron and a hole drawn together. 


\ E (electron) 


ELECTRON 
CONDUCTION- 
BAND 


..STATE S 


HOLE 
CONDUCTION^ 
BAND 


ESTATE S 


r E^(hole) 


Fig. 14-5. 
electrons and 


Energy level diagram for 
holes. 


14-3 



equilibrium the product of N n and N p should be given by some constant times the 
Boltzmann factor: 

N n N p = const e~ E ™ lKT . (144) 

When we say constant, we mean nearly constant. A more complete theory — which 
includes more details about how holes and electrons “find” each other — shows 
that the “constant” is slightly dependent upon temperature, but the major de- 
pendence on temperature is in the exponential. 

Let’s consider, as an example, a pure material which is originally neutral. 
At a finite temperature you would expect the number of positive and negative 
carriers to be equal, N n = N p . Then each of them should vary with temperature 
as e~ E z a P l2KT . The variation of many of the properties of a superconductor — the 
conductivity for example — is mainly determined by the exponential factor because 
all the other factors vary much more slowly with temperature. The gap energy for 
germanium is about 0.72 ev and for silicon LI ev. 

At room temperature kT is about 1 /40 of an electron volt. At these tempera- 
tures there are enough holes and electrons to give a significant conductivity, while 
at, say, 30°K — one-tenth of room temperature — the conductivity is imperceptible. 
The gap energy of diamond is 6 or 7 ev and diamond is a good insulator at room 
temperature. 


14-2 Impure semiconductors 

So far we have talked about two ways that extra electrons can be put into an 
otherwise ideally perfect crystal lattice. One way was to inject the electron from 
an outside source; the other way, was to knock a bound electron off* a neutral 
atom creating simultaneously an electron and a hole. It is possible to put electrons 
into the conduction band of a crystal in still another way. Suppose we imagine a 
crystal of germanium in which one of the germanium atoms is replaced by an 
arsenic atom. The germanium atoms have a valence of 4 and the crystal structure 
is controlled by the four valence electrons. Arsenic, on the other hand, has a 
valence of 5. It turns out that a single arsenic atom can sit in the germanium lattice 
(because it has approximately the correct size), but in doing so it must act as a 
valence 4 atom — using four of its valence electrons to form the crystal bonds and 
having one electron left over. This extra electron is very loosely attached — the 
binding energy is less than 1/10 of a volt. At room temperature the electron easily 
picks up that much energy from the thermal energy of the crystal, and then takes 
off* on its own — moving about in the lattice as a free electron. An impurity atom 
such as the arsenic is called a donor site because it can give up a negative carrier 
to the crystal. If a crystal of germanium is grown from a melt to which a very small 
amount of arsenic has been added, the arsenic donor sites will be distributed 
throughout the crystal and the crystal will have a certain density of negative 
carriers built in. 

You might think that these carriers would get swept away as soon as any small 
electric field was put across the crystal. This will not happen, however, because 
the arsenic atoms in the body of the crystal each have a positive charge. If the body 
of the crystal is to remain neutral, the average density of negative carrier electrons 
must be equal to the density of donor sites. If you put two electrodes on the edges 
of such a crystal and connect them to a battery, a current will flow; but as the 
carrier electrons are swept out at one end, new conduction electrons must be 
introduced from the electrode on the other end so that the average density of 
conduction electrons is left very nearly equal to the density of donor sites. 

Since the donor sites are positively charged, there will be some tendency for 
them to capture some of the conduction electrons as they diffuse around inside 
the crystal. A donor site can, therefore, act as a trap such as those we discussed 
in the last section. But if the trapping energy is sufficiently small — as it is for arsenic 
— the number of carriers which are trapped at any one time is a small fraction 
of the total. For a complete understanding of the behavior of semiconductors 
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one must take into account this trapping. For the rest of our discussion, however, 
we will assume that the trapping energy is sufficiently low and the temperature is 
sufficiently high, that all of the donor sites have given up their electrons. This is, 
of course, just an approximation. 

It is also possible to build into a germanium crystal some impurity atom 
whose valence is 3, such as aluminum. The aluminum atom tries to act as a 
valence 4 object by stealing an extra electron. It can steal an electron from some 
nearby germanium atom and end up as a negatively charged atom with an effective 
valence of 4. Of course, when it steals the electron from a germanium atom, it 
leaves a hole there; and this hole can wander around in the crystal as a positive 
carrier. An impurity atom which can produce a hole in this way is called an 
acceptor because it “accepts” an electron. If a germanium or a silicon crystal is 
grown from a melt to which a small amount of aluminum impurity has been 
added, the crystal will have built-in a certain density of holes which can act as 
positive carriers. 

When a donor or an acceptor impurity is added to a semiconductor, we say 
that the material has been “doped.” 

When a germanium crystal with some built-in donor impurities is at room 
temperature, some conduction electrons are contributed by the thermally induced 
electron-hole pair creation as well as by the donor sites. The electrons from both 
sources are, naturally, equivalent, and it is the total number N n which comes into 
play in the statistical processes that lead to equilibrium. If the temperature is not 
too low, the number of negative carriers contributed by the donor impurity atoms 
is roughly equal to the number of impurity atoms present. In equilibrium Eq. 
(14.4) must still be valid; at a given temperature the product N n N p is determined. 
This means that if we add some donor impurity which increases N n , the number 
N v of positive carriers will have to decrease by such an amount that N n N p is 
unchanged. If the impurity concentration is high enough, the number N n of nega- 
tive carriers is determined by the number of donor sites and is nearly independent 
of temperature — all of the variation in the exponential factor is supplied by N p , 
even though it is much less than N n . An otherwise pure crystal with a small con- 
centration of donor impurity will have a majority of negative carriers; such a 
material is called an “w-type” semiconductor. 

If an acceptor-type impurity is added to the crystal lattice, some of the new 
holes will drift around and annihilate some of the free electrons produced by 
thermal fluctuation. This process will go on until Eq. (14.4) is satisfied. Under 
equilibrium conditions the number of positive carriers will be increased and the 
number of negative carriers will be decreased, leaving the product a constant. A 
material with an excess of positive carriers is called a “p-t ype” semiconductor. 

If we put two electrodes on a piece of semiconductor crystal and connect 
them to a source of potential difference, there will be an electric field inside the 
crystal. The electric field will cause the positive and the negative carriers to move, 
and an electric current will flow. Let’s consider first what will happen in an 
H-type material in which there is a large majority of negative carriers. For such 
material we can disregard the holes, they will contribute very little to the current 
because there are so few of them. In an ideal crystal the carriers would move across 
without any impediment. In a real crystal at a finite temperature, however, — 
especially in a crystal with some impurities — the electrons do not move completely 
freely. They are continually making collisions which knock them out of their 
original trajectories, that is, changing their momentum. These collisions are just 
exactly the scatterings we talked about in the last chapter and occur at any irregu- 
larity in the crystal lattice. In an «-type material the main causes of scattering are 
the very donor sites that are producing the carriers. Since the conduction electrons 
have a very slightly different energy at the donor sites, the probability waves are 
scattered from that point. Even in a perfectly pure crystal, however, there are 
(at any finite temperature) irregularities in the lattice due to thermal vibrations. 
From the classical point of view we can say that the atoms aren’t lined up exactly 
on a regular lattice, but are, at any instant, slightly out of place due to their thermal 
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vibrations. The energy E 0 associated with each lattice point in the theory we 
described in Chapter 13 varies a little bit from place to place so that the waves of 
probability amplitude are not transmitted perfectly but are scattered in an irregular 
fashion. At very high temperatures or for very pure materials this scattering may 
become important, but in most doped materials used in practical devices the 
impurity atoms contribute most of the scattering. We would like now to make an 
estimate of the electrical conductivity of such a material. 

When an electric field is applied to an n - type semiconductor, each negative 
carrier will be accelerated in this field, picking up velocity until it is scattered from 
one of the donor sites. This means that the carriers which are ordinarily moving 
about in a random fashion with their thermal energies will pick up an average 
drift velocity along the lines of the electric field and give rise to a current through 
the crystal. The drift velocity is in general rather small compared with the typical 
thermal velocities so that we can estimate the current by assuming that the average 
time that the carrier travels between scatterings is a constant. Let's say that the 
negative carrier has an effective electric charge q n . In an electric field £, the force 
on the carrier will be q n S. In Section 43-3 of Volume I we calculated the average 
drift velocity under such circumstances and found that it is given by Fr/m, where 
Fis the force on the charge, r is the mean free time between collisions, and m is the 
mass. We should use the effective mass we calculated in the last chapter but 
since we want to make a rough calculation we will suppose that this effective mass 
is the same in all directions. Here we will call it m n . With this approximation the 
average drift velocity will be 

•Wt = ^ • (14.5) 

m n 

Knowing the drift velocity we can find the current. Electric current density j is 
just the number of carriers per unit volume, N n , multiplied by the average drift 
velocity, and by the charge on each carrier. The current density is therefore 

j = N n v dtlH q n S = N ^ n - £. (14.6) 

We see that the current density is proportional to the electric field ; such a semi- 
conductor material obeys Ohm’s law. The coefficient of proportionality between 
j and £, the conductivity a, is 



(14.7) 


For an «-type material the conductivity is relatively independent of temperature. 
First, the number of majority carriers N n is determined primarily by the density 
of donors in the crystal (so long as the temperature is not so low that too many 
of the carriers are trapped). Second, the mean time between collisions r n is mainly 
controlled by the density of impurity atoms, which is, of course, independent of 
the temperature. 

We can apply all the same arguments to a /?-type material, changing only the 
values of the parameters which appear in Eq. (14.7). If there are comparable 
numbers of both negative and positive carriers present at the same time, we must 
add the contributions from each kind of carrier. The total conductivity will be 
given by 

__ N n q n T n | N p q p T p * g. 

m n m p 

For very pure materials, N p and N n will be nearly equal. They will be smaller 
than in a doped material, so the conductivity will be less. Also they will vary 
rapidly with temperature (like e~ E ^ lKT , as we have seen), so the conductivity 
may change extremely fast with temperature. 
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14-3 The Hall effect 


It is certainly a peculiar thing that in a substance where the only relatively 
free objects are electrons, there should be an electrical current carried by holes 
that behave like positive particles. We would like, therefore, to describe an experi- 
ment that shows in a rather clear way that the sign of the carrier of electric current 
is quite definitely positive. Suppose we have a block made of semiconductor 
material — it could also be a metal — and we put an electric field on it so as to draw a 
current in some direction, say the horizontal direction as drawn in Fig. 14-6. 
Now suppose we put a magnetic field on the block pointing at a right angle to 
the current, say into the plane of the figure. The moving carriers will feel a mag- 
netic force q{v X B). And since the average drift velocity is either right or left — 
depending on the sign of the charge on the carrier — the average magnetic force on 
the carriers will be either up or down. No, that is not right! For the directions 
we have assumed for the current and the magnetic field the magnetic force on the 
moving charges will always be up. Positive charges moving in the direction of j 
(to the right) will feel an upward force. If the current is carried by negative charges, 
they will be moving left (for the same sign of the conduction current) and they 
will also feel an upward force. Under steady conditions, however, there is no 
upward motion of the carriers because the current can flow only from left to right. 
What happens is that a few of the charges initially flow upward, producing a sur- 
face charge density along the upper surface of semiconductor — leaving an equal 
and opposite surface charge density along the bottom surface of the crystal. The 
charges pile up on the top and bottom surfaces until the electric forces they produce 
on the moving charges just exactly cancel the magnetic force (on the average) so 
that the steady current flows horizontally. The charges on the top and bottom 
surfaces will produce a potential difference vertically across the crystal which can 
be measured with a high-resistance voltmeter, as shown in Fig. 14-7. The sign 
of the potential difference registered by the voltmeter will depend on the sign of 
the carrier charges responsible for the current. 

When such experiments were first done it was expected that the sign of the 
potential difference would be negative as one would expect for negative conduction 
electrons. People were, therefore, quite surprised to find that for some materials 
the sign of the potential difference was in the opposite direction. It appeared that 
the current carrier was a particle with a positive charge. From our discussion of 
doped semiconductors it is understandable that an «-type semiconductor should 
produce the sign of potential difference appropriate to negative carriers, and that 
a />-type semiconductor should give an opposite potential difference, since the 
current is carried by the positively charged holes. 

The original discovery of the anomalous sign of the potential difference in 
the Hall effect was made in a metal rather than a semiconductor. It had been 
assumed that in metals the conduction was always by electron; however, it was 
found out that for berylium the potential difference had the wrong sign. It is now 
understood that in metals as well as in semiconductors it is possible, in certain 
circumstances, that the “objects” responsible for the conduction are holes. Al- 
though it is ultimately the electrons in the crystal which do the moving, neverthe- 
less, the relationship of the momentum and the energy, and the response to external 
fields is exactly what one would expect for an electric current carried by positive 
particles. 

Let’s see if we can make a quantitative estimate of the magnitude of the volt- 
age difference expected from the Hall effect. If the voltmeter in Fig 14-7 draws a 
negligible current, then the charges inside the semiconductor must be moving 
from left to right and the vertical magnetic force must be precisely cancelled by a 
vertical electric field which we will call £ tr (the “tr” is for “transverse”). If this 
electric field is to cancel the magnetic forces, we must have 



Fig. 14-6. The Hall effect comes from 
the magnetic forces on the carriers. 
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Str = “"drift X B. (14.9) 

Using the relation between the drift velocity and the electric current density given 
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in Eq. (14.6), we get 
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Fig. 14-8. A p-n junction 



Fig. 14-9. The electric potential and 
the carrier densities in an unbiased 
semiconductor junction. 
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The potential difference between the top and the bottom of the crystal is, of course, 
this electric field strength multiplied by the height of the crystal. The electric field 
strength & t r in the crystal is proportional to the current density and to the mag- 
netic field strength. The constant of proportionality 1 /qN is called the Hall 
coefficient and is usually represented by the symbol R //. The Hall coefficient de- 
pends just on the density of carriers — provided that carriers of one sign are in a 
large majority. Measurement of the Hall effect is, therefore, one convenient way 
of determining experimentally the density of carriers in a semiconductor. 


14-4 Semiconductor junctions 

We would like to discuss now what happens if we take two pieces of germanium 
or silicon with different internal characteristics — say different kinds or amounts 
of doping — and put them together to make a “junction/’ Let’s start out with what 
is called a p-n junction in which we have p - type germanium on one side of the 
boundary and «-type germanium on the other side of the boundary — as sketched 
in Fig. 14-8. Actually, it is not practical to put together two separate pieces of 
crystal and have them in uniform contact on an atomic scale. Instead, junctions 
are made out of a single crystal which has been modified in the two separate 
regions. One way is to add some suitable doping impurity to the “melt” after 
only half of the crystal has grown. Another way is to paint a little of the impurity 
element on the surface and then heat the crystal causing some impurity atoms to 
diffuse into the body of the crystal. Junctions made in these ways do not have a 
sharp boundary, although the boundaries can be made as thin as 10 -4 centimeters 
or so. For our discussions we wilt imagine an ideal situation in which these two 
regions of the crystal with different properties meeting at a sharp boundary. 

On the rt-type side of p-n junction there are free electrons which can move 
about, as well as the fixed donor sites which balance the overall electric charge. 
On the p - type side there are free holes moving about and an equal number of 
negative acceptor sites keeping the charge balanced. Actually, that describes the 
situation before we put the two materials in contact. Once they are connected 
together the situation will change near the boundary. When the electrons in 
the n-type material arrive at the boundary they will not be reflected back as they 
would at a free surface, but are able to go right on into the p-type material. Some 
of the electrons of the n-type material will, therefore, tend to diffuse over into the 
p-type material where there are fewer electrons. This cannot go on forever because 
as we lose electrons from the n-side the net positive charge there increases until 
finally an electric voltage is built up which retards the diffusion of electrons into 
the p-side. In a similar way, the positive carriers of the p-type material can diffuse 
across the junction into the n-type material. When they do this they leave behind 
an excess of negative charge. Under equilibrium conditions the net diffusion cur- 
rent must be zero. This brought about by the electric fields which are established 
in such a way as to draw the positive carriers back toward the p- type material. 

The two diffusion processes we have been describing go on simultaneously 
and, you will notice, both act in the direction which will charge up the n-type 
material in a positive sense and the p- type material in a negative sense. Because 
of the finite conductivity of the semiconductor material, the change in potential 
from the p-side to the n-side will occur in a relatively narrow region near the bound- 
ary; the main body of each block of material will have a uniform potential. Let’s 
imagine an x-axis in a direction perpendicular to the boundary surface. Then the 
electric potential will vary with x, as shown in Fig. l4-9(b). We have also shown 
in part (c) of the figure the expected variation of the density N n of /z-carriers and 
the density N p of p-carriers. Far away from the junction the carrier densities 
N p and N n should be just the equilibrium density we would expect for individual 
blocks of materials at the same temperature. (We have drawn the figure for a 
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junction in which the p-type material is more heavily doped than the n-type 
material.) Because of the potential gradient at the junction, the positive carriers 
have to climb up a potential hill to get to the p- type side. This means that under 
equilibrium conditions there can be fewer positive carriers in the n-type material 
than there are in the p-type material Remembering the laws of statistical me- 
chanics, we expect that the ratio of p-type carriers on the two sides to be given by 
the following equation: 


N p {p~ side) = e ~Q P vur 
A^p(n-side) 


(14.10) 


The product q p V in the numerator of the exponential is just the energy required to 
carry a charge of q p through a potential difference V. 

We have a precisely similar equation for the densities of the /z-type carriers: 


A^tt-sid e) = e - qn vuT 
7V„(p-side) 


(14.11) 


If we know the equilibrium densities in each of the two materials, we can use 
either of the two equations above to determine the potential difference across the 


junction. 

Notice that if Eqs. (14.10) and (14.11) are to give the same value for the 
potential difference V 9 the product N p N n must be the same for the p- side as for 
the w-side. (Remember that q n = — q p .) We have seen earlier, however, that this 
product depends only on the temperature and the gap energy of the crystal. 
Provided both sides of the crystal are at the same temperature, the two equations 
are consistent with the same value of the potential difference. 

Since there is a potential difference from one side of the junction to the other, 
it looks something like a battery. Perhaps if we connect a wire from the n-type side 
to the p-type side we will get an electrical current. That would be nice because 
then the current would flow forever without using up any material and we would 
have an infinite source of energy in violation of the second law of thermodynamics! 
There is, however, no current if you connect a wire from the p - side to the n- side. 
And the reason is easy to see. Suppose we imagine first a wire made out of a piece 
of undoped material. When we connect this wire to the w-type side, we have a 
junction. There will be a potential difference across this junction. Let’s say that 
it is just one-half the potential difference from the p- type material to the n - type 
material. When we connect our undoped wire to the p-type side of the junction, 
there is also a potential difference at this junction — again, one-half the potential 
drop across the p-n junction. At all the junctions the potential differences adjust 
themselves so that there is no net current flow in the circuit. Whatever kind of wire 


you use to connect together the two sides of the n-p junction, you are producing 
two new junctions, and so long as all the junctions are at the same temperature, the 
potential jumps at the junctions all compensate each other and no current will 
flow m the circuit. It does turn out, however — if you work out the details — that if 
some of the junctions are at a different temperature than the other junctions, 
currents will flow. Some of the junctions will be heated and others will be cooled 
by this current and thermal energy will be converted into electrical energy. This 
effect is responsible for the operation of thermocouples which are used for measur- 
ing temperatures, and of thermoelectric generators. The same effect is also used 
to make small refrigerators. 

If we cannot measure the potential difference between the two sides of an 
n-p junction, how can we really be sure that the potential gradient shown in Fig. 
14-9 really exists? One way is to shine light on the junction. When the light 
photons are absorbed they can produce an electron-hole pair. In the strong 
electric field that exists at the junction (equal to the slope of the potential curve of 
Fig. 14-9) the hole will be driven into the p-type region and the electron will be 
driven into the /z-type region. If the two sides of the junction are now connected 
to an external circuit, these extra charges will provide a current. The energy of 
the light will be converted into electrical energy in the junction. The solar cells 
which generate electrical power for the operation of some of our satellites operate 
on this principle. 
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In our discussion of the operation of a semiconductor junction we have been 
assuming that the holes and the electrons act more-or-less independently — except 
that they somehow get into proper statistical equilibrium. When we were describing 
the current produced by light shining on the junction, we were assuming that an 
electron or a hole produced in the junction region would get into the main body of 
the crystal before being annihilated by a carrier of the opposite polarity. In the 
immediate vicinity of the junction, where the density of carriers of both signs is 
approximately equal, the effect of electron-hole annihilation (or as it is often 
called, * ‘recombination”) is an important effect, and in a detailed analysis of a semi- 
conductor junction must be properly taken into account. We have been assuming 
that a hole or an electron produced in a junction region has a good chance of 
getting into the main body of the crystal before recombining. The typical time 
for an electron or a hole to find an opposite partner and annihilate it is for typical 
semiconductor materials in the range between 10~ 3 and 10~ 7 seconds. This time 
is, incidentally, much longer than the mean free time r between collisions with 
scattering sites in the crystal which we used in the analysis of conductivity. In 
a typical n-p junction, the time for an electron or hole formed in the junction region 
to be swept away into the body of the crystal is generally much shorter than the 
recombination time. Most of the pairs will, therefore, contribute to an external 
current. 


14-5 Rectification at a semiconductor junction 

We would like to show next how it is that a p-n junction can act like a rectifier. 
If we put a voltage across the junction, a large current will flow if the polarity is in 
one direction, but a very small current will flow if the same voltage is applied in the 
opposite direction. If an alternating voltage is applied across the junction, a net 
current will flow in one direction — the current is “rectified.” Let’s look again at 
what is going on in the equilibrium condition described by the graphs of Fig. 
14-9. In the p-t ype material there is a large concentration N p of positive carriers. 
These carriers are diffusing around and a certain number of them each second 
approach the junction. This current of positive carriers which approaches the 
junction is proportional to N p . Most of them, however, are turned back by the 
high potential hill at the junction and only the fraction e~ qVlKT gets through. 
There is also a current of positive carriers approaching the junction from the other 
side. This current is also proportional to the density of positive carriers in the 
n-type region, but the carrier density here is much smaller than the density on the 
/7-type side. When the positive carriers approach the junction from the n- type 
side, they find a hill with a negative slope and immediately slide downhill to the 
/7-type side of the junction. Let’s call this current / 0 . Under equilibrium the cur- 
rents from the two directions are equal. We expect then the following relation: 

/ 0 ~ N p (n- side) = N n (p-%[dt)e~ qVlKT . (14.12) 

You will notice that this equation is really just the same as Eq. (14-10). We have 
just derived it in a different way. 

Suppose, however, that we lower the voltage on the n-side of the junction by 
an amount AV — which we can do by applying an external potential difference to 
the junction. Now the difference in potential across the potential hill is no longer 
V but V — AV. The current of positive carriers from the /?-side to the n-side will 
now have this potential difference in its exponential factor. Calling this current 
I u we have 

h ~ N p (p-sid<i)e- qiV -* V)UT . 

This current is larger than / () by just the factor c / ' Ar/k:r . So we have the following 
relation between and 7 0 : 


h = I n e +q * v,KT . (14.13) 

The current from the /7-side increases exponentially with the externally applied 
voltage AV. The current of positive carriers from the zz-side, however, remains 
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constant so long as A V is not too large. When they approach the barrier, these 
carriers will still find a downhill potential and will all fall down to the /?-side 
(If AV is larger than the natural potential difference V, the situation would change, 
but we will not consider what happens at such high voltages.) The net current I of 
positive carriers which flows across the junction is then the difference between the 
currents from the two sides: 

/ = Io(e +qAVlKT - 1). (14.14) 

The net current I of holes flows into the n-type region. There the holes diffuse 
into the body of the ^-region, where they are eventually annihilated by the majority 
n-type carriers — the electrons. The electrons which are lost in this annihilation 
will be made up by a current of electrons from the external terminal of the n-type 
material. 

When AV is zero, the net current in Eq. (14.14) is zero For positive AV the 
current increases rapidly with the applied voltage. For negative AV the current 
reverses m sign, but the exponential term soon becomes negligible and the negative 
current never exceeds 7 0 — which under our assumptions is rather small. This 
back current J 0 is limited by the small density of the minority carriers on the «-side 
of the junction. 

If you go through exactly the same analysis for the current of negative carriers 
which flows across the junction, first with no potential difference and then with a 
small externally applied potential difference AV , you get again an equation just 
like (14.14) for the net electron current. Since the total current is the sum of the 
currents contributed by the two carriers, Eq. (14.14) still applies for the total 
current provided we identify / 0 as the maximum current which can flow for a 
reversed voltage. 

The voltage-current characteristic of Eq. (14.14) is shown in Fig. 14-10. It 
shows the typical behavior of solid state diodes — such as those used in modern 
computers. We should remark that Eq. (14.14) is true only for small voltages. 
For voltages comparable to or larger than the natural internal voltage difference 
V, other effects come into play and the current no longer obeys the simple equation. 

You may remember, incidentally, that we got exactly the same equation we 
have found here in Eq. (14.14) when we discussed the “mechanical rectifier” — the 
ratchet and pawl — in Chapter 46 of Volume I. We get the same equations in the 
two situations because the basic physical processes are quite similar. 

14-6 The transistor 

Perhaps the most important application of semiconductors is in the transistor. 
The transistor consists of two semiconductor junctions very close together. Its 
operation is based in part on the same principles that we just described for the 
semiconductor diode — the rectifying junction. Suppose we make a little bar of 
germanium with three distinct regions, a p - type region, an «-type region, and 
another p- type region, as shown in Fig. 14-1 1(a). This combination is called a 
p-n-p transistor. Each of the two junctions m the transistor will behave much in 
the way we have described in the last section. In particular, there will be a potential 
gradient at each junction having a certain potential drop from the n-type region to 
each /?-type region. If the two /?-type regions have the same internal properties, 
the variation in potential as we go across the crystal will be as shown in the graph 
of Fig. 14-1 1(b). 

Now let's imagine that we connect each of the three regions to external voltage 
sources as shown in part (a) of Fig. 14-1 2 We will refer all voltages to the terminal 
connected to the left-hand /^-region so it will be, by definition, at zero potential. 
We will call this terminal the emitter . The /z-type region is called the base and it is 
connected to a slightly negative potential. The right-hand p-type region is called 
the collector , and is connected to a somewhat larger negative potential. Under 
these circumstances the variation of potential across the crystal will be as shown in 
the graph of Fig. 14-1 2(b). 

Let’s first see what happens to the positive carriers, since it is primarily their 
behavior which controls the operation of the p-n-p transistor. Since the emitter is 



Fig. 14-10. The current through a 
junction as a function of the voltage 
across it. 



Fig. 14-11. The potential distribu- 
tion in a transistor with no applied 
voltages. 
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Fig. 14-12. The potential distribu- 
tion in an operating transistor. 
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at a relatively more positive potential than the base, a current of positive carriers 
will flow from the emitter region into the base region. A relatively large current 
flows, since we have a junction operating with a “forward voltage” — corresponding 
to the right-hand half of the graph in Fig. 14-10. With these conditions, positive 
carriers or holes are being “emitted” from the /?-type region into the /i-type region. 
You might think that this current would flow out of the n-type region through the 
base terminal b. Now, however, comes the secret of the transistor. The «-type 
region is made very thin — typically 10~ 3 cm or less, much narrower than its trans- 
verse dimensions. This means that as the holes enter the «-type region they have 
a very good chance of diffusing across to the other junction before they are anni- 
hilated by the electrons in the «-type region. When they get to the right-hand 
boundary of the /z-type region they find a steep downward potential hill and im- 
mediately fall into the right-hand p-type region. This side of the crystal is called 
the collector because it “collects” the holes after they have diffused across the zz-type 
region. In a typical transistor, all but a fraction of a percent of the hole current 
which leaves the emitter and enters the base is collected in the collector region, 
and only the small remainder contributes to the net base current. The sum of the 
base and collector currents is, of course, equal to the emitter current. 

Now imagine what happens if we vary slightly the potential Vb on the base 
terminal. Since we are on a relatively steep part of the curve of Fig. 14-10, a 
small variation of the potential V b will cause a rather large change in the emitter 
current I e , Since the collector voltage V c is much more negative than the base 
voltage, these slight variations in potential will not effect appreciably the steep 
potential hill between the base and the collector. Most of the positive carriers 
emitted into the zz-region will still be caught by the collector. Thus as we vary 
the potential of the base electrode, there will be a corresponding variation in the 
collector current 7 C . The essential point, however, is that the base current h 
always remains a small fraction of the collector current. The transistor is an 
amplifier; a small current I h introduced into the base electrode gives a large current 
— 100 or so times higher — at the collector electrode. 

What about the electrons — the negative carriers that we have been neglecting 
so far? First, note that we do not expect any significant electron current to flow 
between the base and the collector. With a large negative voltage on the collector, 
the electrons in the base would have to climb a very high potential energy hill and 
the probability of doing that is very small. There is a very small current of elec- 
trons to the collector. 

On the other hand, the electrons in the base can go into the emitter region. 
In fact, you might expect the electron current in this direction to be comparable to 
the hole current from the emitter into the base. Such an electron current isn’t 
useful, and, on the contrary, is bad because it increases the total base current 
required for a given current of holes to the collector. The transistor is, therefore, 
designed to minimize the electron current to the emitter. The electron current is 
proportional to A n (base), the density of negative carriers in the base material 
while the hole current from the emitter depends on 7V p (emitter), the density of 
positive carriers in the emitter region. By using relatively little doping in the zz-type 
material A n (base) can be made much smaller than Ap (emitter). (The very thin 
base region also helps a great deal because the sweeping out of the holes in this 
region by the collector increases significantly the average hole current from the 
emitter into the base, while leaving the electron current unchanged.) The net 
result is that the electron current across the emitter-base junction can be made 
much less than the hole current, so that the electrons do not play any significant 
role in operation of the p-n-p transistor. The currents are dominated by motion of 
the holes, and the transistor performs as an amplifier as we have described above. 

It is also possible to make a transistor by interchanging the p-type and n - type 
materials in Fig. 14-11. Then we have what is called an n-p-n transistor. In the 
n-p-n transistor the main currents are carried by the electrons which flow from the 
emitter into the base and from there to the collector. Obviously, all the arguments 
we have made for the p-n-p transistor also apply to the n-p-n transistor if the po- 
tentials of the electrodes are chosen with the opposite signs. 
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The Independent Particle Approximation 


15-1 Spin waves 

In Chapter 13 we worked out the theory for the propagation of an electron or 
of some other “particle,” such as an atomic excitation, through a crystal lattice. 
In the last chapter we applied the theory to semiconductors. But when we talked 
about situations in which there are many electrons we disregarded any interactions 
between them. To do this is of course only an approximation. In this chapter 
we will discuss further the idea that you can disregard the interaction between the 
electrons. We will also use the opportunity to show you some more applications 
of the theory of the propagation of particles. Since we will generally continue to 
disregard the interactions between particles, there is very little really new in this 
chapter except for the new applications. The first example to be considered is, 
however, one in which it is possible to write down quite exactly the correct equa- 
tions when there is more than one “particle” present. From them we will be able 
to see how the approximation of disregarding the interactions is made. We will 
not, though, analyze the problem very carefully. 

As our first example we will consider a “spin wave” in a ferromagnetic crystal. 
We have discussed the theory of ferromagnetism in Chapter 36 of Volume II. 
At zero temperature all the electron spins that contribute to the magnetism in the 
body of a ferromagnetic crystal are parallel. There is an interaction energy between 
the spins, which is lowest when all the spins are down. At any nonzero temperature, 
however, there is some chance that some of the spins are turned over. We calculated 
the probability in an approximate manner in Chapter 36. This time we will describe 
the quantum mechanical theory — so you will see what you would have to do if you 
wanted to solve the problem more exactly. (We will still make some idealizations 
by assuming that the electrons are localized at the atoms and that the spins interact 
only with neighboring spins.) 

We consider a model in which the electrons at each atom are all paired except 
one, so that all of the magnetic effects come from one spin-J electron per atom. 
Further, we imagine that these electrons are localized at the atomic sites in the 
lattice. The model corresponds roughly to metallic nickel. 

We also assume that there is an interaction between any two adjacent spinning 
electrons which gives a term in the energy of the system 

E = - J2 05.1) 

where a’s represent the spins and the summation is over all adjacent pairs of 
electrons. We have already discussed this kind of interaction energy when we 
considered the hyperfine splitting of hydrogen due to the interaction of the mag- 
netic moments of the electron and proton in a hydrogen atom. We expressed it 
then as Aa e * a> Now, for a given pair, say the electrons at atom 4 and at atom 5, 
the Hamiltonian would be — Ka 4 • a 5 . We have a term for each such pair, and 
the Hamiltonian is (as you would expect for classical energies) the sum of these 
terms for each interacting pair. The energy is written with the factor — K so that 
a positive K will correspond to ferromagnetism — that is, the lowest energy results 
when adjacent spins are parallel. In a real crystal, there may be other terms which 
are the interactions of next nearest neighbors, and so on, but we don’t need to con- 
sider such complications at this stage. 

With the Hamiltonian of Eq. (15.1) we have a complete description of the 
ferromagnet — within our approximation — and the properties of the magnetization 
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should come out. We should also be able to calculate the thermodynamic proper- 
ties due to the magnetization. If we can find all the energy levels, the properties 
of the crystal at a temperature T can be found from the principle that the prob- 
ability that a system will be found in a given state of energy E is proportional to 
e -EiKT * Thi s problem has never been completely solved. 

We will show some of the problems by taking a simple example in which all 
the atoms are in a line — a one-dimensional lattice. You can easily extend the ideas 
to three dimensions. At each atomic location there is an electron which has two 
possible states, either spin up or spin down, and the whole system is described by 
telling how all of the spins are arranged. We take the Hamiltonian of the system 
to be the operator of the interaction energy. Interpreting the spin vectors of Eq. 
(15.1) as the sigma-operators — or the sigma-matrices — we write for the linear lattice 

it = 2 — — B n • a n+ 1. (15.2) 

n 

In this equation we have written the constant as A/2 for convenience (so that some 
of the later equations will be exactly the same as the ones in Chapter 13). 

Now what is the lowest state of this system? The state of lowest energy is 
the one in which all the spins are parallel — let’s say, all up.f We can write this 
state as | • • • H — i — | — [-*’*)» or I gnd) f° r the “ground,” or lowest, state. It’s 
easy to figure out the energy for this state. One way is to write out all the vector 
sigmas in terms of & x , & y , and & z , and work through carefully what each term of 
the Hamiltonian does to the ground state, and then add the results. We can, 
however, also use a good short cut. We saw in Section 12-2, that a t • ay could 
be written in terms of the Pauli spin exchange operator like this : 

a, -a, = (2Pir eX - 1), (15.3) 

where the operator P*f mex interchanges the spins of the /th and yth electrons. 
With this substitution the Hamiltonian becomes 

a = -a £ (PTn e A - i)- (15.4) 

n 

It is now easy to work out what happens to different states. For instance if / and j 
are both up, then exchanging the spins leaves everything unchanged, so P tJ acting 
on the state just gives the same state back, and is equivalent to multiplying by +1. 
The expression (P tJ — \) is just equal to one-half. (From now on we will leave 
off the descriptive superscript on the P.) 

For the ground state all spins are up ; so if you exchange a particular pair of 
spins, you get back the original state. The ground state is a stationary state. If 
you operate on it with the Hamiltonian you get the same state again multiplied 
by a sum of terms, — (A/2) for each pair of spins. That is, the energy of the system 
in the ground state is — A/2 per atom 

Next we would like to look at the energies of some of the excited states. It 
will be convenient to measure the energies with respect to the ground state — that 
is, to choose the ground state as our zero of energy. We can do that by adding the 
energy A/2 to each term in the Hamiltonian. That just changes the “J” in Eq. 
(15.4) to “1.” Our new Hamiltonian is 

H = -A^iKn+i - 1). (15.5) 

n 

With this Hamiltonian the energy of the lowest state is zero; the spin exchange 
operator is equivalent to multiplying by unity (for the ground state) which is 
cancelled by the “1” in each term. 


t The ground state here is really “degenerate” ; there are other states with the same 
energy — for example, all spins down, or all in any other direction. The slightest external 
field in the z-direction will give a different energy to all these states, and the one we have 
chosen will be the true ground state. 
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For describing states other than the ground state we will need a suitable set 
of base states. One convenient approach is to group the states according to whether 
one electron has spin down, or two electrons have spin down, and so on. There 
are, of course, many states with one spin down. The down spin could be at atom 
“4,” or at atom “5,” or at atom “6,” . . . We can, in fact, choose just such states 
for our base states. We could write them this way: | 4), | 5), | 6), . . . It will, 
however, be more convenient later if we label the “odd atom” — the one with the 
down-spinning electron — by its coordinate x. That is, well define the state | x 5 ) 
to be one with all the electrons spinning up except for the one on the atom at x 5 , 
which has a down-spinning electron (see Fig. 15-1). In general, | x„) is the state 
with one down spin that is located at the coordinate x n of the nth atom. 

What is the action of the Hamiltonian (15.5) on the state | x 5 )? One term of 
the Hamiltonian is say —A(P 7t $ — 1). The operator P 7t8 exchanges the two spins 
of the adjacent atoms 7, 8. But in the state | x 5 ) these are both up, and nothing 
happens: P 78 is equivalent to multiplying by 1: 

^ 7,8 | * 5 ) = | * 5 >- 

It follows that 

(^7,8 - 1) | X S ) = 0. 


Thus all the terms of the Hamiltonian give zero — except those involving atom 5, 
of course. On the state | 5), the operation P it5 exchanges the spin of atom 4 (up) 
and atom 5 (down). The result is the state with all spins up except the atom at 4. 
That is 


In the same way 


P 4,5 

1 * 5 > = 

1 *4>. 

P 5,6 

1 *«> = 

1 * 6 >- 




Fig. 15-1. The base state \ x$) of a 
linear array of spins. All the spins are up 
except the one at x 5# which is down. 


Hence, the only terms of the Hamiltonian which survive are —A(P± t5 — 1) 
and —A(P 5f6 — 1). Acting on | x 5 ) they produce —A \ x 4 ) + A | x 5 ) and 
— A | x 6 ) + A | x 5 ), respectively. The result is 

A I X S ) = -A (p„,n+l - 1) | *5> = -A{\ X 6 > + I *4> - 2 | *5>}- (15.6) 

n 

When the Hamiltonian acts on state | x 5 ) it gives rise to some amplitude to be 
in states | x 4 ) and | x 6 ). That just means that there is a certain amplitude to have 
the down spin jump over to the next atom. So because of the interaction between 
the spins, if we begin with one spin down, then there is some probability that at a 
later time another one will be down instead. Operating on the general state | x n ), 
the Hamiltonian gives 


7?|x„> = —A{\ x n+l ) + | Xn-i) — 2 I x n )}. (15.7) 


Notice particularly that if we take a complete set of states with only one spin 
down, they will only be mixed among themselves. The Hamiltonian will never 
mix these states with others that have more spins down. So long as you only ex- 
change spins you never change the total number of down spins. 

It will be convenient to use the matrix notation for the Hamiltonian, say 
H n ,m = (x n | H | x m >; Eq. (15.7) is equivalent to 


H n ,n — A J 

Hn,n+1 ~ H n ,n—l — A, (15.8) 

H n ,m =0, for | n — m \ > 1. 

Now what are the energy levels for states with one spin down? As usual we 
let C n be the amplitude that some state | is in the state | x n ). If | \p) is to be a 
definite energy state, all the C n * s must vary with time in the same way, namely, 

C„ = a n e-' Etl \ (15.9) 
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We can put this trial solution into our usual Hamiltonian equation 


lh = Z 05 1°) 

m 

using Eq (15.8) for the matrix elements. Of course we get an infinite number of 
equations, but they can all be written as 


Ea n = 2 Aa n — Aa n ^ 1 — Aa n+1 (15.11) 

We have again exactly the same problem we worked out in Chapter 13, except that 
where we had E 0 we now have 2 A. The solutions correspond to amplitudes C n 
(the down-spin amplitude) which propagate along the lattice with a propagation 
constant k and an energy 


E = 2,4(1 — cos kb), (15.12) 

where b is the lattice constant. 

The definite energy solutions correspond to “waves” of down spin — called 
“spin waves.” And for each wavelength there is a corresponding energy. For 
large wavelengths (small k ) this energy varies as 

E = Ab 2 k 2 . (15.13) 


Just as before, we can consider a localized wave packet (containing, however, 
only long wavelengths) which corresponds to a spin-down electron in one part of 
the lattice. This down spin will behave like a “particle.” Because its energy is 
related to k by (15.13) the “particle” will have an effective mass: 


JWcff 


h 2 

2 Ab 2 


(15.14) 


These “particles” are sometimes called “magnons.” 


15-2 Two spin waves 
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Fig. 15-2. 
spins. 
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A state with two down 


Now we would like to discuss what happens if there are two down spins. 
Again we pick a set of base states. We’ll choose states in which there are down 
spins at two atomic locations, such as the state shown in Fig. 15-2. We can label 
such a state by the ^-coordinates of the two sites with down spins. The one shown 
can be called | x 2 , x 5 ). In general the base states are | x n , x m ) — a doubly infinite 
set! In this system of description, the state | x Q ) and the state | x 9 , x 4 ) are 
exactly the same state, because each simply says that there is a down spin at 4 and 
one at 9; there is no meaning to the order. Furthermore, the state | x 4 , x 4 ) has 
no meaning, there isn’t such a thing. We can describe any state | yp) by giving the 
amplitudes to be in each of the base states. Thus C m>n = (x m , x n \ \p) now means 
the amplitude for a system in the state | \p) to be in a state in which both the mth 
and nth atoms have a down spin. The complications which now arise are not 
complications of ideas — they are merely complexities in bookkeeping. (One of the 
complexities of quantum mechanics is just the bookkeeping. With more and 
more down spins, the notation becomes more and more elaborate with lots of 
indices and the equations always look very horrifying, but the ideas are not neces- 
sarily more complicated than in the simplest case.) 

The equations of motion of the spin system are the differential equations for 
the C n>m . They are 


ih —jr = E (15.15) 

Suppose we want to find the stationary states. As usual, the derivatives with re- 
spect to time become E times the amplitudes and the C m , n can be replaced by the 
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coefficients Next we have to work out carefully the effect of H on a state 

with spins m and n down. It is not hard to figure out. Suppose for a moment 
that m and n are far enough apart so that we don’t have to worry about the obvious 
trouble. The operation of exchange at the location x n will move the down spin 
either to the (n + 1) or (n — 1) atom, and so there’s an amplitude that the 
present state has come from the state | x„ l9 x n+ i) and also an amplitude that it 
has come from the state | x m9 x n _i). Or it may have been the other spin that 
moved; so there’s a certain amplitude that C m ,n is fed from C m+1 , n or from 
C TO _ 1>n . These effects should all be equal. The final result for the Hamiltonian 
equation on C m(U is 


1 ,n "b T* 1 H“ @m,n— l) T" (15.16) 


This equation is correct except in two situations. If m = n there is no equation 
at all, and if m = n 1, then two of the terms in Eq. (15.16) should be missing. 
We are going to disregard these exceptions. We simply ignore the fact that some 
few of these equations are slightly altered. After all, the crystal is supposed to be 
infinite, and we have an infinite number of terms; neglecting a few might not 
matter much. So for a first rough approximation let’s forget about the altered 
equations. In other words, we assume that Eq. (15.16) is true for all m and 
n, even for m and n next to each other. This is the essential part of our approxi- 
mation. 

Then the solution is not hard to find. We get immediately 



C = a 

^ m,n u m,n c ? 

(15.17) 

with 


a m , n = (const.) e lklX "e lk2 \ 

(15.18) 

where 


E — 4A — 2A cos k x b — 2 A cos k 2 b. 

(15.19) 


Think for a moment what would happen if we had two independent , single 
spin waves (as in the previous section) corresponding to k = /c x and k = k 2 \ 
they would have energies, from Eq. (15.12), of 


and 


6i = (2 A — 2A cos k x b) 
e 2 = (2 A — 2 A cos k 2 b). 


Notice that the energy E in Eq. (15.19) is just their sum, 


E = e(k x ) + e(k 2 ). (15.20) 

In other words we can think of our solution in this way. There are two particles — 
that is, two spin waves. One of them has a momentum described by ku the other 
by k 2% and the energy of the system is the sum of the energies of the two objects. 
The two particles act completely independently. That’s all there is to it. 

Of course we have made some approximations, but we do not wish to discuss 
the precision of our answer at this point. However, you might guess that in a 
reasonable size crystal with billions of atoms — and, therefore, billions of terms in 
the Hamiltonian — leaving out a few terms wouldn’t make much of an error. 
If we had so many down spins that there was an appreciable density, then we would 
certainly have to worry about the corrections. 

[Interestingly enough, an exact solution can be written down if there are just 
the two down spins. The result is not particularly important. But it is interesting 
that the equations can be solved exactly for this case. The solution is: 

a m ,n = exp UMx "‘ +a: " )1 sin k \ x m - x n |, (15.21) 


with the energy 


E = 4 A — 2 A cos k x b — 2 A cos k 2 b, 
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and with the wave numbers k c and k related to k x and k 2 by 


ki = k c - k, k 2 = k c + k . (15.22) 

This solution includes the “interaction” of the two spins. It describes the fact 
that when the spins come together there is a certain chance of scattering. The 
spins act very much like particles with an interaction. But the detailed theory of 
their scattering goes beyond what we want to talk about here.] 


15-3 Independent particles 

In the last section we wrote down a Hamiltonian, Eq. (15.15), for a two- 
particle system. Then using an approximation which is equivalent to neglecting 
any “interaction” of the two particles, we found the stationary states described 
by Eqs. (15.17) and (15.18). This state is just the product of two single-particle 
states. The solution we have given for a m , n in Eq. (15.18) is, however, really not 
satisfactory. We have very carefully pointed out earlier that the state | x 9 , x 4 ) 
is not a different state from | x 4 , x 9 ) — the order of x m and x n has no significance. 
In general, the algebraic expression for the amplitude C m>n must be unchanged if 
we interchange the values of x m and x„, since that doesn’t change the state. Either 
way, it should represent the amplitude to find a down spin at x m and a down spin 
at x n . But notice that (15.18) is not symmetric in x m and x n — since k x and k 2 
can in general be different. 

The trouble is that we have not forced our solution of Eq. (15.15) to satisfy 
this additional condition. Fortunately it is easy to fix things up. Notice first that 
a solution of the Hamiltonian equation just as good as (15.18) is 

0^ - (15.23) 

It even has the same energy we got for (15.18). Any linear combination of (15.15) 
and (15.23) is also a good solution, and has an energy still given by Eq. (15.19). 
The solution we should have chosen — because of our symmetry requirement— is 
just the sum of (15.15) and (15.23): 

a m ,n = K[e iklXm e ik2Xn + e ik2Xfn e ikiXn l (15.24) 

Now, given any k x and k 2 the amplitude C w , n is independent of which way we 
put x m and x n — if we should happen to define x m and x n reversed we get the same 
amplitude. Our interpretation of Eq. (15.24) in terms of “magnons” must also be 
different. We can no longer say that the equation represents one particle with wave 
number k x and a second particle with wave number k 2 . The amplitude (15.24) 
represents one state with two particles (magnons). The state is characterized by 
the two wave numbers k x and k 2 . Our solution looks like a compound state of 
one particle with the momentum p x = h/k x and another particle with the mo- 
mentum p 2 = h/k 2 , but in our state we can’t say which particle is which. 

By now, this discussion should remind you of Chapter 4 and our story of 
identical particles. We have ju^t been showing that the particles of the spin waves— 
the magnons — behave like identical Bose particles. All amplitudes must be sym- 
metric in the coordinates of the two particles — which is the same as saying that 
if we “interchange the two particles,” we get back the same amplitude and with 
the same sign. But, you may be thinking, why did we choose to add the two terms 
in making Eq. (15.24). Why not subtract? With a minus sign, interchanging 
x m and x: n would just change the sign of a m , n which doesn’t matter. But inter- 
changing x m and x n doesn't change anything — all the electrons of the crystal are 
exactly where they were before, so there is no reason for even the sign of the 
amplitude to change. The magnons will behave like Bose particles. f 


t In general, the quasi particles of the kind we are discussing may act like either Bose 
particles or Fermi particles, and as for free particles, the particles with integral spin are 
bosons and those with half-integral spins are fermions. The “magnon” stands for a spin-up 
electron turned over. The change in spin is one. The magnon has an integral spin, and 
is a boson. 
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The main points of this discussion have been twofold: First, to show you 
something about spin waves, and, second, to demonstrate a state whose amplitude 
is a product of two amplitudes, and whose energy is the sum of the energies corre- 
sponding to the two amplitudes. For independent particles the amplitude is the 
product and the energy is the sum. You can easily see why the energy is the sum. 
The energy is the coefficient of t in an imaginary exponential — it is proportional 
to the frequency. If two objects are doing something, one of them with the ampli- 
tude e~ iE d jn and the other with the amplitude e ~ lE2tlT \ and if the amplitude 
for the two things to happen together is the product of the amplitudes for each, 
then there is a single frequency in the product which is the sum of the two fre- 
quencies. The energy corresponding to the amplitude product is the sum of the two 
energies. 

We have gone through a rather long-winded argument to tell you a simple 
thing. When you don’t take into account any interaction between particles, you 
can think of each particle independently. They can individually exist in the various 
different states they would have alone, and they will each contribute the energy 
they would have had if they were alone. However, you must remember that if they 
are identical particles, they may behave either as Bose or as Fermi particles de- 
pending upon the problem. Two extra electrons added to a crystal, for instance, 
would have to behave like Fermi particles. When the positions of two electrons 
are interchanged, the amplitude must reverse sign. In the equation corresponding 
to Eq. (15.24) there would have to be a minus sign between the two terms on the 
right. As a consequence, two Fermi particles cannot be in exactly the same con- 
dition — with equal spins and equal /c’s. The amplitude for this state is zero. 


15-4 The benzene molecule 

Although quantum mechanics provides the basic laws that determine the 
structures of molecules, these laws can be applied exactly only to the most simple 
compounds. The chemists have, therefore, worked out various approximate 
methods for calculating some of the properties of complicated molecules. We 
would now like to show you how the independent particle approximation is used 
by the organic chemists. We begin with the benzene molecule. 

We discussed the benzene molecule from another point of view in Chapter 10. 
There we took an approximate picture of the molecule as a two-state system, 
with the two base states shown in Fig. 15-3. There is a ring of six carbons with a 
hydrogen bonded to the carbon at each location. With the conventional picture 
of valence bonds it is necessary to assume double bonds between half of the carbon 
atoms, and in the lowest energy condition there are the two possibilities shown in 
the figure. There are also other, higher-energy states. When we discussed benzene 
in Chapter 10, we just took the two states and forgot all the rest. We found that 
the ground-state energy of the molecule was not the energy of one of the states in 
the figure, but was lower than that by an amount proportional to the amplitude 
to flip from one of these states to the other. 

Now we’re going to look at the same molecule from a completely different 
point of view — using a different kind of approximation. The two points of view 
will give us different answers, but if we improve either approximation it should 
lead to the truth, a valid description of benzene. However, if we don’t bother to 
improve them, which is of course the usual situation, then you should not be 
surprised if the two descriptions do not agree exactly. We shall at least show that 
also with the new point-of-view the lowest energy of the benzene molecule is 
lower than either of the three-bond structures of Fig. 15-3. 

Now we want to use the following picture. Suppose we imagine the six 
carbon atoms of a benzene molecule connected only by single bonds as in Fig. 
15-4. We have removed six electrons — since a bond stands for a pair of electrons 
— so we have a six-times ionized benzene molecule. Now we will consider what 
happens when we put back the six electrons one at a time, imagining that each 
one can run freely around the ring. We assume also that all the bonds shown in 
Fig. 15-4 are satisfied, and don’t need to be considered further. 
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Fig. 15-3. The two base states for 
the benzene molecule used in Chapter 10. 
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Fig. 1 5-4. A benzene ring with six 
electrons removed. 
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Fig. 15-5. The ethylene molecule. 
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Fig. 1 5-6. The possible energy levels 
for the "extra” electrons in the ethylene 
molecule. 
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What happens when we put one electron back into the molecular ion? It 
might, of course, be located in any one of the six positions around the ring- 
corresponding to six base states. It would also have a certain amplitude, say A , to 
go from one position to the next. If we analyze the stationary states, there would 
be certain possible energy levels. That’s only for one electron. 

Next put a second electron in. And now we make the most ridiculous ap- 
proximation that you can think of — that what one electron does is not affected by 
what the other is doing. Of course they really will interact; they repel each other 
through the Coulomb force, and furthermore when they are both at the same site, 
they must have considerably different energy than twice the energy for one being 
there. Certainly the approximation of independent particles is not legitimate 
when there are only six sites — particularly when we want to put in six electrons. 
Nevertheless the organic chemists have been able to learn a lot by making this 
kind of an approximation. 

Before we work out the benzene molecule in detail, let’s consider a simpler 
example — the ethylene molecule which contains just two carbon atoms with two 
hydrogen atoms on either side as shown in Fig. 15-5. This molecule has one “extra” 
bond involving two electrons between the two carbon atoms. Now remove one 
of these electrons; what do we have? We can look at it as a two-state system — the 
remaining electron can be at one carbon or the other. We can analyze it as a two- 
state system. The possible energies for the single electron are either ( E 0 — A) 
or (£*0 + A), as shown in Fig. 15-6. 

Now add the second electron. Good, if we have two electrons, we can put 
the first one in the lower state and the second one in the upper. Not quite; we 
forgot something. Each one of the states is really double. When we say there’s 
a possible state with the energy (£ 0 — A), there are really two. Two electrons 
can go into the same state if one has its spin up and the other, its spin down. 
(No more can be put in because of the exclusion principle.) So there really are 
two possible states of energy ( E 0 — A). We can draw a diagram, as in Fig. 15-7, 
which indicates both the energy levels and their occupancy. In the condition of 
lowest energy both electrons will be in the lowest state with their spins opposite. 
The energy of the extra bond in the ethylene molecule therefore is 2(£ 0 — A) if 
we neglect the interaction between the two electrons. 

Let’s go back to the benzene. Each of the two states of Fig. 15-3 has three 
double bonds. Each of these is just like the bond in ethylene, and contributes 
2(£ 0 — A) to the energy if E 0 is now the energy to put an electron on a site in 
benzene and A is the amplitude to flip to the next site. So the energy should 
be roughly 6(£ 0 — A). But when we studied benzene before, we got that the 
energy was lower than the energy of the structure with three extra bonds. Let’s see 
if the energy for benzene comes out lower than three bonds from our new point 
of view. 

We start with the six-times ionized benzene ring and add one electron. Now 
we have a six-state system. We haven’t solved such a system yet, but we know 
what to do. We can write six equations in the six amplitudes, and so on. But 
let’s save some work — by noticing that we’ve already solved the problem, when 
we worked out the problem of an electron on an infinite line of atoms. Of course, 
the benzene is not an infinite line, it has 6 atomic sites in a circle. But imagine that 
we open out the circle to a line, and number the atoms along the line from 1 to 6. 
In an infinite line the next location would be 7, but if we insist that this location 
be identical with number 1 and so on, the situation will be just like the benzene 
ring. In other words we can take the solution for an infinite line with an added 
requirement that the solution must be periodic with a cycle six atoms long. From 
Chapter 13 the electron on a line has states of definite energy when the amplitude 
at each site is e lkx « — e rkbn . For each k the energy is 


Fig. 1 5-7. In the extra bond of the 
ethylene molecule two electrons (one 
spin up, one spin down) can occupy the 
lowest energy level. 


E = E 0 - 2 A cos kb. (15.25) 

We want to use now only those solutions which repeat every 6 atoms. Let’s 
do first the general case for a ring of N atoms. If the solution is to have a period 
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of N atomic spacing, e %kbN must be unity ; or kbN must be a multiple of 2ir. Taking 
s to represent any integer, our condition is that 


kbN = 2tts. 


(15.26) 


We have seen before that there is no meaning to taking k’s outside the range 
=t 7 r/b. This means that we get all possible states by taking values of s in the range 
±N/2. 

We find then that for an Adatom ring there are N definite energy statesf 
and they have wave numbers k s given by 


k. 


2ir 

~Nb 


(15.27) 



(1) 

( 2 ) 

( 2 ) 

(I) 


Each state has the energy (15.25). We have a line spectrum of possible energy 

levels. The spectrum for benzene (N = 6) is shown in Fig. 15— 8(b). (The numbers Fig ] 5 _g jh e energy levels in a 
in parentheses indicate the number of different states with the same energy.) r j ng w jth six electron locations (for 

There’s a nice way to visualize the six energy levels, as we have shown in example, a benzene ring), 
part (a) of the figure. Imagine a circle centered on a level with E 0 , and with a radius 
of 2 A. If we start at the bottom and mark off six equal arcs (at angles from the 
bottom point of k s b = 2ws/N, or 27rs/6 for benzene), then the vertical heights of 
the points on the circle are the solutions of Eq. (15.25). The six points represent 
the six possible states. The lowest energy level is at (E 0 — 2 A); there are two 
states with the same energy (E 0 — A), and so on.J These are possible states for 
one electron. If we have more than one electron, two — with opposite spins — can 
go into each state. 

For the benzene molecule we have to put in six electrons. For the ground 
state they will go into the lowest possible energy states — two at s = 0, two at 
s — +1, and two at s = — 1. According to the independent particle approxima- 
tion the energy of the ground state is 


ground = 2(Eq “ 2A) + 4 (Eq — A) 

= 6 E 0 - 8 A. (15.28) 

The energy is indeed less than that of three separate double bonds — by the amount 
2A. 

By comparing the energy of benzene to the energy of ethylene it is possible 
to determine A. It comes out to be 0.8 electron volt, or, in the units the chemists 
like, 18 kilocalories per mole. 

We can use this description to calculate or understand other properties of 
benzene. For example, using Fig. 15-8 we can discuss the excitation of benzene 
by light. What would happen if we tried to excite one of the electrons? It could 
move up to one of the empty higher states. The lowest energy of excitation would 
be a transition from the highest filled level to the lowest empty level. That takes 
the energy 2 A, Benzene will absorb light of frequency v when hv = 2 A. There 
will also be absorption of photons with the energies 3 A and A A. Needless to say, 
the absorption spectrum of benzene has been measured and the pattern of spectral 
lines is more or less correct except that the lowest transition occurs in the ultra- 
violet; and to fit the data one would have to choose a value of A between 1.4 and 
2.4 electron volts. That is, the numerical value of A is two or three times larger 
than is predicted from the chemical binding energy. 

What the chemist does in situations like this is to analyze many molecules 
of a similar kind and get some empirical rules. He learns, for example: For 
calculating binding energy use such and such a value of A, but for getting the 
absorption spectrum approximately right use another value of A. You may feel 


| You might think that for N an even number there are N + 1 states. That is not 
so because s — ±N/2 give the same state. 

t When there are two states (which will have different amplitude distributions) with 
the same energy, we say that the two states are “degenerate.” Notice that four electrons 
can have the energy Eq — A. 
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that this sounds a little absurd. It is not very satisfactory from the point of view 
of a physicist who is trying to understand nature from first principles. But the 
problem of the chemist is different. He must try to guess ahead of time what 
is going to happen with molecules that haven’t been made yet, or which aren’t 
understood completely. What he needs is a series of empirical rules; it doesn’t 
make much difference where they come from. So he uses the theory in quite a 
different way than the physicist. He takes equations that have some shadow of the 
truth in them, but then he must alter the constants in them — making empirical 
corrections. 

In the case of benzene, the principal reason for the inconsistency is our 
assumption that the electrons are independent — the theory we started with is 
really not legitimate. Nevertheless, it has some shadow of the truth because its 
results seem to be going in the right direction. With such equations plus some 
empirical rules — including various exceptions — the organic chemist makes his 
way through the morass of complicated things he chooses to study. (Don’t forget 
that the reason a physicist can really calculate from first principles is that he 
chooses only simple problems. He never solves a problem with 42 or even 6 
electrons in it. So far, he has been able to calculate reasonably accurately only the 
hydrogen atom and the helium atom.) 


15-5 More organic chemistry 
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Fig. 1 5-9. The valence bond repre- 
sentation of the molecule butadiene (1, 3). 
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Fig. 15-10. A line of N molecules. 



Fig. 15-11. The energy levels of 
butadiene. 


Let’s see how the same ideas can be used to study other molecules. Consider 
a molecule like butadiene (1, 3) — it is drawn in Fig. 15-9 according to the usual 
valence bond picture. 

We can play the same game with the extra four electrons corresponding to 
the two double bonds. If we remove four electrons, we have four carbon atoms 
in a line. You already know how to solve a line. You say, “Oh no, I only know 
how to solve an infinite line.” But the solutions for the infinite line also include 
the ones for a finite line. Watch. Let N be the number of atoms on the line and 
number them from 1 to vVas shown in Fig. 15-10. In writing the equations for the 
amplitude at position 1 you would not have a term feeding from position 0. 
Similarly, the equation for position TV would differ from the one that we used for 
an infinite line because there would be nothing feeding from position TV + 1. 
But suppose that we can obtain a solution for the infinite line which has the follow- 
ing property: the amplitude to be at atom 0 is zero and the amplitude to be at 
atom (TV -f- 1) is also zero. Then the set of equations for all the locations from 
1 to TV on the finite line are also satisfied. You might think no such solution exists 
for the infinite line because our solutions all looked like e lk * n which has the same 
absolute value of the amplitude everywhere. But you will remember that the en- 
ergy depends only on the absolute value of k, so that another solution, which is 
equally legitimate for the same energy, would be e~ lkx n. And the same is true of 
any superposition of these two solutions. By subtracting them we can get the 
solution sin kx n , which satisfies the requirement that the amplitude be zero at 
x = 0. It still corresponds to the energy (E 0 — 2 A cos kb). Now by a suitable 
choice for the value of k we can also make the amplitude zero at jca'+j. This 
requires that (TV + 1 )kb be a multiple of 7 r, or that 


kb = 


7 T 

(N + 1 ) S ’ 


(15.29) 


where s is an integer from 1 to TV. (We take only positive k’s because each solution 
contains -\-k and ~k \ changing the sign of k gives the same state all over again.) 
For the butadiene molecule, TV = 4, so there are four states with 


kb = 7 r/5, 27 r/5, 3t/5 , and 47 t/5. (15.30) 

We can represent the energy levels using a circle diagram similar to the one 
for benzene. This time we use a semicircle divided into five equal parts as shown 
in Fig. 15-11. The point at the bottom corresponds to s = 0, which gives no 
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state at all. The same is true of the point at the top, which corresponds to s = 
N + 1. The remaining 4 points give us four allowed energies. There are four 
stationary states, which is what we expect having started with four base states. 
In the circle diagram, the angular intervals are ir/5 or 36 degrees. The lowest 
energy comes out (£ 0 — 1.618/1). (Ah, what wonders mathematics holds; the 
golden mean of the Greeksf gives us the lowest energy state of the butadiene 
molecule according to this theory!) 

Now we can calculate the energy of the butadiene molecule when we put 
in four electrons. With four electrons, we fill up the lowest two levels, each with 
two electrons of opposite spin. The total energy is 

E = 2(E 0 - 1.618 A) + 2 (E {) - 0.618 A) = 4 (£„ - A) - 0.477/1. 

(15.31) 

This result seems reasonable. The energy is a little lower than for two simple 
double bonds, but the binding is not so strong as in benzene. Anyway this is the 
way the chemist analyzes some organic molecules. 

The chemist can use not only the energies but the probability amplitudes as 
well. Knowing the amplitudes for each state, and which states are occupied, he 
can tell the probability of finding an electron anywhere in the molecule. Those 
places where the electrons are more likely to be are apt to be reactive in chemical 
substitutions which require that an electron be shared with some other group of 
atoms. The other sites are more likely to be reactive in those substitutions which 
have a tendency to yield an extra electron to the system. 

The same ideas we have been using can give us some understanding of a 
molecule even as complicated as chlorophyll, one version of which is shown in 
Fig. 15-12. Notice that the double and single bonds we have drawn with heavy 
lines form a long closed ring with twenty intervals. The extra electrons of the 
double bonds can run around this ring. Using the independent particle method 
we can get a whole set of energy levels. There are strong absorption lines from 
transitions between these levels which lie in the visible part of the spectrum, and 
give this molecule its strong color. Similar complicated molecules such as the 
xanthophylls, which make leaves turn red, can be studied in the same way. 

There is one more idea which emerges from the application of this kind of 
theory in organic chemistry. It is probably the most successful or, at least in a 
certain sense, the most accurate. This idea has to do with the question: In what 
situations does one get a particularly strong chemical binding? The answer is very 
interesting. Take the example, first, of benzene, and imagine the sequence of events 
that occurs as we start with the six-times ionized molecule and add more and more 
electrons. We would then be thinking of various benzene ions — negative or 
positive. Suppose we plot the energy of the ion (or neutral molecule) as a function 
of the number of electrons. If we take £*0 = 0 (since we don’t know what it is), 
we get the curve shown in Fig, 15-13. For the first two electrons the slope of the 
function is a straight line. For each successive group the slope increases, and 
there is a discontinuity in slope between the groups of electrons. The slope changes 
when one has just finished filling a set of levels which all have the same energy and 
must move up to the next higher set of levels for the next electron. 

The actual energy of the benzene ion is really quite different from the curve 
of Fig. 15-13 because of the interactions of the electrons and because of electro- 
static energies we have been neglecting. These corrections will, however, vary 
with n in a rather smooth way. Even if we were to make all these corrections, the 
resulting energy curve would still have kinks at those values of n which just fill 
up a particular energy level. 

Now consider a very smooth curve that fits the points on the average like the 
one drawn in Fig. 15-14. We can say that the points above this curve have “higher- 
than-normal” energies, and the points below the curve have “lower-than-normal” 


f The ratio of the sides of a rectangle which can be divided into a square and a similar 
rectangle. 
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Fig. 15-12. A chlorophyll molecule. 



Fig. 15-13. The sum of all the elec- 
tron energies when the lowest states in 
Fig. 15-8 are occupied by n electrons 
if we take that E 0 — 0. 



Fig. 15-14. The points of Fig. 1 5-1 2 
with a smooth curve. Molecules with 
n — 2, 6, 10 are more stable than the 
others. 
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Fig. 15-15. Energy diagram for a 
ring of three. 



Fig. 15-16. The triphenyl cyclo- 
propanyl cation. 


energies. We would, in general, expect that those configurations with a lower-than- 
normal energy would have an above average stability — chemically speaking. 
Notice that the configurations farther below the curve always occur at the end of 
one of the straight line segments — namely when there are enough electrons to fill 
up an “energy shell,” as it is called. This is the very accurate prediction of the 
theory. Molecules — or ions — are particularly stable (in comparison with other 
similar configurations) when the available electrons just fill up an energy shell. 

This theory has explained and predicted some very peculiar chemical facts. 
To take a very simple example, consider a ring of three. It’s almost unbelievable 
that the chemist can make a ring of three and have it stable, but it has been done. 
The energy circle for three electrons is shown in Fig. 15-15. Now if you put two 
electrons in the lower state, you have only two of the three electrons that you re- 
quire. The third electron must be put in at a much higher level. By our argument 
this molecule should not be particularly stable, whereas the two-electron structure 
should be stable. It does turn out, in fact, that the neutral molecule of triphenyl 
cyclopropenyl is very hard to make, but that the positive ion shown in Fig. 15-16 is 
relatively easy to make. The ring of three is never really easy because there is 
always a large stress when the bonds in an organic molecule make an equilateral 
triangle. To make a stable compound at all, the structure must be stabilized in 
some way. Anyway if you add three benzene rings on the corners, the positive 
ion can be made. (The reason for this requirement of added benzene rings is not 
really understood.) 

In a similar way the five-sided ring can also be analyzed. If you draw the 
energy diagram, you can see in a qualitative way that the six-electron structure 
should be an especially stable structure, so that such a molecule should be most 
stable as a negative ion. Now the five-ring is well known and easy to make and 
always acts as a negative ion. Similarly, you can easily verify that a ring of 4 or 8 
is not very interesting, but that a ring of 14 or 10 — like a ring of 6 — should be 
especially stable as a neutral object. 


15-6 Other uses of the approximation 

There are two other similar situations which we will describe only briefly. 
In considering the structure of an atom, we can consider that the electrons fill 
successive shells. The Schrodinger theory of electron motion can be worked out 
easily only for a single electron moving in a “central” field — one which varies only 
with the distance from a point. How can we then understand what goes on in an 
atom which has 22 electrons?! One way is to use a kind of independent particle 
approximation. First you calculate what happens with one electron. You get a 
number of energy levels. You put an electron into the lowest energy state. You 
can, for a rough model, continue to ignore the electron interactions and go on 
filling successive shells, but there is a way to get better answers by taking into 
account — in an approximate way at least — the effect of the electric charge carried 
by the electron. Each time you add an electron you compute its amplitude to be 
at various places, and then use this amplitude to estimate a kind of spherically 
symmetric charge distribution. You use the field of this distribution — together 
with the field of the positive nucleus and all the previous electrons — to calculate 
the states available for the next electron. In this way you can get reasonably cor- 
rect estimates for the energies for the neutral atom and for various ionized states. 
You find that there are energy shells, just as we saw for the electrons in a ring 
molecule. With a partially filled shell, the atom will show a preference for taking 
on one or more extra electrons, or for losing some electrons so as to get into the 
most stable state of a filled shell. 

This theory explains the machinery behind the fundamental chemical 
properties which show up in the periodic table of the elements. The inert gases are 
those elements in which a shell has just been completed, and it is especially difficult 
to make them react. (Some of them do react of course — with fluorine and oxygen, 
for example; but such compounds are very weakly bound; the so-called inert 
gases are nearly inert.) An atom which has one electron more or one electron less 
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than an inert gas will easily lose or gain an electron to get into the especially stable 
(low-energy) condition which comes from having a completely filled shell — they 
are the very active chemical elements of valence +1 or — 1. 

The other situation is found in nuclear physics. In atomic nuclei the protons 
and neutrons interact with each other quite strongly. Even so, the independent 
particle model can again be used to analyze nuclear structure. It was first discovered 
experimentally that nuclei were especially stable if they contained certain particular 
numbers of neutrons— namely 2, 8, 20, 28, 50, 82. Nuclei containg protons 
in these numbers are also especially stable. Since there was initially no explanation 
for these numbers they were called the “magic numbers” of nuclear physics. It is 
well known that neutrons and protons interact strongly with each other; people 
were, therefore, quite surprised when it was discovered that an independent 
particle model predicted a shell structure which came out with the first few magic 
numbers. The model assumed that each nucleon (proton or neutron) moved in a 
central potential which was created by the average effects of all the other nucleons. 
This model failed, however, to give the correct values for the higher magic numbers. 
Then it was discovered by Maria Mayer, and independently by Jensen and his 
collaborators, that by taking the independent particle model and adding only a 
correction for what is called the “spin-orbit interaction,” one could make an 
improved model which gave all of the magic numbers. (The spin-orbit interaction 
causes the energy of a nucleon to be lower if its spin has the same direction as its 
orbital angular momentum from motion in the nucleus.) The theory gives even 
more — its picture of the so-called “shell structure” of the nuclei enables us to 
predict certain characteristics of nuclei and of nuclear reactions. 

The independent particle approximation has been found useful in a wide 
range of subjects — from solid-state physics, to chemistry, to biology, to nuclear 
physics. It is often only a crude approximation, but is able to give an understanding 
of why there are especially stable conditions — in shells. Since it omits all of the 
complexity of the interactions between the individual particles, we should not be 
surprised that it often fails completely to give correctly many important details. 
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The Dependence of Amplitudes on Position 


16-1 Amplitudes on a line 

We are now going to discuss how the probability amplitudes of quantum 
mechanics vary in space. In some of the earlier chapters you may have had a 
rather uncomfortable feeling that some things were being left out. For example, 
when we were talking about the ammonia molecule, we chose to describe it in terms 
of two base states. For one base state we picked the situation in which the nitrogen 
atom was “above" the plane of the three hydrogen atoms, and for the other base 
state we picked the condition in which the nitrogen atom was “below" the plane 
of the three hydrogen atoms. Why did we pick just these two states? Why is it 
not possible that the nitrogen atom could be at 2 angstroms above the plane of the 
three hydrogen atoms, or at 3 apgstroms, or at 4 angstroms above the plane? 
Certainly, there are many positions that the nitrogen atom could occupy. Again 
when we talked about the hydrogen molecular ion, in which there is one electron 
shared by two protons, we imagined two base states: one for the electron in the 
neighborhood of proton number one, and the other for the electron in the neigh- 
borhood of proton number two. Clearly we were leaving out many details. The 
electron is not exactly at proton number two but is only in the neighborhood. 
It could be somewhere above the proton, somewhere below the proton, somewhere 
to the left of the proton, or somewhere to the right of the proton. 

We intentionally avoided discussing these details. We said that we were 
interested in only certain features of the problem, so we were imagining that when 
the electron was in the vicinity of proton number one, it would take up a certain 
rather definite condition. In that condition the probability to find the electron 
would have some rather definite distribution around the proton, but we were not 
interested in the details. 

We can also put it another way. In our discussion of a hydrogen molecular 
ion we chose an approximate description when we described the situation in terms 
of two base states. In reality there are lots and lots of these states. An electron can 
take up a condition around a proton in its lowest, or ground, state, but there are 
also many excited states. For each excited state the distribution of the electron 
around the proton is different. We ignored these excited states, saying that we 
were interested in only the conditions of low energy. But it is just these other 
excited states which give the possibility of various distributions of the electron 
around the proton. If we want to describe in detail the hydrogen molecular ion, 
we have to take into account also these other possible base states. We could do 
this in several ways, and one way is to consider in greater detail states in which the 
location of the electron in space is more carefully described. 

We are now ready to consider a more elaborate procedure which will allow 
us to talk in detail about the position of the electron, by giving a probability 
amplitude to find the electron anywhere and everywhere in a given situation. This 
more complete theory provides the underpinning for the approximations we have 
been making in our earlier discussions. In a sense, our early equations can be 
derived as a kind of approximation to the more complete theory. 

You may be wondering why we did not begin with the more complete theory 
and make the approximations as we went along. We have felt that it would be 
much easier for you to gain an understanding of the basic machinery of quantum 
mechanics by beginning with the two-state approximations and working gradually 
up to the more complete theory than to approach the subject the other way around. 
For this reason our approach to the subject appears to be in the reverse order to 
the one you will find in many books. 
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As we go into the subject of this chapter you will notice that we are breaking 
a rule we have always followed in the past. Whenever we have taken up any 
subject we have always tried to give a more or less complete description of the 
physics — showing you as much as we could about where the ideas led to. We 
have tried to describe the general consequences of a theory as well as describing 
some specific detail so that you could see where the theory would lead. We are 
now going to break that rule; we are going to describe how one can talk about 
probability amplitudes in space and show you the differential equations which 
they satisfy. We will not have time to go on and discuss many of the obvious 
implications which come out of the theory. Indeed we will not even be able to get 
far enough to relate this theory to some of the approximate formulations we have 
used earlier — for example, to the hydrogen molecule or to the ammonia molecule. 
For once, we must leave our business unfinished and open-ended. We are approach- 
ing the end of our course, and we must satisfy ourselves with trying to give you an 
introduction to the general ideas and with indicating the connections between what 
we have been describing and some of the other ways of approaching the subject 
of quantum mechanics. We hope to give you enough of an idea that you can go 
off by yourself and by reading books learn about many of the implications of the 
equations we are going to describe. We must, after all, leave something for the 
future. 

Let’s review once more what we have found out about how an electron can 
move along a line of atoms. When an electron has an amplitude to jump from 
one atom to the next, there are definite energy states in which the probability ampli- 
tude for finding the electron is distributed along the lattice in the form of a travel- 
ing wave. For long wavelengths — for small values of the wave number k — the 
energy of the state is proportional to the square of the wave number. For a crystal 
lattice with the spacing b , in which the amplitude per unit time for the electron to 
jump from one atom to the next is iA/h, the energy of the state is related to k 
(for small kb) by 


E - Ak 2 b 2 


(16.1) 


(see Section 13-3). We also saw that groups of such waves with similar energies 
would make up a wave packet which would behave like a classical particle with a 
mass m e{ i given by: 


h 2 

m 'tt 2 Ab* 


(16.2) 


Since waves of probability amplitude in a crystal behave like a particle, one 
might well expect that the general quantum mechanical description of a particle 
would show the same kind of wave behavior we observed for the lattice. Suppose 
we were to think of a lattice on a line and imagine that the lattice spacing b were to 
be made smaller and smaller. In the limit we would be thinking of a case in which 
the electron could be anywhere along the line. We would have gone over to a 
continuous distribution of probability amplitudes. We would have the amplitude 
to find an electron anywhere along the line. This would be one way to describe 
the motion of an electron in a vacuum. In other words, if we imagine that space can 
be labeled by an infinity of points all very close together and we can work out the 
equations that relate the amplitudes at one point to the amplitudes at neighboring 
points, we will have the quantum mechanical laws of motion of an electron in space. 

Let’s begin by recalling some of the general principles of quantum mechanics. 
Suppose we have a particle which can exist in various conditions in a quantum 
mechanical system. Any particular condition an electron can be found in, we call 
a “state,” which we label with a state vector, say | <f>). Some other condition would 
be labeled with another state vector, say | ^). We then introduce the idea of base 
states. We say that there is a set of states | 1), | 2), | 3), | 4), and so on, which 
have the following properties. First, all of these states are quite distinct — we say 
they are orthogonal. By this we mean that for any two of the base states | i) and 
\j\ the amplitude (/ | j) that an electron known to be in the state | i) is also in the 
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state | j) is equal to zero— unless, of course, | /) and |./> stand for the same state. 
We represent this symbolically by 


{i | j) = Sij. (16.3) 

You will remember that 6,-y = 0 if / and j are different, and 5,y = 1 if i and./ are 
the same number. 

Second, the base states | i) must be a complete set, so that any state at all can 
be described in terms of them. That is, any state | <f>) at all can be described com- 
pletely by giving all of the amplitudes (i | <t>) that a particle in the state | <fi) will 
also be found in the state | /'). In fact, the state vector | 4>) is equal to the sum of 
the base states each multiplied by a coefficient which is the amplitude of the 
state | <j>) is also in the state | i): 

\<t>) = EIM»- 06.4) 

i 

Finally, if we consider any two states | <f>) and | \p), the amplitude that the state 
| yj/} will also be in the state | <£) can be found by first projecting the state | \p) into 
the base states and then projecting from each base state into the state | 4>). We 
write that in the following way: 

<*|*> = (16-5) 

i 

The summation is, of course, to be carried out over the whole set of base state | /}. 

In Chapter 13 when we were working out what happens with an electron placed 
on a linear array of atoms, we chose a set of base states in which the electron was 
localized at one or other of the atoms in the line. The base state | n) represented 
the condition in which the electron was localized at atom number “n.” (There is, 
of course, no significance to the fact that we called our base states | n) instead of 
| /).) A little later, we found it convenient to label the base states by the coordinate 
x n of the atom rather than by the number of the atom in the array. The state 
| x n ) is just another way of writing the state | n). Then, following the general rules, 
any state at all, say | \p) is described by giving the amplitudes and that an electron 
in the state 1 i p) is also in one of the states | x n ). For convenience we have chosen 
to let the symbol C n stand for these amplitudes, 

C n =' <* n | *>. (16.6) 

Since the base states are associated with a location along the line, we can think 
of the amplitude C n as a function of the coordinate * and write it as C(x n ), The 
amplitudes C(x n ) will, in general, vary with time and are, therefore, also functions 
of t , We will not generally bother to show explicitly this dependence. 

In Chapter 13 we then proposed that the amplitudes C(x n ) should vary with 
time in a way described by the Hamiltonian equation (Eq. 13.3). In our new 
notation this equation is 

ih = E 0 C(x n ) - AC(x„ + b)- AC(x n - b). (16.7) 

Ot 

The last two terms on the right-hand side represent the process in which an electron 
at atom (n + 1) or at atom (n — 1) can feed into atom n. 

We found that Eq. (16.7) has solutions corresponding to definite energy states, 
which we wrote as 

C(x n ) - e iEtih e ikx 1K ( 16 . 8 ) 

For the low-energy states the wavelengths are large (k is small), and the energy is 
related to k by 

E = (Eq - 2 A) + Ak 2 b 2 , (16.9) 

or, choosing our zero of energy so that (E 0 — 2 A) — 0, the energy is given by 
Eq. (16.1). 
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Let's see what might happen if we were to let the lattice spacing b go to zero, 
keeping the wave number k fixed. If that is all that were to happen the last term 
in Eq. (16.9) would just go to zero and there would be no physics. But suppose 
A and b are varied together so that as b goes to zero the product Ab 2 is kept 
constantf — using Eq. (16.2) we will write Ab 2 as the constant h 2 /2m C f f . Under 
these circumstances, Eq. (16.9) would be unchanged, but what would happen to the 
differential equation (16.7)? 

First we will rewrite Eq. (16.7) as 

ih = (E 0 - 2A)C(x n ) + A[2C(x n ) - C(x n + b) - C(x n - b)}. 

(16.10) 

For our choice of E Q , the first term drops out. Next, we can think of a continuous 
function C(x) that goes smoothly through the proper values C(x n ) at each * n . As 
the spacing b goes to zero, the points x n get closer and closer together, and (if we 
keep the variation of C(x) fairly smooth) the quantity in the brackets is just pro- 
portional to the second derivative of C(x). We can write — as you can see by making 
a Taylor expansion of each term — the equality 

2 C(x) - C(x + b) - C(x - b) « —b 2 • (16.11) 


In the limit, then, as b goes to zero, keeping b 2 A equal to K, Eq. (16.7) goes over 
into 


.. 6C(x) h 2 S 2 C(x) 

lh ~JT= 


(16.12) 


We have an equation which says that the time rate of change of C(x) — the ampli- 
tude to find the electron at x — depends on the amplitude to find the electron at 
nearby points in a way which is proportional to the second derivative of the 
amplitude with respect to position. 

The correct quantum mechanical equation for the motion of an electron in 
free space was first discovered by Schrodinger. For motion along a line it has 
exactly the form of Eq. (16.12) if we replace m e a by m, the free-space mass of the 
electron. For motion along a line in free space the Schrodinger equation is 


3C(x) _ h 2 d 2 C(x ) 

tft dt 2m 5x2 


(16.13) 


We do not intend to have you think we have derived the Schrodinger equation 
but only wish to show you one way of thinking about it. When Schrodinger first 
wrote it down, he gave a kind of derivation based on some heuristic arguments and 
some brilliant intuitive guesses. Some of the arguments he used were even false, but 
that does not matter; the only important thing is that the ultimate equation gives 
a correct description of nature. The purpose of our discussion is then simply to 
show you that the correct fundamental quantum mechanical equation (16.13) 
has the same form you get for the limiting case of an electron moving along a line 
of atoms. This means that we can think of the differential equation in (16.13) 
as describing the diffusion of a probability amplitude from one point to the ne^t 
along the line. That is, if an electron has a certain amplitude to be at one point, it 
will, a little time later, have some amplitude to be at neighboring points. In fact, 
the equation looks something like the diffusion equations which we have used in 
Volume I. But there is one main difference: the imaginary coefficient in front of 
the time derivative makes the behavior completely different from the ordinary 
diffusion such as you would have for a gas spreading out along a thin tube. Ordi- 
nary diffusion gives rise to real exponential solutions, whereas the solutions of 
Eq. (16.13) are complex waves. 


f Vou can imagine that as the points x n get closer together, the amplitude A to jump 
from x n i to x n will increase. 
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16-2 The wave function 

Now that you have some idea about how things are going to look, we want 
to go back to the beginning and study the problem of describing the motion of an 
electron along a line without having to consider states connected with atoms on a 
lattice. We want to go back to the beginning and see what ideas we have to use 
if we want to describe the motion of a free particle in space. Since we are interested 
in the behavior of a particle along a continuum, we will be dealing with an infinite 
number of possible states and, as you will see, the ideas we have developed for 
dealing with a finite number of states will need some technical modifications. 

We begin by letting the state vector | x) stand for a state in which a particle is 
located precisely at the coordinate x . For every value x along the line — for instance 
1.73, or 9.67, or 10.00 — there is the corresponding state. We will take these states 
| x) as our base states and, if we include all the points on the line, we will have 
a complete set for motion in one dimension. Now suppose we have a different 
kind of a state, say | \p), in which an electron is distributed in some way along the 
line. One way of describing this state is to give all the amplitudes that the electron 
will be also found in each of the base states | x). We must give an infinite set of 
amplitudes, one for each value of x. We will write these amplitudes as (x \ \p). 
Each of these amplitudes is a complex number and since there is one such complex 
number for each value of x, the amplitude (x \ \p) is indeed just a function of x, 
We will also write it as C(x), 


C(x)= <x|*>. (16.14) 

We have already considered such amplitudes which vary in a continuous way 
with the coordinates when we talked about the variations of amplitude with time 
in Chapter 7. We showed there, for example, that a particle with a definite mo- 
mentum should be expected to have a particular variation of its amplitude in 
space. If a particle has a definite momentum p and a corresponding definite energy 
E , the amplitude to be found at any position x would look like 

(x | \p) — C(x) oc e +ipxl \ (16.15) 

This equation expresses an important general principle of quantum mechanics which 
connects the base states corresponding to different positions in space to another 
system of base states — all the states of definite momentum. The definite momentum 
states are often more convenient than the states in x for certain kinds of problems. 
Either set of base states is, of course, equally acceptable for a description of a 
quantum mechanical situation. We will come back later to the matter of the 
connection between them. For the moment we want to stick to our discussion of 
a description in terms of the states | x). 

Before proceeding, we want to make one small change in notation which we 
hope will not be too confusing. The function C(x), defined in Eq. (16.14), will 
of course have a form which depends on the particular state | under considera- 
tion. We should indicate that in some way. We could, for example, specify which 
function C(x) we are talking about by a subscript say, C^(x). Although this would 
be a perfectly satisfactory notation, it is a little bit cumbersome and is not the one 
you will find in most books. Most people simply omit the letter C and use the 
symbol ^ to define the function 

*(x) = CV(x) = <x|^>. (16.16) 

Since this is the notation used by everybody else in the world, you might as well 
get used to it so that you will not be frightened when you come across it somewhere 
else. Remember though, that we will now be using ^ in two different ways. In 
Eq. (16.14), \p stands for a label we have given to a particular physical state of the 
electron. On the left-hand side of Eq. (16.16), on the other hand, the symbol 
is used to define a mathematical function of x which is equal to the amplitude to 
be associated with each point x along the line. We hope it will not be too confusing 
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once you get accustomed to the idea. Incidentally, the function is usually 
called “the wave function” — because it more often than not has the form of a com- 
plex wave in its variables. 

Since we have defined \l/(x) to be the amplitude that an electron in the state p 
will be found at the location x, we would like to interpret the absolute square of 
p to be the probability of finding an electron at the position x. Unfortunately, the 
probability of finding a particle exactly at any particular point is zero. The electron 
will, in general, be smeared out in a certain region of the line, and since, in any 
small piece of the line, there are an infinite number of points, the probability that 
it will be at any one of them cannot be a finite number. We can only describe the 
probability of finding an electron in terms of a probability distribution f which gives 
the relative probability of finding the electron at various approximate locations 
along the line. Let’s let prob (x, Ax) stand for the chance of finding the electron 
in a small interval Ax located near x. If we go to a small enough scale in any 
physical situation, the probability will be varying smoothly from place to place, 
and the probability of finding the electron in any small finite line segment Ax will 
be proportional to Ax. We can modify our definitions to take this into account. 

We can think of the amplitude (x | p) as representing a kind of “amplitude 
density” for all the base states | x) in a small region. Since the probability of 
finding an electron in a small interval Ax at x should be proportional to the interval 
Ax, we choose our definition of (x | p) so that the following relation holds: 

prob (x, Ax) = |<x | p)\ 2 Ax. 

The amplitude <x | p) is therefore proportional to the amplitude that an electron 
in the state p will be found in the base state x and the constant of proportionality 
is chosen so that the absolute square of the amplitude (x | yp) gives the probability 
density of finding an electron in any small region. We can write, equivalently, 

prob (x, Ax) = |^(x)| 2 Ax. (16.17) 

We will now have to modify some of our earlier equations to make them 
compatible with this new definition of a probability amplitude. Suppose we have 
an electron in the state | yp) and we want to know the amplitude for finding it in a 
different state | <p) which may correspond to a different spread-out condition 
of the electron. When we were talking about a finite set of discrete states, we would 
have used Eq. (16.5). Before modifying our definition of the amplitudes we would 
have written 

(<*> | f) = X) fa I x)(x I f). (16.18) 

all x 

Now if both of these amplitudes are normalized in the same way as we have de- 
scribed above, then a sum of all the states in a small region of x would be equivalent 
to multiplying by Ax, and the sum over all values of x simply becomes an integral. 
With our modified definitions, the correct form becomes 

{<P | P) = f (<i> | *){x | P) dx. (16.19) 

Jail x 

The amplitude (x | p) is what we are now calling ^(x) and, in a similar way, 
we will choose to let the amplitude (x | p) be represented by p(x), Remembering 
that (<f> | x) is the complex conjugate of (x | p), we can write Eq. (16.18) as 

(<t> 1^) = f<i>* (xypix) dx. (16.20) 

With our new definitions everything follows with the same formulas as before if 
you always replace a summation sign by an integral over x. 

We should mention one qualification to what we have been saying. Any 
suitable set of base states rnust be complete if it is to be used for an adequate 


t For a discussion of probability distributions see Vol. I, Section 6-4. 
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description of what is going on. For an electron in one dimension it is not really 
sufficient to specify only the base states | x), because for each of these states the 
electron may have a spin which is either up or down. One way of getting a complete 
set is to take two sets of states in x, one for up spin and the other for down spin. 
We will, however, not worry about such complications for the time being. 

16-3 States of definite momentum 

Suppose we have an electron in a state | p) which is described by the prob- 
ability amplitude (x \ p) = p(x). We know that this represents a state in which 
the electro!! is spread out along the line in a certain distribution so that the prob- 
ability of finding the electron in a small interval dx at the location x is just 

prob (x, dx) — \p(x)\ 2 dx. 

What can we say about the momentum of this electron? We might ask what is 
the probability that this electron has the momentum pi Let’s start out by cal- 
culating the amplitude that the state [ p) is in another state | mom p) which we 
define to be a state with the definite momentum p. We can find this amplitude by 
using our basic equation for the resolution of amplitudes, Eq. (16.20). In terms 
of the state | mom p) 

(mom p | p) = / (mom p \ x)(x \ p) dx. (16.21) 

J X— — 00 

And the probability that the electron will be found with the momentum p should 
be given in terms of the absolute square of this amplitude. We have again, however, 
a small problem about the normalizations. In general we can only ask about the 
probability of finding an electron with a momentum in a small range dp at the 
momentum p . The probability that the momentum is exactly some value p must be 
zero (unless the state | p) happens to be a state of definite momentum). Only if we 
ask for the probability of finding the momentum in a small range dp at the mo- 
mentum p will we get a finite probability. There are several ways the normalizations 
can be adjusted. We will choose one of them which we think to be the most 
convenient, although that may not be apparent to you just now. 

We take our normalizations so that the probability is related to the amplitude 
by 

prob ( p , dp) = | (mom p | ^>| 2 • (16.22) 

With this definition the normalization of the amplitude (mom | jc) is determined. 
The amplitude (mom p \ x) is, of course, just the complex conjugate of the ampli- 
tude (x | mom/?), which is just the one we have written down in Eq. (16.15). 
With the normalization we have chosen, it turns out that the proper constant of 
proportionality in front of the exponential is just 1. Namely, 

(mom p | x) = (x | mom p)* — e~‘ lpxir> . (16.23) 

Equation (16.21) then becomes 

(mom p\p) — J + e~ ipxin (x \ p) dx. (16.24) 

This equation together with Eq. (16.22) allows us to find the momentum distribu- 
tion for any state ( p). 

Let’s look at a particular example — for instance one in which an electron 
is localized in a certain region around a: — 0. Suppose we take a wave function 
which has the following form: 


P(x) = Ke~ xliA<T \ (16.25) 

The probability distribution in * for this wave function is the absolute square, or 

prob (x, dx) = P(x) dx = K 2 e~ x2j 2ff2 dx. (16.26) 

16-7 



The probability density function P(x ) is the Gaussian curve shown in Fig. 16-1. 
Most of the probability is concentrated between jc = +cr and jc = —a. We say 
that the “half-width” of the curve is <r. (More precisily, <x is equal to the root-mean- 
square of the coordinate jc for something spread out according to this distribution.) 
We would normally choose the constant K so that the probability density P(jc) 
is not merely proportional to the probability per unit length in jc of finding the 
electron, but has a scale such that P(x ) Ajc is equal to the probability of finding 
the electron in Ax near jc. The constant K which does this can be found by requiring 
that P(x) dx = 1, since there must be unit probability that the electron is 
found somewhere. Here, we get that K = (2tt < r 2 )“ 1/4 . [We have used the fact 
that e~ t2 dt = \Zw; see Vol. I, page 40-6.] 


Fig. 16—1. The probability density 
for the wave function of Eq. (16.24). 



Now let’s find the distribution in momentum. Let’s let <j>(p) stand for the 
amplitude to find the electron with the momentum /?, 

4>(p) = (mom p | ^). (16,27) 

Substituting Eq. (16.25) into Eq. (16.24) we get 

*00 = r* e~ ipxlh ■ Ke~ x2ji ° 2 dx. (16.28) 

J — oo 

the intregral can also be rewritten as 

Ke -pWl*2 r+* e -(l/4„’)(*+2 itnWfa' (16.29) 

We can now make the substitution u = jc + 2ipa 2 /h, and the integral is 

f +K e-^'^du = 2 <tVt. (16.30) 

(The mathematicians would probably object to the way we got there, but the result 
is, nevertheless, correct.) 


*(/0 = (8? r<r 2 ) 1/4 e-v 2ff2 t*\ (16.31) 

We have the interesting result that the amplitude function in p has precisely 

the same mathematical form as the amplitude function in jc; only the width of the 
Gaussian is different. We can write this as 

*CP) = (27n 7 2 r 1/4 e- J,2/4 ’> 2 , (16.32) 

where the half-width -q of the /^-distribution function is related to the half-width <x 
of the x-distribution by 

V = ~- (16.33) 
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Our result says: if we make the width of the distribution in x very small by 
making cr small, 7) becomes large and the distribution in p is very much spread out. 
Or, conversely: if we have a narrow distribution in p, it must correspond to a 
spread-out distribution in x. We can, if we like, consider 17 and a to be some meas- 
ure of the uncertainty in the localization of the momentum and of the position of 
the electron in the state we are studying. If we call them Ap and Ax respectively 
Eq. (16.33) becomes 

Ap Ax — - ■ (16.34) 

Interestingly enough, it is possible to prove that for any other form of a 
a distribution in x or in /?, the product Ap Ax cannot be smaller than the one 
we have found here. The Gaussian distribution gives the smallest possible value 
for the product of the root-mean-square widths. In general, we can say 

ApAx > -■ (16.35) 

This is a quantatative statement of the Heisenberg uncertainty principle, which we 
have discussed qualitatively many times before. We have usually made the ap- 
proximate statement that the minimum value of the product Ap Ax is of the same 
order as h. 


16-4 Normalization of the states in x 

We return now to the discussion of the modifications of our basic equations 
which are required when we are dealing with a continuum of base states. When 
we have a finite number of discrete states, a fundamental condition which must be 
satisfied by the set of base states is 


(i\j) = (16.36) 

If a particle is in one base state, the amplitude to be in another base state is 0. By 
choosing a suitable normalization, we have defined the amplitude {/ 1 /) to be 1 . 
These two conditions are described by Eq. (16.36). We want now to see how this 
relation must be modified when we use the base states |x) of a particle on a 
line. If the particle is known to be in one of the base states |x), what is the 
amplitude that it will be in anojher base state | x')l If x and x' are two different 
locations along the line, thenjthe amplitude (x | x') is certainly 0, so that is 
consistent with Eq. (16.36). But if x and x ' are equal, the amplitude (x | x') will 
not be 1, because of the same old normalization problem. To see how we have to 
patch things up, we go back to Eq. (16.19), and apply this equation to the special 
case in which the state | <(>) is just the base state | x'). We would have then 

<•*' | i) = J(x' I x) \j/{x) dx. (16.37) 

Now the amplitude (x | \p) is just what we have been calling the function \p(x). 
Similarly the amplitude ( x ' 1 1 /'), since it refers to the same state is the same func- 
tion of the variable x', namely \fr(x'). We can, therefore, rewrite Eq. (16.37) as 

\p(x f ) = J (x f | x) i^(x) dx. (16.38) 

This equation must be true for any state \p and, therefore, for any arbitrary function 
^(x). This requirement should completely determine the nature of the amplitude 
(x | x') — which is, of course, just a function that depends on x and x'. 

Our problem now is to find a function /(x, x') which when multiplied into 
\p(x), and integrated over all x gives just the quantity ^(x'). It turns out that there 
is no mathematical function which will do this! At least nothing like what we 
ordinarily mean by a “function.” 
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Suppose we pick x' to be the special number 0 and define the amplitude 
(0 | x) to be some function of x, let’s say f{x). Then Eq. (16.36) would read as 
follows: 

lK0) = jfixyp{x) dx. (16.39) 

What kind of function f(x) could possibly satisfy this equation? Since the integral 
must not depend on what values \p(x) takes for values of x other than 0, /(x) 
must clearly be 0 for all values of x except 0. But if /(x) is 0 everywhere, the 
integral will be 0, too, and Eq. (16.39) will not be satisfied. So we have an im- 
possible situation: we wish a function to be 0 everywhere but at a point, and still 
to give a finite integral. Since we can’t find a function that does this, the easiest 
way out is just to say that the function /(x) is defined by Eq. (16.37). Namely, 
/(x) is that function which makes (16.39) correct. The function which does this 
was first invented by Dirac and carries his name. We write it 5(x). All we are say- 
ing is that the function 5(x) has the strange property that if it is substituted for 
/(x) in the Eq. (16.39), the integral picks out the value that ^(x), takes on when 
x is equal 0; and, since the integral must be independent of ^(x) for all values 
of x other than 0, the function 5(x) must be 0 everywhere except at x = 0. Sum- 
marizing, we write 


(0 | x) = 5(x), (16.40) 

where 5(x) is defined by 

*(0) = f i(xMx) dx. (16.41) 

Notice what happens if we use the special function “1” for the function \p in Eq. 
(16.41). Then we have the result 


1 = 


J 5(x) dx. 


(16.42) 



Fig. 16-2. A set of functions, all of 
unit area, which look more and more 
like 5(x). 


That is, the function 5(x) has the property that it is 0 everywhere except at x = 0 
but has a finite integral equal to unity. We must imagine that the function 
5(x) has such a fantastic infinity at one point that the total area comes out equal 
to one. 

One way of imagining what the Dirac 5-function is like is to think of a sequence 
of rectangles — or any other peaked function you care to — which gets narrower 
and narrower and higher and higher, always keeping a unit area, as sketched in 
Fig. 16-2. The integral of this function from — oo to +oo is always 1. If you 
multiply it by any function ^(x) and integrate the product, you get something 
which is approximately the value of the function at x = 0, the approximation 
getting better and better as you use the narrower and narrower rectangles. You 
can if you wish, imagine the 5-function in terms of this kind of limiting process. 
The only important thing, however, is that the 5-function is defined so that Eq. 
(16.41) is true for every possible function \p(x). That uniquely defines the 5-function. 
Its properties are then as we have described. 

If we change the argument of the 5-function from x to x — x', the corre- 
sponding relations are 


5(x — x') = 0, x r 9 ^ x, 

j 5(x — x f yp(x)dx = $(x f ). (16.43) 

If we use 5(x — x') for the amplitude (x | x' ) in Eq, (16.38), that equation is 
satisfied. Our result then is that for our base states in x, the condition corre- 
sponding to (16.36) is 
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(x' | x) = 5(x — x'). 


(16.44) 



We have now completed the necessary modifications of our basic equations 
which are necessary for dealing with the continuum of base states corresponding 
to the points along a line. The extension to three dimensions is fairly obvious; 
first we replace the coordinate x by the vector r. Then integrals over x become re- 
placed by integrals over x, y , and z. In other words, they become volume integrals. 
Finally, the one-dimensional 5-function must be replaced by just the product of 
three 5-functions, one in x, one in y 9 and the other in z, 5(x — x') 8(y — /) 
5(z — z'). Putting everything together we get the following set of equations for 
the amplitudes for particle in three dimensions: 


<*|*> = /<*|r><r|*>dVol, 

(16.45) 

(r \f) = iK*0> 

(16.46) 

<»• I 0) = 


(<m> = /«*(rV(r)dVol. 

(16.47) 

<r' | r) = S(x - x') S(y - /) S(z - z'), 

(16.48) 


What happens when there is more than one particle? We will tell you about 
how to handle two particles and you will easily see what you must do if you want 
to deal with a larger number. Suppose there are two particles, which we can call 
particle No. 1 and particle No. 2. What shall we use for the base states? One 
perfectly good set can be described by saying that particle 1 is at x x and particle 
2 is at x 2 , which we can write as | XiX 2 ). Notice that describing the position of 
only one particle does not define a base state. Each base state must define the 
condition of the entire system. You must not think that each particle moves inde- 
pendently as a wave in three dimensions. Any physical state | $) can be defined 
by giving all of the amplitudes {x l9 x 2 \ $) to find the two particles at xi and x 2 . 
This generalized amplitude is therefore a function of the two sets of coordinates 
Xi and x 2 . You see that such a function is not a wave in the sense of an oscillation 
that moves along in three dimensions. Neither is it generally simply a product of 
two individual waves, one for each particle. It is, in general, some kind of a wave 
in the six dimensions defined by x a and x 2 . If there are two particles in nature 
which are interacting, there is no way of describing what happens to one of the 
particles by trying to write down a wave function for it alone. The famous para- 
doxes that we considered in earlier chapters — where the measurements made on 
one particle were claimed to be able to tell what was going to happen to another 
particle, or were able to destroy an interference — have caused people all sorts 
of trouble because they have tried to think of the wave function of one particle 
alone, rather than the correct wave function in the coordinates of both particles. 
The complete description can be given correctly only in terms of functions of the 
coordinates of both particles. 

16-5 The Schrodinger equation 

So far we have just been worrying about how we can describe states which 
may involve an electron being anywhere at all in space. Now we have to worry 
about putting into our description the physics of what can happen in various 
circumstances. As before, we have to worry about how states can change with time. 
If we have a state | \p) which goes over into another state | $') sometime later, 
we can describe the situation for all times by making the wave function — which 
is just the amplitude (r | \p ) — a function of time as well as a function of the co- 
ordinate. A particle in a given situation can then be described by giving a time- 
varying wave function ^(r, /) — ^(x, y 9 z, /). This time-varying wave function 
describes the evolution of successive states that occur as time develops. This 
so-called “coordinate representation” — which gives the projections of the state 
| \p) into the base states | r) may not always be the most convenient one to use — 
but we will consider it first. 
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In Chapter 8 we described how states varied in time in terms of the Hamilto- 
nian Hij. We saw that the time variation of the various amplitudes was given in 
terms of the matrix equation 


ih ~dT = E Hi &- (16.49) 

3 

This equation says that the time variation of each amplitude C z is proportional to 
all of the other amplitudes C h with the coefficients . 

How would we expect Eq, (16.49) to look when we are using the continuum 
of base states | x)l Let s first remember that Eq. (16.49) can also be written as 

ifl Jt ^ ^ ^ = E (i I 8 1 •/)(•/ I '/')• 

3 

Now it is clear what we should do. For the ^-representation we would expect 

ih Jt ( x I ^ = /<* I # I x'W I *) dx> • (16.50) 

The sum over the base states |y), gets replaced by an integral over Since 
{x\fi\ x') should be some function of * and x', we can write it as H(x, x')— which 
corresponds to H h in Eq. (16.49). Then Eq. (16.50) is the same as 


with 




j H{x, x'U(x’) dx' 


H{x, x f ) ={x\fi I x'). 


(16.51) 


According to Eq. (16.51), the rate of change of the at x would depend on the 
value of ^ at all other points x r ; the factor H(x, x') is the amplitude per unit time 
that the electron will jump from x' to x. It turns out in nature , however, that this 
amplitude is zero except for points x ' very close to x. This means— as we saw in the 
example of the chain of atoms at the beginning of the chapter, Eq. (16.12) — that 
the right-hand side of Eq. (16.15) can be expressed completely in terms of \p and 
the derivatives of \[/ with respect to x, all evaluated at the position *. 

For a particle moving freely in space with no forces, no disturbances, the 
correct law of physics is 


/ 


H(x, X f }p(x f ) dx f 


2m dx 2 


M*)- 


Where did we get that from? Nowhere. It’s not possible to derive it from anything 
you know. It came out of the mind of Schrodinger, invented in his struggle to 
find an understanding of the experimental observations of the real world. You can 
perhaps get some clue of why it should be that way by thinking of our derivation 
of Eq. (16.12) which came from looking at the propagation of an electron in a 
crystal. 

Of course, free particles are not very exciting. What happens if we put forces 
on the particle? Well, if the force of a particle can be described in terms of a scalar 
potential V{x)— which means we are thinking of electric forces but not magnetic 
forces and if we stick to low energies so that we can ignore complexities which 
come from relativistic motions, then the Hamiltonian which fits the real world 
gives 

f h 2 d 2 

I H(x, x')J,(x f )dx' = - — — *(x) + V(xyp(x). (16.52) 

Again, you can get some clue as to the origin of this equation if you go back to 
the motion of an electron in a crystal, and see how the equations would have to 
be modified if the energy of the electron varied slowly from one atomic site to 
the other— as it might do if there were an electric field across the crystal. Then 
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the term E 0 in Eq. (16.7) would vary slowly with position and would correspond 
to the new term we have added in (16.52). 

[You may be wondering why we went straight from Eq. (16.51) to Eq. (16.52) 
instead of just giving you the correct function for the amplitude H(x , x') = 
(x\H\ x'). We did that because H(x f x') can only be written in terms of strange 
algebraic functions, although the whole integral on the right-hand side of Eq. 

(16.51) comes out in terms of things you are used to. If you are really curious, 
H(x , x') can be written in the following way: 

h 2 

H(x , x’) = - - 2 ~ 8"(x - x') + V(x) 8(x - x'l 

where 8" means the second derivative of the delta function. This rather strange 
function can be replaced by a somewhat more convenient algebraic differential 
operator, which is completely equivalent: 

"<*• *■>-(-££+ M - * >• 

We will not be using these forms, but will work directly with the form in Eq. 

(16.52) .] 

If we now use the expression we have in (16.52) for the integral in (16.50) we 
get the following differential equation for \f/(x) = (* | ^): 

ih d ft = “ 27^ Kx) + V(xMx). (16.53) 

It is fairly obvious what we should use instead of Eq. (16.53) if we are inter- 
ested in motion in three dimensions. The only changes are that d 2 /dx 2 gets 
replaced by 

dx 2 ^ dy* ^ dz 2 

and V(x) gets replaced by V(x, y 9 z). The amplitude $(x, y, z) for an electron 
moving in a potential V(x, y, z) obeys the differential equation 

“ I - - £ + V *■ <>«■*> 

It is called the Schrodinger equation, and was the first quantum-mechanical 
equation ever known. It was written down by Schrodinger before any of the other 
quantum equations we have described in this book were discovered. 

Although we have approached the subject along a completely different route, 
the great historical moment marking the birth of the quantum mechanical de- 
scription of matter occurred when Schrodinger first wrote down his equation in 
1926. For many years the internal atomic structure of matter had been a great 
mystery. No one had been able to understand what held matter together, why 
there was chemical binding, and especially how it could be that atoms could be 
stable. Although Bohr had been able to give a description of the internal motion 
of an electron in a hydrogen atom which seemed to explain the observed spectrum 
of light emitted by this atom, the reason that electrons moved in this way remained 
a mystery. Schrodinger’s discovery of the proper equations of motion for electrons 
on an atomic scale provided a theory from which atomic phenomena could be 
calculated quantitatively, accurately, and in detail. In principle, Schrodinger’s 
equation is capable of explaining all atomic phenomena except those involving 
magnetism and relativity. It explains the energy levels of an atom, and all the 
facts of chemical binding. This is, however, true only in principle — the mathe- 
matics soon becomes too complicated to solve exactly any but the simplest prob- 
lems. Only the hydrogen and helium atoms have been calculated to a high accuracy. 
However, with various approximations, some fairly sloppy, many of the facts of 
more complicated atoms and of the chemical binding of molecules can be under- 
stood. We have shown you some of these approximations in earlier chapters. 
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The Schrodinger equation as we have written it does not take into account 
any magnetic effects. It is possible to take such effects into account in an approxi- 
mate way by adding some more terms to the equation. However, as we have seen 
in Volume II, magnetism is essentially a relativistic effect, and so a correct de- 
scription of the motion of an electron in an arbitrary electromagnetic field can 
only be discussed in a proper relativistic equation. The correct relativistic equation 
for the motion of an electron was discovered by Dirac a year after Schrodinger 
brought forth his equation, and takes on quite a different form. We will not be 
able to discuss it at all here. 

Before we go on to look at some of the consequences of the Schrodinger 
equation, we would like to show you what it looks like for a system with a large 
number of particles. We will not be making any use of the equation, but just 
want to show it to you to emphasize that the wave function ^ is not simply an 
ordinary wave in space, but is a function of many variables. If there are many 
particles, the equation becomes 


;i. 1*2, r 3i . . .) ^ h 2 \d 2 yp d 2 yfr d 2 \p\ /1/fn 

~ lh at = + ^ + (ru n ’ ' ' ( } 


The potential function V is what corresponds classically to the total potential energy 
of all the particles. If there are no external forces acting on the particles, the 
function V is simply the electrostatic energy of interaction of all the particles. That 
is, if the ith particle carries the charge Z^ c , then the function V is simplyf 

V(r i> r 2, • • •) = X) g2 - (16.56) 

all Tii 

pairs 


16-6 Quantized energy levels 

In a later chapter we will look in detail at a solution of Schrodinger’s equation 
for a particular example. We would like now, however, to show you how one of 
the most remarkable consequence of Schrodinger’s equation comes about — namely, 
the surprising fact that a differential equation involving only continuous functions 
of continuous variables in space can give rise to quantum effects such as the 
discrete energy levels in an atom. The essential fact to understand is how it can be 
that an electron which is confined to a certain region of space by some kind of a 
potential “well” must necessarily have only one or another of a certain well- 
defined set of discrete energies. 


Fig. 16-3. A potential well for a 
particle moving along x. 



Suppose we think of an electron in a one-dimensional situation in which its 
potential energy varies with * in a way described by the graph in Fig. 16-3. We 
will assume that this potential is static — it doesn’t vary with time. As we have done 
so many times before, we would like to look for solutions corresponding to states 
of definite energy, which means, of definite frequency. Let’s try a solution of the 
form 

t = a(x)e~ iEtlh , (16.57) 


t We are using the convention of the earlier volumes according to which e 2 = qH 47reo. 
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If we substitute this function into the Schrodinger equation, we find that the 
function a(x) must satisfy the following differential equation: 

x [V(x) - £]a(x). (16.58) 

This equation says that at each a the second derivative of a(x) with respect to a: 
is proportional to 42(a), the coefficient of proportionality being given by the quan- 
tity (V ~ E). The second derivative of a(x ) is the rate of change of its slope. If 
the potential V is greater than the energy E of the particle, the rate of change of 
the slope of 42(a) will have the same sign as a(x ). That means that the curve of 
o(a) will be concave away from the axis. That is, it will have, more or less, the 
character of the positive or negative exponential function, This means that 
in the region to the left of Ai, in Fig. 16-3, where V is greater than the assumed 
energy E, the function a( a) would have to look like one or another of the curves 



v> e v< e 

(a) (b) 


Fig. 16-4. Possible shapes of the 
wave function a(x) for V > E and for 
V < E. 



Fig. 1 6-5. A wave function for the 
energy E a which goes to zero for nega- 
tive x. 


shown in part (a) of Fig. 16-4. 

If, on the other hand, the potential function V is less than the energy £, the 
second derivative of a( x) with respect to a has the opposite sign from 42(a) 
itself, and the curve of 42(a) will always be concave toward the axis like one of the 
pieces shown in part (b) of Fig. 16-4. The solution in such a region has, piece-by- 
piece, roughly the form of a sinusoidal curve. 

Now let’s see if we can construct graphically a solution for the function a(x) 
which corresponds to a particle of energy E a in the potential V shown in Fig. 
16-3. Since we are trying to describe a situation in which a particle is bound 
inside the potential well, we want to look for solutions in which the wave amplitude 
takes on very small values when a is way outside the potential well. We can easily 
imagine a curve like the one shown in Fig. 16-5 which tends toward zero for large 
negative values of a, and grows smoothly as it approaches a 2 . Since Kis equal to 
E a at Ai, the curvature of the function becomes zero at this point. Between Xi 
and a 2 , the quantity V — E a is always a negative number, so the function 42(a) 
is always concave toward the axis, and the curvature is larger the larger the differ- 
ence between E a and V. If we continue the curve into the region between Aj and 
a 2 , it should go more or less as shown in Fig. 16-5. 

Now let’s contiriue this curve into the region to the right of a 2 . There it 
curves away from the axis and takes off toward large positive values, as drawn in 
Fig. 16-6. For the energy E a we have chosen, the solution for 42(a) gets larger and 
larger with increasing a. In fact, its curvature is also increasing (if the potential 
continues to stay flat). The amplitude rapidly grows to immense proportions. 



Fig. 1 6-6. The wave function a(x) of 
Fig. 1 6-5 continued beyond x 2 . 
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V 


Fig. 16-7. The wave function a(x) 
for an energy E b greater than E a . 


a(x)f j 

V>E C ! V < E c | V > E c 



X, x 2 

Fig. 1 6-8. A wave function for the 
energy E c between E a and E b . 




What does this mean? It simply means that the particle is not “bound 1 ' in the 
potential well. It is infinitely more likely to be found outside of the well, than 
inside. For the solution we have manufactured, the electron is more likely to be 
found at x = + go than anywhere else. We have failed to find a solution for a 
bound particle. 

Let’s try another energy, say one a little bit higher than E a — say the energy 
E h in Fig. 16-7. If we start with the same conditions on the left, we get the solution 
drawn in the lower half of Fig. 16-7. It looked at first as though it were going to 
be better, but it ends up just as bad as the solution for E a — except that now a(x) 
is getting more and more negative as we go toward large values of ;c. 

Maybe that’s the clue. Since changing the energy a little bit from E a to E b 
causes the curve to flip from one side of the axis to the other, perhaps there is 
some energy lying between E a and E b for which the curve will approach zero for 
large values of x. There is, indeed, and we have sketched how the solution might 
look in Fig. 16-8. 

You should appreciate that the solution we have drawn in the figure is a 
very special one. If we were to raise or lower the energy ever so slightly, the func- 
tion would go over into curves like one or the other of the two broken-line curves 
shown in Fig. 16-8, and we would not have the proper conditions for a bound 
particle. We have obtained a result that if a particle is to be bound in a potential 
well, it can do so only if it has a very definite energy. 

Does that mean that there is only one energy for a particle bound in a po- 
tential well? No. Other energies are possible, but not energies too close to E c . 
Notice that the wave function we have drawn in Fig. 16-8 crosses the axis four 
times in the region between x x and x 2 . If we were to pick an energy quite a bit 
lower than E cy we could have a solution which crosses the axis only three times, 
only two times, only once, or not at all. The possible solutions are sketched in 
Fig. 16-9. (There may also be other solutions corresponding to values of the 
energy higher than the ones shown.) Our conclusion is that if a particle is bound 
in a potential well, its energy can take on only the certain special values in a discrete 
energy spectrum. You see how a differential equation can describe the basic fact 
of quantum physics. 

We might remark one other thing. If the energy E is above the top of the 
potential well, then there are no longer any discrete solutions, and any possible 
energy is permitted. Such solutions correspond to the scattering of free particles 
by a potential well. We have seen an example of such solutions when we considered 
the effects of impurity atoms in a crystal. 

Fig. 1 6-9. The function a(x) for the five lowest energy bound states. 
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17 


Symmetry and Conservation Caws 


17-1 Symmetry 


In classical physics there are a number of quantities which are conserved — 
such a momentum, energy, and angular momentum. Conservation theorems 
about corresponding quantities also exist in quantum mechanics. The most beau- 
tiful thing of quantum mechanics is that the conservation theorems can, in a 
sense, be derived from something else, whereas in classical mechanics they are 
practically the starting points of the laws. (There are ways in classical mechanics 
to do an analogous thing to what we will do in quantum mechanics, but it can be 
done only at a very advanced level.) In quantum mechanics, however, the conserva- 
tion laws are very deeply related to the principle of superposition of amplitudes, 
and to the symmetry of physical systems under various changes. This is the subject 
of the present chapter. Although we will apply these ideas mostly to the conserva- 
tion of angular momentum, the essential point is that the theorems about the 
conservation of all kinds of quantities are — in the quantum mechanics — related to 
the symmetries of the system. 

We begin, therefore, by studying the question of symmetries of systems. A 
very simple example is the hydrogen molecular ion — we could equally well take the 
ammonia molecule — in which there are two states. For the hydrogen molecular 
ion we took as our base states one in which the electron was located near proton 
number 1, and another in which the electron was located near proton number 2. 
The two states — which we called | 7) and | 2) — are shown again in Fig. 17-1 (a). 
Now, so long as the two nuclei are both exactly the same, then there is a certain 
symmetry in this physical system. That is to say, if we were to reflect the system 
in the plane halfway between the two protons — by which we mean that everything 
on one side of the plane gets moved to the symmetric position on the other side — 
we would get the situations in Fig. 17- 1(b). Since the protons are identical, the 
operation of reflection changes | 7) into | 2) and | 2) into | 7). We’ll call this reflec- 
tion operation P and write 

P\l)=\2), P\2)=\l). (17.1) 

So our P is an operator in the sense that it “ does something” to a state to make a 
new state. The interesting thing is that P operating on any state produces some 
other state of the system. 

Now P , like any of the other operators we have described, has matrix elements 
which can be defined by the usual obvious notation. Namely, 

= </ I ^ I /) and P 12 = {1 \ P\ 2) 

are the matrix elements we get if we multiply P \ 1) and P 2) on the left by (1 |. 
From Eq. (17.1) they are 


(1 \P\1) = P X1 = (1\2) = 0, 
(1\P\2) = P 12 = (1\1) = 1. 


In the same way we can get P 2 1 
base system | 7) and | 2) — is 


and P 22 ' The matrix of P — with respect to the 

P -(°, o)- < 17 ' 3 > 


We see once again that the words operator and matrix in quantum mechanics are 
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Fig. 17-1. If the states | 1 ) and | 2) 
are reflected in the plane P-P , they go 
into | 2) and | J ), respectively. 
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Fig. 17-2, In a symmetric system, if a pure | 7) state develops as shown in 
part (a), a pure [ 2) state will develop as in part (b). 


practically interchangeable. There are slight technical differences — like the differ- 
ence between a “numeral” and a “number” — but the distinction is something 
pedantic that we don’t have to worry about. So whether P defines an operation, 
or is actually used to define a matrix of numbers, we will call it interchangeably 
an operator or a matrix. 

Now we would like to point out something. We will suppose that the physics 
of the whole hydrogen molecular ion system is symmetrical It doesn’t have to be 
— it depends, for instance, on what else is near it. But if the system is symmetrical, 
the following idea should certainly be true. Suppose we start at / = 0 with the 
system in the state | 1) and find after an interval of time t that the system turns 
out to be in a more complicated situation — in some linear combination of the two 
base states. Remember that in Chapter 8 we used to represent “going for a 
period of time” by multiplying by the operator U . That means that the system 
would after a while — say 15 seconds to be definite — be in some other state. For 
example, it might be x/2/3 parts of the state | 1) and i\J 1 /3 parts of the state | 2), 
and we would write 

| i at 15 sec) = 0(15, 0) j 1) = V573 1 1) + iVT/3 | 2). (17.4) 

Now we ask what happens if we start the system in the symmetric state | 2) and 
wait for 15 seconds under the same conditions ? It is clear that if the world is 
symmetric — as we are supposing — we should get the state symmetric to (17.4): 

at 15 sec) = 0(15, 0) | 2) = V2/3 | 2) + iVT/3 \ 1). (17.5) 

The same ideas are sketched diagrammatically in Fig. 17-2. So if the physics of a 
system is symmetrical with respect to some plane, and we work out the behavior 
of a particular state, we also know the behavior of the state we would get by 
reflecting the original state in the symmetry plane. 

We would like to say the same things a litte bit more generally — which means 
a little more abstractly. Let Q be any one of a number of operations that you 
could perform on a system without changing the physics. For instance, for Q we 
might be thinking of P, the operation of a reflection in the plane between the two 
atoms in the hydrogen molecule. Or, in a system with two electrons, we might be 
thinking of the operation of interchanging the two electrons. Another possibility 
would be, in a spherically symmetric system, the operation of a rotation of the 
whole system through a finite angle around some axis — which wouldn’t change 
the physics. Of course, we would normally want to give each special case some 
special notation for Q. Specifically, we will normally define the P y (d) to be the 
operation “rotate the system about the >>-axis by the angle d ”, By Q we mean 
just any one of the operators we have described or any other one — which leaves 
the basic physical situation unchanged. 

Let’s think of some more examples. If we have an atom with no external 
magnetic field or no external electric field, and if we were to turn the coordinates 
around any axis, it would be the same physical system. Again, the ammonia 
molecule is symmetrical with respect to a reflection in a plane parallel to that of 
the three hydrogens — so long as there is no electric field. When there is an electric 
field, when we make a reflection we would have to change the electric field also, 
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and that changes the physical problem. But if we have no external field, the 
molecule is symmetrical. 

Now we consider a general situation. Suppose we start with the state | 
and after some time or other under giv£n physical conditions it has become the 
state |^ 2 )- We can write 

1 * 2 ) = 0\* 1 ). (17.6) 

[You can be thinking of Eq. (17.4).] Now imagine we perform the operation Q 
on the whole system. The state | ^ 1 ) will be transformed to a state | \p[), which 
we can also write as Q | ^ 1 ). Also the state | ^ 2 ) is changed into | ^ 2 ) “ Q I ^ 2 )* 
Now if the physics is symmetrical under Q (don’t forget the if ; it is not a general 
property of systems), then, waiting for the same time under the same conditions, 
we should have 

(17.7) 

[Like Eq. (17.5).] But we can write Q | for | $[) and Q | ^ 2 ) for I ^ 2 ) so (17.7) 
can also be written 

Gl* 2 > = UQ |^i). (17.8) 

If we now replace | \p 2 ) by U | — Eq. (17.6) — we get 

QU |^i> = UQ |^i). (17.9) 

It's not hard to understand what this means. Thinking of the hydrogen ion it 
says that: “making a reflection and waiting a while” — the expression on the 
right of Eq. (17.9) — is the same as “waiting a while and then making a reflection” — 
the expression on the left of (17.9). These should be the same so long as U doesn’t 
change under the reflection. 

Since (17.9) is true for any starting state | ^ 1 ), it is really an equation about 

the operators : ^ ^ 

QU = UQ. (17.10) 

This is what we wanted to get — it is a mathematical statement of symmetry . When 
Eq. (17.10) is true, we say that the operators JJ and Q commute. We can then 
define “symmetry” in the following way: A physical system is symmetric with 
respect to the operation Q when Q commutes with U , the operation of the passage 
of time. [In terms of matrices, the product of two operators is equivalent to the 
matrix product, so Eq. (17.10) also holds for the matrices Q and U for a system 
which is symmetric under the transformation Q.] 

Incidentally, since for infinitesimal times e we have U — 1 — iHe/h — where 
H is the usual Hamiltonian (see Chapter 8) — you can see that if (17.10) is true, 
it is also true that 

QH = HQ. (17.11) 

So (17.11) is the mathematical statement of the condition for the symmetry of a 
physical situation under the operator Q . It defines a symmetry. 


17-2 Symmetry and conservation 

Before applying the result we have just found, we would like to discuss the 
idea of symmetry a little more. Suppose that we have a very special situation: 
after we operate on a state with Q , we get the same state. This is a very special case, 
but let’s suppose it happens to be true for a state | ^ 0 ) that | \p') = Q \ \[/ 0 ) is 
physically the same state as | ^ 0 )* That means that | \p f ) is equal to | ^ 0 ) except 
for some phase factor.f How can that happen? For instance, suppose that we 


t Incidentally, you can show that Oris necessarily a unitary operator — which means 
that if it operates on | $) to give some number times 1 1 ^), the number must be of the form 
e lh y where 8 is real. It’s a small point, and the proof rests on the following observation. 
Any operation like a reflection or a rotation doesn’t lose any particles, so the normaliza- 
tion of 1 1 p') and | \p) must be the same; they can only differ by a pure imaginary phase 
factor. 
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Fig. 17-3. The state | /) and the 
state P | /) obtained by reflecting | / ) in 
the central plane. 


have an ion in the state which we once called | I). For this state there is equal 
amplitude to be in the base states | 1) and | 2). The probabilities are shown as a 
bar graph in Fig. 17-3 (a). If we operate on 1 1) with the reflection operator P, it 
flips the state over changing | 7} to | 2) and 1 2) to [ 1 ) — we get the probabilities 
shown in Fig. 17-3(b). But that’s just the state 1 1) all over again. If we start with 
state | II) the probabilities before and after reflection look just the same. However, 
there is a difference if we look at the amplitudes. For the state 1 1) the amplitudes 
are the same after the reflection, but for the state | II) the amplitudes have the 
opposite sign. In other words, 


p\i) 


p 1 1 1) + 1 2 ) 

I V2 


I 2) + | /) _ | 
Vi 


p\n) 


*1 1 7 ) - \ 2 ) 
I V2 



I ID- 


(17.12) 


If we write P \ t 0 ) — e li \ we have that e li = 1 for the state 1 1) and e lS = — 1 
for the state j II). 

Let’s look at another example. Suppose we have a RHC polarized photon 
propagating in the z-direction. If we do the operation of a rotation around the 
z-axis, we know that this just multiplies the amplitude by e l<t> when <t> is the angle 
of the rotation. So for the rotation operation in this case, 5 is just equal to the 
angle of rotation. 

Now it is clear that if it happens to be true that an operator Q just changes the 
phase of a state at some time, say t = 0, it is true forever. In other words, if the 
state [ goes over into the state | \p 2 ) after a time or 


£/a0)|^> = |* 2 > (17.13) 

and if the symmetry of the situation makes it so that 

Q |*i> = « ,4 |* 1 >, (17.14) 

then it is also true that 

§. |* 2 > = I * 2 >. (17-15) 

This is clear, since 

0l*a> = QO\+ 1) = uq |f a ), 

and if £ 1 1 1 ) = | ^j), then 

Q I * 2 > = Ue' s I *,) = e tS U I 4>i) = e’ 5 1 4>i). 


[The sequence of equalities follows from (17.13) and (17.10) for a symmetrical 
system, from (17.14), and from the fact that a number like e l6 commutes with an 
operator.] 

So with certain symmetries something which is true initially is true for all 
times. But isn’t that just a conservation lawl Yes’ It says that if you look at the 
original state and by making a little computation on the side discover that an 
operation which is a symmetry operation of the system produces only a multiplica- 
tion by a certain phase, then you know that the same property will be true of the 
final state — the same operation multiplies the final state by the same phase factor. 
This is always true even though we may not know anything else about the inner 
mechanism of the universe which changes a system from the initial to the final 
state. Even if we do not care to look at the details of the machinery by which the 
system gets from one state to another, we can still say that if a thing is in a state 
with a certain symmetry character originally, and if the Hamiltonian for this thing 
is symmetrical under that symmetry operation, then the state will have the same 
symmetry character for all times. That’s the basis of all the conservation laws of 
quantum mechanics. 

Let’s look at a special example. Let’s go back to the P operator. We would 
like first to modify a little our definition of P. We want to take for P not just a 
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mirror reflection, because that requires defining the plane in which we put the 
mirror. There is a special kind of a reflection that doesn’t require the specification 
of a plane. Suppose we redefine the operation P this way: First you reflect in a 
mirror in the z-plane so that z goes to — z, x stays x , and y stays y\ then you turn 
the system 180° about the z-axis so that x is made to go to — x and y to —y. The 
whole thing is called an inversion. Every point is projected through the origin to the 
diametrically opposite position. All the coordinates of everything are reversed. 
We will still use the symbol P for this operation. It is shown in Fig. 17-4. It is a 
little more convenient than a simple reflection because it doesn't require that you 
specify which coordinate plane you used for the reflection — you need specify only 
the point which is at the center of symmetry. 



(e l8 ) 2 = 1. 


So if the inversion operator is a symmetry operation of a state, there are only two 
possibilities for 8: 

e' 5 = ± 1 , 

which means that 

p I *o> = I *o> or P I *o> = - I *o>. (17.17) 

Classically, if a state is symmetric under an inversion, the operation gives 
back the same state. In quantum mechanics, however, there are the two possibilities: 
we get the same state or minus the same state. When we get the same state, P | \p 0 ) — 

| ^ 0 )j we say that the state | ^ 0 ) has even parity. When the sign is reversed so that 
P | ^o) = — | ^o)j we say that the state has odd parity . (The inversion operator 
P is also known as the parity operator.) The state 1 I) of the ion has even parity ; 
and the state | II) has odd parity — see Eq. (17.12). There are, of course, states 
which are not symmetric under the operation P; these are states with no definite 
parity. For instance, in the system the state | /) has even parity, the state | II) 
has odd parity, and the state 1 1) has no definite parity. 

When we speak of an operation like inversion being performed “on a physical 
system” we can think about it in two ways. We can think of physically moving 
whatever is at r to the inverse point at — r, or we can think of looking at the same 
system from a new frame of reference y' 9 z' related to the old by x f = — x, 
y ( — —y 9 and z' = — z. Similarly, when we think of rotations, we can think of 
rotating bodily a physical system, or of rotating the coordinate frame with respect 
to which we measure the system, keeping the “system” fixed in space. Generally, 
the two points of view are essentially equivalent. For rotation they are equivalent 
except that rotating a system by the angle 6 is like rotating the reference frame by 
the negative of 0. In these lectures we have usually considered what happens when 
a projection is made into a new set of axes. What you get that way is the same as 
what you get if you leave the axes fixed and rotate the system backwards by the 
same amount. When you do that, the signs of the angles are reversed.f 



Fig. 17-4. The operation of inver- 
sion, P. Whatever is at the point A at 
(x, y, z) is moved to the point A' at 
(-x, -y, - z ). 


t In other books you may find formulas with different signs; they are probably using 
a different definition of the angles. 
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Many of the laws of physics — but not all — are unchanged by a reflection or an 
inversion of the coordinates. They are symmetric with respect to an inversion. 
The laws of electrodynamics, for instance, are unchanged if we change x to —x, 
y to —y } and z to — z in all the equations. The same is true for the laws of gravity, 
and for the strong interactions of nuclear physics. Only the weak interactions— 
responsible for /3-decay — do not have this symmetry. (We discussed this in some 
detail in Chapter 52, Vol. I.) We will for now leave out any consideration of the 
/3-decays. Then in any physical system where /3-decays are not expected to produce 
any appreciable effect — an example would be the emission of light by an atom— 
the Hamiltonian A and the operator P will commute. Under these circumstances 
we have the following proposition. If a state originally has even parity, and if you 
look at the physical situation at some later time, it will again have even parity. 
For instance, suppose an atom about to emit a photon is in a state known to have 
even parity. You look at the whole thing — including the photon — after the emis- 
sion; it will again have even parity (likewise if you start with odd parity). This 
principle is called the conservation of parity . You can see why the words “conserva- 
tion of parity” and “reflection symmetry” are closely intertwined in the quantum 
mechanics. Although until a few years ago it was thought that nature always 
conserved parity, it is now known that this is not true. It has been discovered to 
be false because the /3-decay reaction does not have the inversion symmetry which 
is found in the other laws of physics. 

Now we can prove an interesting theorem (which is true so long as we can 
disregard weak interactions): Any state of definite energy which is not degenerate 
must have a definite parity. It must have either even parity or odd parity. (Re- 
member that we have sometimes seen systems in which several states have the same 
energy — we say that such states are degenerate. Our theorem will not apply to 
them.) 

For a state | ff) of definite energy, we know that 

o> = *l*o>, (17.18) 

where E is just a number — the energy of the state. If we have any operator Q 
which is a symmetry operator of the system we can prove that 

Ql'f'o) = e ,s | v^o) (17.19) 

so long as | ^ 0 ) is a unique state of definite energy. Consider the new state | $>) 
that you get from operating with Q. If the physics is symmetric, then | must 
have the same energy as | ^ 0 )* But we have taken a situation in which there is 
only one state of that energy, namely | ^ 0 )> so | ^6) must be the same state — it 
can only differ by a phase. That’s the physical argument. 

The same thing comes out of our mathematics. Our definition of symmetry 
is Eq; (17.10) or Eq. (17.11) (good for any state f), 

~ QAW- (17.20) 

But we are considering only a state | ^ 0 ) which is a definite energy state, so that 
H | ^ 0 ) = E I ^o)- Since E is just a number that floats through Q if we want, 
we have 

= QE\+ o) = EQ |*o>- 
So 

#{6l*o» = £{<2 I*o». (17.21) 

So | ^o) = Q I to) is also a definite energy state of ft— and with the same E. 
But by our hypothesis, there is only one such state; it must be that | I i^q)- 

What we have just proved is true for any operator Q that is a symmetry opera- 
tor of the physical system. Therefore, in a situation in which we consider only 
electrical forces and strong interactions — and no /3-decay — so that inversion sym- 
metry is an allowed approximation, we have that P | \p) = e %h | ^). But we have 
also seen that e %h must be either + 1 or — 1 . So any state of a definite energy (which 
is not degenerate) has got either an even parity or an odd parity. 
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17-3 The conservation laws 


We turn now to another interesting example of an operation: a rotation. 
We consider the special case of an operator that rotates an atomic system by angle 
<f> around the z-axis. We will call this operatorf R z ((p). We are going to suppose 
that we have a physical situation where we have no influences lined up along the 
x- and j>-axes. Any electric field or magnetic field is taken to be parallel to the 
z-axisj so that there will be no change in the external conditions if we rotate the 
whole physical system about the z-axis. For example, if we have an atom in empty 
space and we turn the atom around the z-axis by an angle 0, we have the same 
physical system. 

Now then, there are special states which have the property that such an opera- 
tion produces a new state which is the original state multiplied by some phase 
factor. Let us make a quick side remark to show you that when this is true the 
phase change must always be proportional to the angle 0. Suppose that you would 
rotate twice by the angle 0. That's the same thing as rotating by the angle 20. If a 
rotation by 0 has the effect of multiplying the state | 0 O ) by a phase e lS so that 

M<t>) I to) = I to), 

two such rotations in succession would multiply the state by the factor (e lS ) 2 = 
e' 2S , since 

&(*)&(*) | to) = M<t>X S I to) = e lS fi,(4>) | to) = e'*e li \ to). 

The phase change 8 must be proportional to 0.H We are considering then those 
special states | 0 O ) for which 


*.(*) I to) = e "* | to), (17.22) 

where m is some real number. 

We also know the remarkable fact that if the system is symmetrical for a rota- 
tion around z and if the original state happens to have the property that (17.22) 
is true, then it will also have the same property later on. So this number m is a 
very important one. If we know its value initially, we know its value at the end of 
the game. It is a number which is conserved — m is a constant of the motion . The 
reason that we pull out m is because it hasn’t anything to do with any special angle 
0, and also because it corresponds to something in classical mechanics. In quantum 
mechanics we choose to call mh — for such states as | 0 O ) — the angular momentum 
about the z-axis. If we do that we find that in the limit of large systems the same 
quantity is equal to the z-component of the angular momentum of classical me- 
chanics. So if we have a state for which a rotation about the z-axis just produces 
a phase factor e then we have a state of definite angular momentum about that 
axis — and the angular momentum is conserved. It is mh now and forever. Of 
course, you can rotate about any axis, and you get the conservation of angular 
momentum fof* the various axes. You see that the conservation of angular 
momentum is related to the fact that when you turn a system you get the same 
state with only a new phase factor. 

We would like to show you how general this idea is. We will apply it to two 
other conservation laws which have exact correspondence in the physical ideas 
to the conservation of angular momentum. In classical physics we also have 
conservation of momentum and conservation of energy, and it is interesting to 
see that both of these are related in the same way to some physical symmetry. 


t Very precisely, we will define R z (4>) as a rotation of the physical system by —0 about 
the z-axis, which is the same as rotating the coordinate frame by +0. 

t We can always choose z along the direction of the field provided there is only one 
field at a time, and its direction doesn’t change. 

II For a fancier proof we should make this argument for small rotations e Since any 
angle 0 is the sum of a suitable n number of these, 0 — ne, R z (<f>) = [/? £ (t)] n and the total 
phase change is n times that for the small angle e, and is, therefore, proportional to 0 
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Suppose that we have a physical system — an atom, some complicated nucleus, 
or a molecule, or something — and it doesn’t make any difference if we take the 
whole system and move it over to a different place. So we have a Hamiltonian 
which has the property that it depends only on the internal coordinates in some 
sense, and does not depend on the absolute position in space. Under those cir- 
cumstances there is a special symmetry operation we can perform which is a 
translation in space. Let’s define 3 x (a) as the operation of a displacement by the 
distance a along the x-axis. Then for any state we can make this operation and 
get a new state. But again there can be very special states which have the property 
that when you displace them by a along the x-axis you get the same state except 
for a phase factor. It’s also possible to prove, just as we did above, that when this 
happens, the phase must be proportional to a. So we can write for these special 
states | ^o) 

4(a) | *o> = e lka I + o>- (17.23) 

The coefficient k , when multiplied by h , is called the x-component of the momentum. 
And the reason it is called that is that this number is numerically equal to the 
classical momentum p x when we have a large system. The general statement is 
this: If the Hamiltonian is unchanged when the system is displaced, and if the 
state starts with a definite momentum in the x-direction, then the momentum in 
the x-direction will remain the same as time goes on. The total momentum of a 
system before and after collisions— or after explosions or what not — will be the 
same. 

There is another operation that is quite analogous to the displacement in 
space: a delay in time. Suppose that we have a physical situation where there is 
nothing external that depends on time, and we start something off at a certain 
moment in a given state and let it roll. Now if we were to start the same thing 
off again (in another experiment) two seconds later — or/say, delayed by a time 
r — and if nothing in the external conditions depends on the absolute time, the 
development would be the same and the final state would be the same as the 
other final state, except that it will get there later by the time r. Under those 
circumstances we can also find special states which have the property that the 
development in time has the special characteristic that the delayed state is just 
the old, multiplied by a phase factor. Once more it is clear that for these special 
states Ihe phase change must be proportional to r. We can write 

4(r) I *„> = | * 0 >. (17-24) 

It is conventional to use the negative sign in defining co; with this convention 
c oh is the energy of the system, and it is conserved . So a system of definite energy is 
one which when displaced r in time reproduces itself multiplied by e~ l03T . (That’s 
what we have said before when we defined a quantum state of definite energy, so 
we’re consistent with ourselves.) It means that if a system is in a state of definite 
energy, and if the Hamiltonian doesn’t depend on t, then no matter what goes on, 
the system will have the same energy at all later times. 

You see, therefore, the relation between the conservation laws and the sym- 
metry of the world. Symmetry with respect to displacements in time implies the 
conservation of energy; symmetry with respect to position in x, y, or z implies 
the conservation of that component of momentum. Symmetry with respect to 
rotations around the x-, y-, and z-axes implies the conservation of the x-, y-, and 
z-components of angular momentum. Symmetry with respect to reflection implies 
the conservation of parity. Symmetry with respect to the interchange of two elec- 
trons implies the conservation of something we don’t have a name for, and so on. 
Some of these principles have classical analogs and others do not. There are more 
conservation laws in quantum mechanics than are useful m classical mechanics — 
or, at least, than are usually made use of. 

In order that you will be able to read other books on quantum mechanics, 
we must make a small technical aside — to describe the notation that people use. 
The operation of a displacement with respect to time is, of course, just the opera- 
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tion U that we talked about before: 

3 t (r) = U (/ + r, / ). (17.25) 

Most people like to discuss everything in terms of infinitesimal displacements in 
time, or in terms of infinitesimal displacements in space, or in terms of rotations 
through infinitesimal angles. Since any finite displacement or angle can be ac- 
cumulated by a succession of infinitesimal displacements or angles, it is often easier 
to analyze first the infinitesimal case. The operator of an infinitesimal displacement 
At in time is — as we have defined it in Chapter 8 — 


/)i(A0 = 1 - £ AtH. (17.26) 

Then H is analogous to the classical quantity we call energy, because if H \$) 
happens to be a constant times | \p) namely, H\$) = E\$) 9 then that constant 
is the energy of the system. 

The same thing is done for the other operations. If we make a small displace- 
ment in x, say by the amount Ax , a state | \p) will, in general , go over into some other 
state | We can write 

| *') = l\(Ax) I *> = (l + {px Ax) | *>, (17.27) 

since as Ax goes to zero, the | \p') should become just | f) or J5 X (0) = 1, and for 
small Ax the change of J$ x (Ax) from 1 should be proportional to Ax. Defined this 
way, the operator p x is called the momentum operator — for the ^-component, of 
course. 

For identical reasons, people usually write for small rotations 

k z {A<t>) | *> = (l + 1 1 A*) | *> (17.28) 

and call J z the operator of the z-component of angular momentum. For those 
special states for which ^(0) | $ 0 ) = I ^o)> we can for any small angle — say 
A(j >— expand the right-hand side to first order in A<j> and get 

kz(A4>) = e im ^ | * 0 > = (1 + imbfi | *„>• 

Comparing this with the definition of J„ in Eq. (17.28), we get that 


1 1 *„> = mh | * 0 >. (17.29) 

In other words, if you operate with J, on a state with a definite angular momentum 
about the z-axis, you get mh times the same state, where mh is the amount of 
z-component of angular momentum. It is quite analogous to operating on a 
definite energy state with H to get E \ \p). 

We would now like to make some applications of the ideas of the conservation 
of angular momentum — to show you how they work. The point is that they are 
really very simple. You knew before that angular momentum is conserved. The 
only thing you really have to remember from this chapter is that if a state | \p 0 ) 
has the property that upon a rotation through an angle <f> about the z-axis, it be- 
comes e im4> | ^o); it has a z-component of angular momentum equal to mh. That’s 
all we will need to do a number of interesting things. 


17-4 Polarized light 

First of all we would like to check on one idea. In Section 1 1 -4 we showed 
that when RHC polarized light is viewed in a frame rotated by the angle <f> about 
the z-axisf it gets multiplied by e %{ Does that mean then that the photons of light 


t Sorry! This angle is the negative of the one we used in Section 11-4. 
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that are right circularly polarized carry an angular momentum of one unitf along 
the z-axis? Indeed it does. It also means that if we have a beam of light containing 
a large number of photons all circularly polarized the same way — as we would 
have in a classical beam — it will carry angular momentum. If the total energy 
carried by the beam in a certain time is W, then there are N = W/fua photons. Each 
one carries the angular momentum h, so there is a total angular momentum of 

J z = Nh = —■ (17.30) 

CO 



Fig. 17-5. (a) The electric field £ 

in a circularly polarized light wave, (b) 
The motion of an electron being driven 
by the circularly polarized light. 


Can we prove classically that light which is right circularly polarized carries 
an energy and angular momentum in proportion to W/co? That should be a classical 
proposition if everything is right. Here we have a case where we can go from the 
quantum thing to the classical thing. We should see if the classical physics checks. 
It will give us an idea whether we have a right to call m the angular momentum. 
Remember what right circularly polarized light is, classically. It’s described by 
an electric field with an oscillating ^-component and an oscillating ^-component 
90° out of phase so that the resultant electric vector £ goes m a circle — as drawn in 
Fig. 17-5 (a). Now suppose that such light shines on a wall which is going to 
absorb it — or at least some of it — and consider an atom in the wall according to 
the classical physics. We have often described the motion of the electron in the 
atom as a harmonic oscillator which can be driven into oscillation by an external 
electric field. We’ll suppose that the atom is isotropic, so that it can oscillate 
equally well in the x- or y-directions. Then in the circularly polarized light, the 
jc-displacement and the ^-displacement are the same, but one is 90° behind the 
other. The net result is that the electron moves in a circle, as shown in Fig. 17— 5(b). 
The electron is displaced at some displacement r from its equilibrium position at the 
origin and goes around with some phase lag with respect to the vector £. The 
relation between £ and r might be as shown in Fig. 17— 5(b). As time goes on, the 
electric field rotates and the displacement rotates with the same frequency, so 
their relative orientation stays the same. Now let’s look at the work being done 
on this electron. The rate that energy is being put into this electron is v , its velocity, 
times the component of qZ parallel to the velocity: 

d J = *6* (17.31) 

But look, there is angular momentum being poured into this electron, because 
there is always a torque about the origin. The torque is qZ t r, which must be 
equal to the rate of change of angular momentum djjdv. 


dJ z 

dt 


= q£ t r. 


(17.32) 


Remembering that v = cor, we have that 


dJz = l 
dW w’ 


Therefore, if we integrate the total angular momentum which is absorbed, it is 
proportional to the total energy — the constant of proportionality being 1/co, 
which agrees with Eq. (17.30). Light does carry angular momentum — 1 unit 
(times h) if it is right circularly polarized along the z-axis, and — 1 unit along the 
z-axis if it is left circularly polarized. 

Now let’s ask the following question: If light is linearly polarized in the 
^-direction, what is its angular momentum? Light polarized in the x-direction 
can be represented as the superposition of RHC and LHC polarized light. There- 
fore, there is a certain amplitude that the angular momentum is +h and another 


t It is usually very convenient to measure angular momentum of atomic systems in 
units of h. Then you can say that a spin one-half particle has angular momentum ±1/2 
with respect to any axis. Or, in general, that the z-component of angular momentum 
is m. You don’t need to repeat the h all the time. 
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amplitude that the angular momentum is ~h 3 so it doesn’t have a definite angular 
momentum. It has an amplitude to appear with +h and an equal amplitude to 
appear with —h. The interference of these two amplitudes produces the linear 
polarization, but it has equal probabilities to appear with plus or minus one unit 
of angular momentum. Macroscopic measurements made on a beam of linearly 
polarized light will show that it carries zero angular momentum, because in a large 
number of photons there are nearly equal numbers of RHC and LHC photons 
contributing opposite amounts of angular momentum — the average angular 
momentum is zero. And in the classical theory you don’t find the angular mo- 
mentum unless there is some circular polarization. 

We have said that any spin-one particle can have three values of J Z3 namely 
+ 1,0, — 1 (the three states we saw in the Stern-Gerlach experiment). But light is 
screwy; it has only two states. It does not have the zero case. This strange lack 
is related to the fact that light cannot stand still. For a particle of spin j which is 
standing still, there must be the 2 j + 1 possible states with values of j z going in 
steps of 1 from — j to +/. But it turns out that for something of spin j with zero 
mass only the states with the components +j and — j along the direction of motion 
exist. For example, light does not have three states, but only two — although a 
photon is still an object of spin one. How is this consistent with our earlier proofs — 
based on what happens under rotations in space — that for spin-one particles three 
states are necessary 9 For a particle at rest, rotations can be made about any 
axis without changing the momentum state. Particles with zero rest mass (like 
photons and neutrinos) cannot be at rest; only rotations about the axis along the 
direction of motion do not change the momentum state. Arguments about rota- 
tions around one axis only are insufficient to prove that three states are required, 
given that onb of them varies as e H under rotations by the angle 

One further side remark. For a zero rest mass particle, in general, only one 
of the two spin states with respect to the line of motion (+/, —j) is really necessary. 
For neutrinos — which are spin one-half particles; — only the states with the com- 
ponent of angular momentum opposite to the direction of motion (— &/2) exist 
in nature [and only along the motion ( +h/2 ) for antineutrinos]. When a system has 
inversion symmetry (so that parity is conserved, as it is for light) both components 
(+/, and —j) are required. 


17-5 The disintegration of the A 0 


Now we want to give an example of how we use the theorem of conservation 
of angular momentum in a specifically quantum physical problem. We look at 
break-up of the lambda particle (A 0 ), which disintegrates into a proton and a 7r“ 
meson by a “weak” interaction: 


A° — » p + 7T . 

Assume we know that the pion has spin zero, that the proton has spin one-half, 
and that the A 0 has spin one-half. We would like to solve the following problem: 
Suppose that a A 0 were to be produced in a way that caused it to be completely 
polarized — by which we mean that its spin is, say “up,” with respect to some suit- 
ably chosen z-axis — see Fig. 17-6(a). The question is, with what probability will it 
disintegrate so that the proton goes off at an angLe 6 with respect to the z-axis — as 
in Fig. 17— 6(b)? In other words, what is the angular distribution of the disintegra- 
tions? We will look at the disintegration in the coordinate system in which the 
A 0 is at rest — we will measure the angles in this rest frame; then they can always 
be transformed to another frame if we want. 


before 


AFTER 



Fig. 17-6. A A 0 with spin ‘‘up” 
decays into a proton and a pion (in the 
CM system). What is the probability that 
the proton will go off at the angle 0? 


t We have tried to find at least a proof that the component of angular momentum 
along the direction of motion must for a zero mass particle be an integral multiple of 
h/2 — and not something like h/ 3. Even using all sorts of properties of the Lorentz 
transformation and what not, we failed. Maybe it’s not true. We’ll have to talk about 
it with Prof. Wigner, who knows all about such things. 
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Fig. 17-7. Two possibilities for the 
decay of a spin “up” A 0 with the proton 
going along the +z-axis. Only (b) 
conserves angular momentum. 
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NO YES 

(a) (b) (c) 

Fig. 17-8. The decay along The 
z-axis for a A 0 with spin “down.” 


We begin by looking at the special circumstance in which the proton is emitted 
into a small solid angle Aft along the z-axis (Fig. 17-7). Before the disintegration 
we have a A 0 with its spm “up,” as in part (a) of the figure. After a short time — for 
reasons unknown to this day, except that they are connected with the weak decays — 
the A 0 explodes into a proton and a pion. Suppose the proton goes up along the 
+z-axis. Then, from the conservation of momentum, the pion must go down. 
Since the proton is a spin one-half particle, its spin must be either “up” or “down” — 
there are, in principle, the two possibilities shown in parts (b) and (c) of the figure. 
The conservation of angular momentum, however, requires that the proton have 
spin “up.” This is most easily seen from the following argument. A particle moving 
along the z-axis cannot contribute any angular momentum about this axis by virtue 
of its motion; therefore, only the spins can contribute to J z . The spin angular 
momentum about the z-axis is +h/2 before the disintegration, so it must also be 
+h/2 afterward. We can say that since the pion has no spin, the proton spin 
must be “up.” 

If you are worried that arguments of this kind may not be valid in quantum 
mechanics, we can take a moment to show you that they are. The initial state 
(before the disintegration), which we can call | A 0 , spin +z) has the property that 
if it is rotated about the z-axis by the angle <j>, the state vector gets multiplied by 
the phase factor e %4>12 . (In the rotated system the state vector is e l<b! 2 | A 0 , spin +z).) 
That’s what we mean by spin “up” for a spin one-half particle. Since nature’s 
behavior doesn’t depend on our choice of axes, the final state (the proton plus 
pion) must have the same property. We could write the final state as, say, 

| proton going +z, spin 4-z; pion going — z). 

But we really do not need to specify the pion motion, since in the frame we have 
chosen the pion always moves opposite the proton; we can simplify our description 
of the final state to 

| proton going +z, spin +z). 

Now what happens to this state vector if we rotate the coordinates about the 
z-axis by the angle 4>? 

Since the proton and pion are moving along the z-axis, their motion isn’t 
changed by the rotation. (That’s why we picked this special case; we couldn’t 
make the argument otherwise.) Also, nothing happens to the pion, because it is 
spm zero. The proton, however, has spin one-half. If its spin is “up” it will con- 
tribute a phase change of e l<t>12 in response to the rotation. (If its spin were 
“down” the phase change due to the proton would be 2 .) But the phase change 
with rotation before and after the excitement must be the same if angular mo- 
mentum is to be conserved. (And it will be, since there are no outside influences in 
the Hamiltonian.) So the only possibility is that the proton spin will be “up.” 
If the proton goes up, its spin must also be “up.” 

We conclude, then, that the conservation of angular momentum permits the 
process shown in part (b) of Fig. 17-7, but does not permit the process shown in 
part (c). Since we know that the disintegration occurs, there is some amplitude 
for process (b) — proton going up with spin “up.” We’ll let a stand for the amplitude 
that the disintegration occurs in this way in any infinitesimal interval of time.f 

Now let’s see what would happen if the A 0 spm were initially “down.” Again 
we ask about the decays in which the proton goes up along the z-axis, as shown in 
Fig. 17-8. You will appreciate that in this case the proton must have spin “down” 
if angular momentum is conserved. Let’s say that the amplitude for such a dis- 
integration is b. 

We can’t say anything more about the two amplitudes a and b. They depend 
on the inner machinery of A 0 , and the weak decays, and nobody yet knows how to 

| We are now assuming that the machinery of the quantum mechanics is sufficiently 
familiar to you that we can speak about things in a physical way without taking the time 
to write down all the mathematical details. In case what we are saying here is not clear 
to you, we have put some of the missing details in a note at the end of the section. 

17-12 



calculate them. We’ll have to get them from experiment. But with just these 
two amplitudes we can find out all we want to know about the angular distribution 
of the disintegration. We only have to be careful always to define completely the 
states we are talking about. 

We want to know the probability that the proton will go off at the angle 6 
with respect to the z-axis (into a small solid angle A ft) as drawn in Fig. 17-6. 
Let’s put a new z-axis in this direction and call it the z'-axis. We know how to 
analyze what happens along this axis. With respect to this new axis, the A° no 
longer has its spin “up,” but has a certain amplitude to have its spin “up” and 
another amplitude to have its spin “down.” We have already worked these out 
in Chapter 6, and again in Chapter 10, Eq. (10.30). The amplitude to be spin 
“up” is cos 0/2, and the amplitude to be spin “down” isf —sin 6/2. When the 
A 0 spin is “up” along the z'-axis it will emit a proton in the +z'-direction with the 
amplitude a. So the amplitude to find an “up”-spinning proton coming out along 
the z'-direction is 

a cos | • (17.33) 

Similarly, the amplitude to find a “down”-spinning proton coming along the posi- 
tive z'-axis is 


—6 sin ^ * (17.34) 

The two processes that these amplitudes refer to are shown in Fig. 17-9. 



Amplitude a cose/2 Amplitude -b cos 6/2 


Fig. 1 7-9. Two possible decay states for the A 0 . 

Let’s now ask the following easy question. If the A 0 has spin up along the 
z-axis, what is the probability that the decay proton will go off at the angle 6 ? 
The two spin states (“up” or “down” along z') are distinguishable even though 
we are not going to look at them. So to get the probability we square the amplitudes 
and add. The probability /(0) of finding a proton in a small solid angle AQ at 6 is 

f(d) = \a\ 2 cos 2 | + |ft| 2 sin 2 | • (17.35) 

Remembering that sin 2 9/2 = £(1 — cos 6) and that cos 2 6/2 = £(1 + cos 6), 
we can write f(B) as 

m - (Isliitf) + (tf_iiW!) c()s9 . (17 . 36) 


t We have chosen to let z' be in the xz-plane and use the matrix elements for R y {B). 
You would get the same answer for any other choice. 
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The angular distribution has the form 

f(8) = (3(\ + acosd). (17.37) 

The probability has one part that is independent of 8 and one part that varies 
linearly with cos 8. From measuring the angular distribution we can get a and /?, 
and therefore, \a\ and \b\. 

Now there are many other questions we can answer. Are we interested only 
in protons with spin “up” along the old z-axis? Each of the terms in (17-33) and 
(17-34) will give an amplitude to find a proton with spin “up” and with spin 
“down” with respect to the z'-axis (+z' and — z'). Spin “up” with respect to the 
old axis | +z) can be expressed in terms of the base states | +z' ) and | — z'). 
We can then combine the two amplitudes (17.33) and (17.34) with the proper 
coefficients (cos 8/2 and —sin 8/2) to get the total amplitude 

( 2 Q , , • 2 e\ 

la cos 2 + 0 sm 27 ’ 

Its square is the probability that the proton comes out at the angle 8 with its spin 
the same as the A 0 (“up” along the z-axis). 

If parity were conserved, we could say one more thing. The disintegration 
of Fig. 17-8 is just the reflection — in say, the y z-plane of the disintegration of 
Fig. 17-7.f If parity were conserved, b would have to be equal to a or to —a. 
Then the coefficient a of (17.37) would be zero, and the disintegration would be 
equally likely to occur in all directions. 

The experimental results show, however, that there is an asymmetry in the 
disintegration. The measured angular distribution does go as cos 8 as we predict — 
and not as cos 2 8 or any other power. In fact, since the angular distribution has 
this form, we can deduce from these measurements that the spin of the A 0 is 1/2. 
Also, we see that parity is not conserved. In fact, the coefficient a is found experi- 
mentally to be —0.62 ± 0.05, so b is about twice as large as a. The lack of sym- 
metry under a reflection is quite clear. 

You see how much we can get from the conservation of angular momentum. 
We will give some more examples in the next chapter. 


Parenthetical note. By the amplitude a in this section we mean the amplitude that the 
state | proton going +z, spin -fz) is generated in an infinitesimal time dt from the state 
| A, spin +z), or, in other words, that 

(proton going +z, spin +z | H | A, spin +z) = iha t (17.38) 

where H is the Hamiltonian of the world — or, at least, of whatever is responsible for the 
A-decay. The conservation of angular momentum means that the Hamiltonian must 
have the property that 

(proton going +z, spin —z\H\A, spin +z) = 0. (17.39) 

By the amplitude b we mean that 

(proton going +z, spin — z | H | A, spin —z) = ihb. (17.40) 

Conservation of angular momentum implies that 

(proton going -\ -z, spin +z | H | A, spin — z) = 0. (17.41) 

If the amplitudes written in (17.33) and (17.34) are not clear, we can express them 
more mathematically as follows. By (17.33) we intend the amplitude that the A with 
spin along will disintegrate into a proton moving along the -fz'-direction with its 
spin also in the +z'-direction, namely the amplitude 

(proton going +z', spin +z' 1 H\ A, spin +z). (17.42) 

By the general theorems of quantum mechanics, this amplitude can be written as 

X) (proton going +z', spin -fz' | H | A, /)( A, i | A, spin +z), (17.43) 

t Remembering that the spin is an axial vector and flips over in the reflection. 
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where the sum is to be taken over the base states | A, /) of the A-particle at rest. Since the 
A-particle is spin one-half, there are two such base states which can be in any reference 
base we wish. If we use for base states spin “up” and spin “down” with respect to z' 
(+z\ — z'), the amplitude of (17.43) is equal to the sum 

(proton going +z', spin + z' | H | A, +z')(A, +z' | A, +z) 

+ (proton going +z', spin +z'\H\ A, -z'){ A, -z' | A, +z). (17.44) 

The first factor of the first term is a, and the first factor of the second term is zero — from 
the definition of (17.38), and from (17.41), which in turn follows from angular momentum 
conservation. The remaining factor (A, +z' | A, +z) of the first term is just the amplitude 
that a spin one-half particle which has spin “up” along one axis will also have spin “up” 
along an axis tilted at the angle 8 , which is cos 8/2 — see Table 6-2. So (17.44) is just 
a cos 0/2, as we wrote in (17.33). The amplitude of (17.34) follows from the same kind 
of arguments for a spin “down” A-particle. 


17-6 Summary of the rotation matrices 

We would like now to bring together in one place the various things we have 
learned about the rotations for particles of spin one-half and spin one — so they will 
be convenient for future reference. On the next page you will find tables of the two 
rotation matrices R z {<f) and R y {8) for spin one-half particles, for spin-one particles, 
and for photons (spin-one particles with zero rest mass). For each spin we will 
give the terms of the matrix (j \ R | i) for rotations about the z-axis or the j^-axis. 
They are, of course, exactly equivalent to the amplitudes like (4 T | 0 S) we have 
used in earlier chapters. We mean by R z (<j>) that the state is projected into a new 
coordinate system which is rotated through the angle <f> about the z-axis — using 
always the right-hand rule to define the positive sense of the rotation. By R v (8) 
we mean that the reference axes are rotated by the angle 8 about the j;-axis. Know- 
ing these two rotations, you can, of course, work out any arbitrary rotation. As 
usual, we write the matrix elements so that the state on the left is a base state of 
the new (rotated) frame and the state on the right is a base state of the old (un- 
rotated) frame. You can interpret the entries in the tables in many ways. For 
instance, the entry e~ l<t>12 in Table 17-1 means that the matrix element (— \R\ — ) = 
e~ t(p!2 . It also means that R | — ) = e~ %4>!2 | — ), or that (— | R = (— J e~ l<t>f2 . 
It’s all the same thing. 
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Table 17-1 

Rotation matrices for spin one-half 

Two states : | +), “up” along the z-axis, m = +1/2 

| — ), “down” along the z-axis, m = —1/2 


RM) 

1 +) 

l-> 

<+l 

^4- 1^/2 

0 

<-l 

0 

e-*' 2 


Ry(9) 

1 +> 1 -) 

<+l 

<-l 

cos 0/2 sin 0/2 

—sin 0/2 cos 0/2 


Table 17-2 

Rotation matrices for spin one 

Three states : | -+*), np = -j-1 
| 0 ), m — 0 

I ~). m = -1 


RM) 

1+) 

|0> 

I-) 

<+l 

e+i* 

0 

0 

<0 | 

0 

1 

0 

<-1 

0 

0 



RM 

1 +) 

|o> 

I”) 

(+1 

i(i + cos e) 


J(1 — COS 0) 

<0 1 

1 • * 
7 = sin 0 

V 2 

cos 0 

, 1 . a 

H 7 = sin 0 

V2 

<-l 

J(1 — cos0) 

--Lsin* 

i(l + COS 0) 


Table 17-3 
Photons 


Two states : | R) *= —7= (| x) + i | >)), m = +1 (RHC polarized) 

v 2 

| L) = (| jc) — i 1 y)), m = — 1 (LHC polarized) 

V2 


RM) 

\R) 

\L) 

<*l 

e +i * 

0 

(L\ 

0 

e~* 
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Angular Momentum 


18-1 Electric dipole radiation 

In the last chapter we developed the idea of the conservation of angular 
momentum in quantum mechanics, and showed how it might be used to predict 
the angular distribution of the proton from the disintegration of the A-particle 
We want now to give you a number of other, similar, illustrations of the con- 
sequences of momentum conservation in atomic systems Our first example is 
the radiation of light from an atom. The conservation of angular momentum 
(among other things) will determine the polarization and angular distribution 
of the emitted photons. 

Suppose we have an atom which is in an excited state of definite angular 
momentum — say with a spin of one — and it makes a transition to a state of angular 
momentum zero at a lower energy, emitting a photon. The problem is to figure 
out the angular distribution and polarization of the photons. (This problem is 
almost exactly the same as the A 0 disintegration, except that we have spin-one 
instead of spin one-half particles.) Since the upper state of the atom is spin one, 
there are three possibilities for its z-component of angular momentum. The value 
of m could be +1, or 0, or — 1. We will take m = +1 for our example. Once 
you see how it goes, you can work out the other cases. We suppose that the atom 
is sitting with its angular momentum along the +z-axis — as in Fig. 18-1 (a) — and 
ask with what amplitude it will emit right circularly polarized light upward along 
the z-axis, so that the atom ends up with zero angular momentum — as shown in 
part (b) of the figure. Well, we don't know the answer to that. But we do know 
that right circularly polarized light has one unit of angular momentum about its 
direction of propagation. So after the photon is emitted, the situation would 
have to be as shown in Fig. 18-1 (b) — the atom is left with zero angular momentum 
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Fig. 18-1. An atom with m = +1 
emits a RHC photon along the +z-axis. 


Fig. 18-2. An atom with m = — 1 
emits a LHC photon along the -Tz-axis. 
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Fig. 18-3. If the process of (a) is 
transformed by an inversion through the 
center of the atom, it appears as in (b). 


I 


about the z-axis, since we have assumed an atom whose lower state is spin zero. 
We will let a stand for the amplitude for such an event. More precisely, we let a 
be the amplitude to emit a photon into a certain small solid angle Afi, centered 
on the z-axis, during a time dt. Notice that the amplitude to emit a LHC photon 
in the same direction is zero. The net angular momentum about the z-axis would 
be — 1 for such a photon and zero for the atom for a total of — 1, which would 
not conserve angular momentum. 

Similarly, if the spin of the atom is initially “down” (— 1 along the z-axis), 
it can emit only a LHC polarized photon in the direction of the + z-axis, as shown 
in Fig. 18-2. We will let b stand for the amplitude for this event — meaning again 
the amplitude that the photon goes into a certain solid angle AQ. On the other 
hand, if the atom is in the m — 0 state, it cannot emit a photon in the +z-direction 
at all, because a photon can have only the angular momentum +1 or — 1 a long 
its direction of motion. 

Next, we can show that b is related to a. Suppose we perform an inversion of 
the situation in Fig. 18-1, which means that we should imagine what the system 
would look like if we were to move each part of the system to an equivalent point 
on the opposite side of the origin. This does not mean that we should reflect the 
angular momentum vectors, because they are artificial. We should, rather, invert 
the actual character of the motion that would correspond to such an angular 
momentum. In Fig. 18-3 (a) and (b) we show what the process of Fig. 18-1 looks 
like before and after an inversion with respect to the center of the atom. Notice 
that the sense of rotation of the atom is unchanged.! In the inverted system of 
Fig. 1 8—3 (b) we have an atom with m — +1 emitting a LHC photon downward. 

If we now rotate the system of Fig. 1 8—3 (b) by 180° about the ;t- or y-axis, it 
becomes identical to Fig. 18-2. The combination of the inversion and rotation 
turns the second process into the first. Using Table 17-2, we see that a rotation 
of 180° about the y-axis just throws an m = —1 state into an m = +1 state, 
so the amplitude b must be equal to the amplitude a except for a possible sign 
change due to the inversion . The sign change in the inversion will depend on the 
parities of the initial and final state of the atom. 

In atomic processes, parity is conserved, so the parity of the whole system 
must be the same before and after the photon emission. What happens will depend 
on whether the parities of the initial and final states of the atom are even or odd — 
the angular distribution of the radiation will be different for different cases. We 
will take the common case of odd parity for the initial state and even parity for the 
final state; it will give what is called “electric dipole radiation.” (If the initial 
and final states have the same parity we say there is “magnetic dipole radiation,” 
which has the character of the radiation from an oscillating current in a loop.) 
If the parity of the initial state is odd, its amplitude reverses its sign in the inversion 
which takes the system from (a) to (b) of Fig. 18-3. The final state of the atom 
has even parity, so its amplitude doesn’t change sign. If the reaction is going to 
conserve parity, the amplitude b must be equal to a in magnitude but of the 
opposite sign. 

We conclude that if the amplitude is a that an m = +1 state will emit a 
photon upward, then for the assumed parities of the initial and final states the 
amplitude that an m = — 1 state will emit a LHC photon upward is — a.% 

We have all we need to know to find the amplitude for a photon to be emitted 
at any angle 9 with respect to the z-axis. Suppose we have an atom originally 
polarized with m = +1. We can resolve this state into +1, 0, and —1 states 
with respect to a new z'-axis in the direction of the photon emission. The ampli- 
tudes for these three states are just the ones given in the lower half of Table 17-2. 


t When we change x, y, z into — x t — y, — z, you might think that all vectors get re- 
versed. That is true for polar vectors like displacements and velocities, but not for an 
axial vector like angular momentum — or any vector which is derived from a cross product 
of two polar vectors. Axial vectors have the same components after an inversion. 

X Some of you may object to the argument we have just made, on the basis that the final 
states we have been considering do not have a definite parity. You will find in Added 
Note 2 at the end of this chapter another demonstration, which you may prefer. 
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The amplitude that a RHC photon is emitted in the direction 8 is then a times the 
amplitude to have m = + 1 in that direction, namely, 

a (+ I Ry(fi) I + ) = j 0 "I" cos ^)* (18.1) 

The amplitude that a LHC photon is emitted in the same direction is — a times the 
amplitude to have m = — 1 in the new direction. Using Table 17-2, it is 

-a(- | R v {8) | +) = -^ 0 - cos d). (18.2) 

If you are interested in other polarizations you can find out the amplitude for them 
from the superposition of these two amplitudes To get the intensity of any 
component as a function of angle, you must, of course, take the absolute square 
of the amplitudes. 


18-2 Light scattering 


Let’s use these results to solve a somewhat more complicated problem — 
but also one which is somewhat more real. We suppose that the same atoms are 
sitting in their ground state (j — 0), and scatter an incoming beam of light. 
Let’s say that the light is going initially in the +z-direction, so that we have photons 
coming up to the atom from the — z-direction, as shown in Fig. 18-4(a). We can 
consider the scattering of light as a two-step process: The photon is absorbed, 
and then is re-emitted. If we start with a RHC photon as in Fig. 18-4(a), and 
angular momentum is conserved, the atom will be in an m = +1 state after the 
absorption — as shown in Fig. 18-4(b). We call the amplitude for this process c. 
The atom can then emit a RHC photon in the direction 8 — as in Fig. 18-4(c). 
The total amplitude that a RHC photon is scattered in the direction 6 is just 
c times (18.1). Let’s call this scattering amplitude (R' \ S \ R); we have 

(R'\S\R) = y (1 + cos 8). (18.3) 

There is also an amplitude that a RHC photon will be absorbed and that 
a LHC photon will be emitted. The product of the two amplitudes is the amplitude 
(U | S | R) that a RHC photon is scattered as a LHC photon. Using (18.2), we have 



Fig. 1 8-4. The scattering of light by 
an atom seen as a two-step process. 


(L'\S\R) = -y (1 - cos 8). 


(18.4) 


Now let’s ask about what happens if a LHC photon comes in. When it is 
absorbed, the atom will go into an m = — 1 state. By the same kind of arguments 
we used in the preceding section, we can show that this amplitude must be — c. 
The amplitude that an atom in the m = — 1 state will emit a RHC photon at the 
angle 8 is a times the amplitude (+ | R y (8) \ — ), which is ^(1 — cos 8 ). So we have 

(R! | 5 | L) = -y (1 - cos 8). (18.5) 

Finally, the amplitude for a LHC photon to be scattered as a LHC photon is 
(L' I S\ L) = y (1 + COS0). (18.6) 

(There are two minus signs which cancel.) 

If we make a measurement of the scattered intensity for any given combina- 
tion of circular polarizations it will be proportional to the square of one of our four 
amplitudes. For instance, with an incoming beam of RHC light the intensity of 
the RHC light in the scattered radiation will vary as (1 + cos 0) 2 . 

That’s all very well, but suppose we start out with linearly polarized light. 
What then? If we have ^-polarized light, it can be represented as a superposition 
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(18.7) 


of RHC and LHC light. We write (see Section 1 1-4) 

ID = "7=(l*> + I D)- 

V2 

Or, if we have ^-polarized light, we would have 

ID = — l - (| R) - | L)). (18.8) 

V2 

Now what do you want to know? Do you want the amplitude that an x-polarized 
photon will scatter into a RHC photon at the angle 61 You can get it by the usual 
rule for combining amplitudes. First, multiply (18.7) by (R f | S to get 

</?' I S\ x) = 4= «*' I s\ R) + (R' | S|L)), (18.9) 

V2 

and then use (18.3) and (18.5) for the two amplitudes. You get 

(R' I S I x) = cos 6. (18.10) 

V2 

If you wanted the amplitude that an x-photon would scatter into a LHC photon, 
you would get 

(U I S\ x) = — cose. (18.11) 

\/2 

Finally, suppose you wanted to know the amplitude that an jc-polarized photon 
will scatter while keeping its ;c-polarization. What you want is (x' | S | x). This 
can be written as 


V\S\x) = ( 

x' 1 R')(R' I 

5|D + <*'| £')<!' |S|jc>- 

(18.12) 

If you then use the relations 

i 

R') = — 

V2 

(1 + i 1 /», 

(18.13) 

i 


(l*'> - 

(18.14) 

it follows that 




<*' 1 R'> 

1 

“ Vi’ 

(18.15) 


<*' 1 U) 

1 

V2‘ 

(18.16) 

So you get that 


= ac cos 6. 

(18.17) 


The answer is that a beam of jc-polarized light will be scattered at the direction 6 
(in the xz-plane) with an intensity proportional to cos 2 6. If you ask about ^-polar- 
ized light, you find that 

<y\S\x) - 0. (18.18) 

So the scattered light is completely polarized in the x-direction. 

Now we notice something interesting. The results (18.17) and (18.18) corre- 
spond exactly to the classical theory of light scattering we gave in Vol. 1, Section 
32-6, where we imagined that the electron was bound to the atom by a linear 
restoring force — so that it acted like a classical oscillator. Perhaps you are think- 
ing: “It’s so much easier in the classical theory; if it gives the right answer why 
bother with the quantum theory?” For one thing, we have considered so far 
only the special — though common — case of an atom with a j = 1 excited state 
and ay = 0 ground state. If the excited state had spin two, you would get a differ- 
ent result. Also, there is no reason why the model of an electron attached to a 
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spring and driven by an oscillating electric field should work for a single photon. 
But we have found that it does in fact work, and that the polarization and intensi- 
ties come out right. So in a certain sense we are bringing the whole course around 
to the real truth. Whereas we have, in Vol. I, done the theory of the index of 
refraction, and of light scattering, by the classical theory, we have now shown that 
the quantum theory gives the same result for the most common case. In effect 
we have now done the polarization of sky light, for instance, by quantum me- 
chanical arguments, which is the only truly legitimate way. 

It should be, of course, that all the classical theories which work are sup- 
ported ultimately by legitimate quantum arguments. Naturally, those things 
which we have spent a great deal of time in explaining to you were selected from 
just those parts of classical physics which still maintain validity in quantum 
mechanics. You’ll notice that we did not discuss in great detail any model of the 
atom which has electrons going around in orbits. That’s because such a model 
doesn’t give results which agree with the quantum mechanics. But the electron 
on a spring — which is not, in a sense, at all the way an atom “looks” — does 
work, and so we used that model for the theory of the index of refraction. 


18-3 The annihilation of positronium 


We would like next to take an example which is very pretty. It is quite inter- 
esting and, although somewhat complicated, we hope not too much so. Our 
example is the system called positronium , which is an “atom” made up of an elec- 
tron and a positron — a bound state of an e + and an e~ It is like a hydrogen 
atom, except that a positron replaces the proton. This object has — like the hydro- 
gen atom — many states. Also like the hydrogen, the ground state is split into a 
“hyperfine structure” by the interaction of the magnetic moments. The spins of 
the electron and positron are each one-half, and they can be either parallel or 
antiparallel to any given axis. (In the ground state there is no other angular 
momentum due to orbital motion.) So there are four states: three are the sub- 
states of a spin-one system, all with the same energy; and one is a state of spin 
zero with a different energy. The energy splitting is, however, much larger than 
the 1420 megacycles of hydrogen because the positron magnetic moment is so 
much stronger — 1000 times stronger — than the proton moment. 

The most important difference, however, is that positronium cannot last 
forever. The position is the antiparticle of the electron; they can annihilate each 
other. The two particles disappear completely — converting their rest energy into 
radiation, which appears as 7-rays (photons). In the disintegration, two particles 
with a finite rest mass go into two or more objects which have zero rest mass.f 

We begin by analyzing the disintegration of the spin-zero state of the posi- 
tronium. It disintegrates into two 7-rays with a lifetime of about 10™ 10 second. 
Initially, we have a positron and an electron close together and with spins anti- 
parallel, making the positronium system. After the disintegration there are two 
photons going out with equal and opposite momenta (Fig. 18-5). The momenta 
must be equal and opposite, because the total momentum after the disintegration 
must be zero, as it was before, if we are taking the case of annihilation at rest. 
If the positronium is not at rest, we can ride with it, solve the problem, and then 
transform everything back to the lab system. (See, we can do anything now; 
we have all the tools.) 

First, we note that the angular distribution is not very interesting. Since 
the initial state has spin zero, it has no special axis — it is symmetric under all 
rotations. The final state must then also be symmetric under all rotations. That 
means that all angles for the disintegration are equally likely — the amplitude is 
the same for a photon to go in any direction. Of course, once we find one of 
the photons in some direction the other must be opposite. 



BEFORE AFTER 

(0) (b) 


Fig. 1 8-5. The two-photon annihila- 
tion of positronium. 


t In the deeper understanding of the world today, we do not have an easy way to 
distinguish whether the energy of a photon is less “matter” than the energy of an electron, 
because as you remember all the particles behave very similarly. The only distinction is 
that the photon has zero rest mass. 
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Fig. 18-6. One possibility for posi- 
tronium annihilation along the z-axis. 


The only remaining question, which we now want to look at, is about the 
polarization of the photons. Let’s call the directions of motion of the two photons 
the plus and minus z-axes. We can use any representations we want for the polar- 
ization states of the photons; we will choose for our description right and left 
circular polarization — always with respect to the directions of motion. Right 
away, we can see that if the photon going upward is RHC, then angular momentum 
will be conserved if the downward going photon is also RHC. Each will carry + 1 
unit of angular momentum with respect to its momentum direction , which means 
plus and minus one unit about the z-axis. The total will be zero, and the angular 
momentum after the disintegration will be the same as before. See Fig. 18-6. 

The same arguments show that if the upward going photon is RHC, the 
downward cannot be LHC. Then the final state would have two units of angular 
momentum. This is not permitted if the initial state has spin zero. Note that 
such a final state is also not possible for the other positronium ground state of 
spin one, because it can have a maximum of one unit of angular momentum in 
any direction. 

Now we want to show that two-photon annihilation is not possible at all 
from the spin-one state. You might think that if we took they = 1,/n = 0 state — 
which has zero angular momentum about the z-axis — it should be like the spin-zero 
state, and could disintegrate into two RHC photons. Certainly, the disintegration 
sketched in Fig. 18-7 (a) conserves angular momentum about the z-axis. But now 
look what happens if we rotate this system around the j>-axis by 180°; we get the 
picture shown in Fig. 1 8— 7(b). It is exactly the same as in part (a) of the figure. 
All we have done is interchange the two photons. Now photons are Bose particles; 
if we interchange them, the amplitude has the same sign, so the amplitude for the 
disintegration in part (b) must be the same as in part (a). But we have assumed 
that the initial object is spin one. And when we rotate a spin-one object in a state 
with m = 0 by 180° about the y~ axis, its amplitudes change sign (see Table 17-2 
for 0 = 7 r). So the amplitudes for (a) and (b) in Fig. 18-7 should have opposite 
signs; the spin-one state cannot disintegrate into two photons. 

When positronium is formed you would expect it to end up in the spin-zero 
state 1/4 of the time and in the spin-one state (with m = — 1, 0, or + 1)3/4 of the 
time. So 1/4 of the time you would get two-photon annihilations. The other 3/4 
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Fig. 1 8-7. For the j = 1 state of positronium, the process (a) and its 1 80° 
rotation about y (b) are exactly the same. 


t Note that we always analyze the angular momentum about the direction of motion of 
the particle. If we were to ask about the angular momentum about any other axis, we 
would have to worry about the possibility of “orbital” angular momentum — from a 
p X r term. For instance, we can’t say that the photons leave exactly from the center 
of the positronium. They could leave like two things shot out from the rim of a spinning 
wheel. We don’t have to worry about such possibilities when we take our axis along the 
direction of motion. 
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of the time there can be no two-photon annihilations. There is still an annihilation, 
but it has to go with three photons. It is harder for it to do that and the lifetime 
is 1000 times longer — about 10~ 7 second. This is what is observed experimentally. 
We will not go into any more of the details of the spin-one annihilation. 

So far we have that if we only worry about angular momentum, the spin-zero 
state of the positronium can go into two RHC photons. There is also another 
possibility: it can go into two LHC photons as shown in Fig. 18-8. The next 
question is, what is the relation between the amplitudes for these two possible 
decay modes? We can find out from the conservation of parity. 

To do that, however, we need to know the parity of the positronium. Now 
theoretical physicists have shown in a way that is not easy to explain that the 
parity of the electron and the positron — its antiparticle — must be opposite, so 
that the spin-zero ground state of positronium must be odd. We will just assume 
that it is odd, and since we will get agreement with experiment, we can take that 
as sufficient proof. 

Let’s see then what happens if we make an inversion of the process in Fig. 
18-6. When we do that, the two photons reverse directions and polarizations. 
The inverted picture looks just like Fig. 18-8. Assuming that the parity of the 
positronium is odd, the amplitudes for the two processes in Figs. 18-6 and 18-8 
must have the opposite sign. Let’s let | R\R 2 ) stand for the final state of Fig. 

18-6 in which both photons are RHC, and let | LiL 2 ) stand for the final state of 

Fig. 18-8, in which both photons are LHC. The true final state — let’s call it | F ) — 
must be 

|F) = |RiR 2 ) - | L x L 2 ). (18.19) 

Then an inversion changes the R’s into Us and gives the state 

F | F) == | L\L 2 ) - | R\R 2 ) = - I F), (18.20) 

which is the negative of (18.19). So the final state | F) has negative parity, which 
is the same as the initial spin-zero state of the positronium. This is the only final 
state that conserves both angular momentum and parity. There is some amplitude 
that the disintegration into this state will occur, which we don’t need to worry 
about now, however, since we are only interested in questions about the polariza- 
tion. 

What does the final state of (18.19) mean physically? One thing it means is 
the following: If we observe the two photons in two detectors which can be set 
to count separately the RHC or LHC photons, we will always see two RHC 
photons together, or two LHC photons together. That is, if you stand on one side 
of the positronium and someone else stands on the opposite side, you can measure 
the polarization and tell the other guy what polarization he will get. You have a 
50-50 chance of catching a RHC photon or a LHC photon ; whichever one you get, 
you can predict that he will get the same. 

Since there is a 50-50 chance for RHC or LHC polarization, it sounds as 
though it might be like linear polarization. Let’s ask what happens if we observe 
the photon in counters that accept only linearly polarized light. For Y-rays it is 
not as easy to measure the polarization as it is for light; there is no polarizer which 
works well for such short wavelengths. But let’s imagine that there is, to make the 
discussion easier. Suppose that you have a counter that only accepts light with 
x-polarization, and that there is a guy on the other side that also looks for linear 
polarized light with, say, .y-polarization. What is the chance you will pick up the 
two photons from an annihilation? What we need to ask is the amplitude that 
| F) will be in the state | x x y 2 ). In other words, we want the amplitude 


LHC 


m = 0 

e+e“ 



Fig. 18-8. Another possible process 
for positronium annihilation. 


which is, of course, just 


{xiyi | F), 

(xi)>2 I F1R2) - (Xiy 2 I LiL 2 ). 


( 18 . 21 ) 


Now although we are working with two-particle amplitudes for the two 
photons, we can handle them just as we did the single particle amplitudes, since 
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each particle acts independently of the other. That means that the amplitude 
(x 1 y 2 | RiRf) is just the product of the two independent amplitudes (x 1 | 7^) 
and {y 2 | 7? 2 )- Using Table 17-3, these two amplitudes are l/\/2 and z'A/2, so 

{^1^2 I R1R2) = + 2 * 

Similarly, we find that 

(xiy 2 I L x L 2 ) = — j • 

Subtracting these two amplitudes according to (18.21), we get that 

I F) = + 1 . (18.22) 

So there is a unit probability! that if you get a photon in your x-polarized detector, 
the other guy will get a photon in his y-polarized detector. 

Now suppose that the other guy sets his counter for ^-polarization the same 
as yours. He would never get a count when you got one. If you work it through, 
you will find that 

(x x x 2 | F) = 0. (18.23) 

It will, naturally, also work out that if you set your counter for ^-polarization he 
will get coincident counts only if he is set for ^-polarization. 

Now this all leads to an interesting situation. Suppose you were to set up 
something like a piece of calcite which separated the photons into x-polarized 
and y-polarized beams, and put a counter in each beam. Let’s call one the x-counter 
and the other the ^-counter. If the guy on the other side does the same thing, 
you can always tell him which beam his photon is going to go into. Whenever 
you and he get simultaneous counts, you can see which of your detectors caught 
the photon and then tell him which of his counters had a photon. Let’s say that 
in a certain disintegration you find that a photon went into your ^-counter; you 
can tell him that he must have had a count in his y-counter. 

Now many people who learn quantum mechanics in the usual (old-fashioned) 
way find this disturbing. They would like to think that once the photons are emitted 
it goes along as a wave with a definite character. They would think that since 
“any given photon” has some “amplitude” to be x-polarized or to bey-polarized, 
there should be some chance of picking it up in either the x- or y-counter and that 
this chance shouldn’t depend on what some other person finds out about a com- 
pletely different photon. They argue that “someone else making a measurement 
shouldn’t be able to change the probability that I will find something.” Our 
quantum mechanics says, however, that by making a measurement on photon 
number one, you can predict precisely what the polarization of photon number 
two is going to be when it is detected. This point was never accepted by Einstein, 
and he worried about it a great deal — it became known as the “Einstein-Podalsky- 
Rosen paradox.” But when the situation is described as we have done it here, 
there doesn’t seem to be any paradox at all; it comes out quite naturally that what 
is measured in one place is correlated with what is measured somewhere else. The 
argument that the result is paradoxical runs something like this: 

(1) If you have a counter which tells you whether your photon is RHC or LHC, 
you can predict exactly what kind of a photon (RHC or LHC) he will find. 

(2) The photons he receives must, therefore, each be purely RHC or purely LHC, 
some of one kind and some of the other. 

(3) Surely you cannot alter the physical nature of his photons by changing the 
kind of observation you make on your photons. No matter what measure- 
ments you make on yours, his must still be either RHC or LHC. 


t We have not normalized our amplitudes, or multiplied them by the amplitude for 
the disintegration into any particular final state, but we can see that this result is correct 
because we get zero probability when we look at the other alternative — see Eq. (18.23). 
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(4) Now suppose he changes his apparatus to split his photons into two linearly 
polarized beams with a piece of calcite so that all of his photons go either 
into an ^--polarized beam or into a >>-polarized beam. There is absolutely no 
way, according to quantum mechanics, to tell into which beam any par- 
ticular RHC photon will go. There is a 50% probability it will go into the 
x-beam and a 50% probability it will go into the y-beam. And the same 
goes for a LHC photon. 

(5) Since each photon is RHC or LHC — according to (2) and (3) — each one 
must have a 50-50 chance of going into the x-beam or the j^-beam and there 
is no way to predict which way it will go. 

(6) Yet the theory predicts that if you see your photon go through an ^-polarizer 
you can predict with certainty that his photon will go into his y-polarized 
beam. This is in contradiction to (5) so there is a paradox. 

Nature apparently doesn’t see the “paradox,” however, because experiment 
shows that the prediction in (6) is, in fact, true. We have already discussed the key 
to this “paradox” in our very first lecture on quantum mechanical behavior in 
Chapter 35, Vol. I. In the argument above, steps (1), (2), (4), and (6) are all 
correct, but (3), and its consequence (5), are wrong; they are not a true description 
of nature. Argument (3) says that by your measurement (seeing a RHC or a LHC 
photon) you can determine which of two alternative events occurs for him (seeing 
a RHC or a LHC photon), and that even if you do not make your measurement 
you can still say that his event will occur either by one alternative or the other. 
But it was precisely the point of Chapter 35, Vol. I, to point out right at the begin- 
ning that this is not so in Nature. Her way requires a description in terms of inter- 
fering amplitudes, one amplitude for each alternative. A measurement of which 
alternative actually occurs destroys the interference, but if a measurement is 
not made you cannot still say that “one alternative or the other is still occurring.” 

If you could determine for each one of your photons whether it was RHC and 
LHC, and also whether it was x-polarized (all for the same photon) there would 
indeed be a paradox. But you cannot do that — it is an example of the uncertainty 
principle. 

Do you still think there is a “paradox”? Make sure that it is, in fact, a paradox 
about the behavior of Nature, by setting up an imaginary experiment for which 
the theory of quantum mechanics would predict inconsistent results via two 
different arguments. Otherwise the “paradox” is only a conflict between reality 
and your feeling of what reality “ought to be.” 

Do you think that it is not a “paradox,” but that it is still very peculiar? 
On that we can all agree. It is what makes physics fascinating. 

18-4 Rotation matrix for any spin 

By now you can see, we hope, how important the idea of the angular mo- 
mentum is in understanding atomic processes. So far, we have considered only 
systems with spins — or “total angular momentum” — of zero, one-half, or one. 
There are, of course, atomic systems with higher angular momenta. For analyzing 
such systems we would need to have tables of rotation amplitudes like those in 
Section 17-6. That is, we would need the matrix of amplitudes for spin §, 2, 
f, 3, etc. Although we will not work out these tables in detail, we would like 
to show you how it is done, so that you can do it if you ever need to. 

As we have seen earlier, any system which has the spin or “total angular mo- 
mentum” j can exist in any one of (2 j + 1) states for which the z-component of 
angular momentum can have any one of the discrete values in the sequence y, 
j — l,y — 2, . . . , —(j — 1), — j (all in units of h). Calling the z-component of 
angular momentum of any particular state mh , we can define a particular 
angular momentum state by giving the numerical values of the two “angular 
momentum quantum numbers” j and m. We can indicate such a state by the state 
vector | j,m). In the case of a spin one-half particle, the two states are then 
| \) and | i, — or for a spin-one system, the states would be written in this 

notation as | 1, 4-1), | 1, 0), | 1, — 1). A spin-zero particle has, of course, only the 
one state | 0, 0). 
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Now we want to know what happens when we project the general state | /, m) 
into a representation with respect to a rotated set of axes. First, we know that j 
is a number which characterizes the system, so it doesn’t change. If we rotate the 
axes, all we do is get a mixture of the various m-values for the same j. In general, 
there will be some amplitude that in the rotated frame the system will be in the 
state | j, m f ), where m' gives the new z-component of angular momentum. So what 
we want are all the matrix elements (j, m r \ R \j, m) for various rotations. We 
already know what happens if we rotate by an angle 4> about the z-axis. The new 
state is just the old one multiplied by e im * — it still has the same m-value. We can 
write this by 

R z (<t>)\j,m) = e™+\j,m). (18.24) 

Or, if you prefer, 

(h m' | *,(*) | j, m) = (18.25) 

(where 5 m , m ' is 1 if m f = m , or zero otherwise). 

For a rotation about any other axis there will be a mixing of the various 
m-states. We could, of course, try to work out the matrix elements for an arbitrary 
rotation described by the Euler angles /3, a , and 7. But it is easier to remember 
that the most general such rotation can be made up of the three rotations R z ( 7), 
j R y (a), R z ( jS); so if we know the matrix elements for a rotation about the >>-axis, 
we will have all we need. 

How can we find the rotation matrix for a rotation by the angle d about the 
jy-axis for a particle of spin jl We can’t tell you how to do it in a basic way (with 
what we have had). We did it for spin one-half by a complicated symmetry argu- 
ment. We then did it for spin one by taking the special case of a spin-one system 
which was made up of two spin one-half particles. If you will go along with us and 
accept the fact that in the general case the answers depend only on the spin j, and 
are independent of how the inner guts of the object of spin j are put together, we 
can extend the spin-one argument to an arbitrary spin. We can, for example, 
cook up an artificial system of spin f out of three spin one-half objects. We can 
even avoid complications by imagining that they are all distinct particles — like a 
proton, an electron, and a muon. By transforming each spin one-half object, we 
can see what happens to the whole system — remembering that the three amplitudes 
are multiplied for the combined state. Let’s see how it goes in this case. 

Suppose we take the three spin one-half objects all with spins “up”; we can 
indicate this state by | + + +). If we look at this system in a frame rotated about 
the z-axis by the angle <£, each plus stays a plus, but gets multiplied by e Hj2 . 
We have three such factors, so 

RM) I + + +> = «* tW2 ' I + + +)• (18.26) 

Evidently the state | + + +) is just what we mean by the m = +§ state, or 
the state | ■§, +f>. 

If we now rotate this system about the >>-axis, each of the spin one-half objects 
will have some amplitude to be plus or to be minus, so the system will now be a 

mixture of the eight possible combinations | + + +), | + H >, \ -\ b), 

I 1 — h), M X I “ + I b), or | ). It is clear, however, 

that these can be broken up into four sets, each set corresponding to a particular 
value of m. First, we have | + H — b), for which m — f . Then there are the 

three states | — | ), | H b), and | b +) — each with two plusses and 

one minus. Since each spin one-half object has the same chance of coming out 
minus under the rotation, the amounts of each of these three combinations should 
be equal. So let’s take the combination 

4={l + + -> + ! + -+> + !- + +)} 08-27) 

V3 

with the factor l/\/3 put in to normalize the state. If we rotate this state about 
the z-axis, we get a factor e l<t>12 for each plus, and e~ Hl2 for each minus. Each 
term in (18.27) is multiplied by e^ /2 , so there is the common factor e % * 12 . This 
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one 


pieces. For instance, 


c< »» 


| + + -> = a 2 c I +' +' +') + a 2 d I +' +' -') + abc | +' +') 

+ bac | +' +') + abd | +' -') + bad | +' -'} 

+ b 2 c | +') + b 2 d | -')• (18.33) 

Adding two similar expressions for | H 1-} and | — |- +) and dividing by 

x/3, we find 

If, +1,5} = V3« 2 c|f,+f,r) 

+ (a 2 d + labc) | f,+i f r> 

+(ibad + b 2 c) | f ,-ir> 

+ V3b 2 d\l-IT). (18.34) 

Continuing the process we find all the elements (fT \ iS) of the transformation ma- 
trix as given in Table 18-2. The first column comes from Eq. (18.32); the second 
from (18.34). The last two columns were worked out in the same way. 


Table 18-2 

Rotation matrix for a spin § particle 

(The coefficients a y 6, c, and d are given in Table 12-4.) 


(jT\iS) 

1 §,+f,S> 

1 f ,+bs) 

1 l-bs) 

f,— f,5) 

(f,+f,T| 

fl 3 

\/3 a 2 c 

\/3 ac 2 

c 3 

(f 1 

\/3 a 2 b 

a 2 d + 2abc 

c 2 b - f- 2 dac 

v/3 c 2 d 

(h~h T \ 

V3 ab 2 

2 bad + b 2 c 

2 cdb + d 2 a 

\/3 cd 2 

(f,— I 

b 3 

V3b 2 d 

x/3 bd 2 

d 3 


Now suppose the T-frame were rotated with respect to S by the angle 0 about 
their y-axes. Then a , b , c, and d have the values [see (12.54)] a = d = cos 0/2, 
and c = —b — sin 0/2. Using these values in Table 18-2 we get the forms 
which correspond to the second part of Table 17-2, but now for a spin f system. 

The arguments we have just gone through are readily generalized to a system 
of any spin /. The states | j,m) can be put together from 2 j particles, each of 
spin one-half. (There are j + m of them in the | +> state and j — m in the | — ) 
state.) Sums are taken over all the possible ways this can be done, and the state 
is normalized by multiplying by a suitable constant. Those of you who are mathe- 
matically inclined may be able to show that the following result comes outf : 


(j, m> | R V (B) | j,m) = [(j + m)\(j - m)l(j + m')\U - mj .] 112 

(-l)*(cos e/2f 3 + m '- m - 2 \sin e/2) m - m '+ 2k 
4^ (m — m ! + k)\(j + m' — k)\(j — m — k)\k\ ’ 


(18.35) 


where k is to go over all values which give terms > 0 in all the factorials. 

This is quite a messy formula, but with it you can check Table 17-2 for j = 1 
and prepare tables of your own for larger j. Several special matrix elements are of 
extra importance and have been given special names. For example the matrix 
elements for m = m' = 0 and integral j are known as the Legendre polynomials 
and are called P 3 (cos 0): 


(j\ 0 | R v (6) | j, 0) = P ,( cos 0). (18.36) 

t If you want details, they, are given in an appendix to this chapter. 
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The first few of these polynomials are: 


P 0 (cos 0) — 1, 


(18.37) 

P 1 (cos 0) = cos 0, 


(18.38) 

P 2 (cos 0) - 4(3 cos 2 0 — 

i), 

(18.39) 

P z (cos 0) = 4(5 cos 3 0 — 

3 cos 0). 

(18.40) 


18-5 Measuring a nuclear spin 

We would like to show you one example of the application of the coefficients 
we have just described. It has to do with a recent, interesting experiment which 
you will now be able to understand. Some physicists wanted to find out the spin 
of a certain excited state of the Ne 20 nucleus. To do this, they bombarded a 
carbon target with a beam of accelerated carbon ions, and produced the desired 
excited state of Ne 20 — called Ne 20 * — in the reaction 


C 1 2 + C 12 -> Ne 20 * + a u 


where a x is the a-particle, or He 4 . Several of the excited states of Ne 20 produced 
this way are unstable and disintegrate in the reaction 

Ne 20 * O 16 + a 2 - 

So experimentally there are two a-particles which come out of the reaction. We 
call them a x and a 2 ; since they come off with different energies, they can be 
distinguished from each other. Also, by picking a particular energy for a x we 
can pick out any particular excited state of the Ne 20 . 

The experiment was set up as shown in Fig. 18-9. A beam of 16-Mev carbon 
ions was directed onto a thin foil of carbon. The first a-particle was counted in a 
silicon diffused junction detector marked a x — set to accept a-particles of the 
proper energy moving in the forward direction (with respect to the incident C 12 
beam). The second a-particle was picked up in the counter a 2 at the angle 0 
with respect to a x . The counting rate of coincidence signals from a x and a 2 were 
measured as a function of the angle 0. 

The idea of the experiment is the following. First, you need to know that the 
spins of C 1 2 , 0 1 6 , and the a-particle are all zero. If we call the direction of motion 
of the initial C 12 the +z-direction, then we know that the Ne 20 * must have zero 
angular momentum about the z-axis. None of the other particles has any spin; 
the C 12 arrives along the z-axis and the a x leaves along the z-axis so they can’t 
have any angular momentum about it. So whatever the spin j of the Ne 20 * is, 
we know that it is in the state | j, 0). Now what will happen when the Ne 20 * 
disintegrates into an O 16 and the second a-particle? Well, the a-particle is picked 
up in the counter a 2 and to conserve momentum the 0 16 must go off in the op- 
posite direction.! About the new axis through a 2 , there can be no component of 
angular momentum. The final state has zero angular momentum about the new 
axis, so the Ne 20 * can disintegrate this way only if it has some amplitude to have 
m f equal to zero, where m* is the quantum number of the component of angular 
momentum about the new axis. In fact, the probability of observing a 2 at the angle 
0 is just the square of the amplitude (or matrix element) 


SILICON JUNCTION 
DETECTORS 


% 




C 1 2 BEAM t 

16 MeV 

/ 

CARBON FOIL 
30/ig /cm 2 

Fig. 1 8-9. Experimental arrange- 
ment for determining the spin of certain 
states of Ne 20 . 



<j 9 0\R y (e)\j 9 0). (18.41) 

To find the spin of the Ne 20 * state in question, the intensity of the second 
a-particle was plotted as a function of angle and compared with the theoretical 


t We can neglect the recoil given to the Ne 20 * in the first collision. Or better still, 
we can calculate what it is and make a correction for it. 
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Fig. 18-10. Experimental results for 
the angular distribution of the a-particles 
from two excited states of Ne 20 pro- 
duced in the setup of Fig. 1 8-9. [From 
J. A. Kuehner, Physical Review , Vol. 1 25, 
p. 1 653, 1962.] 


curves for various values of J. As we said in the last section, the amplitudes 
(j, 0 | R v (6) | y, 0) are just the functions P } (cos 6). So the possible angular distribu- 
tions are curves of \P 3 (cos 0)] 2 . The experimental results are shown in Fig. 18-10 
for two of the excited states. You can see that the angular distribution for the 
5.80-Mev state fits very well the curve for [Pi (cos 6 )] 2 , and so it must be a spin-one 
state. The data for the 5.63-Mev state, on the other hand, are quite different; 
they fit the curve [P 3 (cos 0)] 2 . The state has a spin of 3. 

From this experiment we have been able to find out the angular momentum of 
two of the excited states of Ne 20 *. This information can then be used for trying 
to understand what the configuration of protons and neutrons is inside this 
nucleus — one more piece of information about the mysterious nuclear forces. 

18-6 Composition of angular momentum 

When we studied the hyperfine structure of the hydrogen atom in Chapter 12 
we had to work out the internal states of a system composed of two particles — 
the electron and the proton — each with a spin of one-half. We found that the four 
possible spin states of such a system could be put together intjo two groups — a 
group with one energy that looked to the external world like a spin-one particle, 
and one remaining state that behaved like a particle of zero spin. That is, putting 
together two spin one-half particles we can form a system whose “total spin” 
is one, or zero. In this section we want to discuss in more general terms the spin 
states of a system which is made up of two particles of arbitrary spin. It is another 
important problem about angular momentum in quantum mechanical systems. 

Let’s first rewrite the results of Chapter 12 for the hydrogen atom in a form 
that will be easier to extend to the more general case. We began with two particles 
which we will now call particle a (the electron) and particle b (the proton). Particle 
a had the spin j a ( = i), and its z-component of angular momentum m a could 
have one of several values (actually 2, namely m a = +i or m a = — Similarly, 
the spin state of particle b is described by its spin jb and its z-component of angular 
momentum m b . Various combinations of the spin states of the two particles 
could be formed. For instance, we could have particle a with m a — J and particle 
b with m b = — J, to make a state | a, +£; b , — £). In general, the combined 
states formed a system whose “system spin,” or “total spin,” or “total angular 
momentum” / could be 1, or 0. And the system could have a z-component of 
angular momentum M, which was -f 1, 0, or — 1 when / = 1, or 0 when / = 0. 
In this new language we can rewrite the formulas in (12.41) and (12.42) as shown 
in Table 18-3. 

In the table the left-hand column describes the compound state in terms of 
its total angular momentum J and the z-component Af. The right-hand column 
shows how these states are made up in terms of the m-values of the two particles 
a and b. 

We want now to generalize this result to states made up of two objects a and 
b of arbitrary spins j a and jb- We start by considering an example for which j a — £ 


Table 18-3 

Composition of angular momenta for two 
spin i particles (/« = = i) 
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and = 1, namely, the deuterium atom in which particle a is an electron (e) and 
particle b is the nucleus — a deuteron (d). We have then that j a = j e = The 
deuteron is formed of one proton and one neutron in a state whose total spin is 
one, so jb = jd = 1. We want to discuss the hyperfine states of deuterium — just 
the way we did for hydrogen. Since the deuteron has three possible states m b = 
rrta = +1, 0, — 1 , and the electron has two, m a = m e = -f J, there are 
six possible states as follows (using the notation | e, m e ; d, m d )): 

I e 5 +^; d,+l), 

1 d,0); | e,— J; d,+ l), 

|e,+±;d,-l>;|e,-i;d,0>, 1 ‘ J 

|e,-i;d-l). 


You will notice that we have grouped the states according to the values of the sum 
of m e and ma — arranged in descending order. 

Now we ask: What happens to these states if we project into a different 
coordinate system? If the new system is just rotated about the z-axis by the angle 
4>, then the state | e, w e ; d, m d > gets multiplied by 




(18.43) 


(The state may be thought of as the product | e, m e ) | d, /w d ), and each state vector 
contributes independently its own exponential factor.) The factor (18.43) is of the 
form e lMtt> , so the state | e, m Q ; d, /w d ) has a z-component of angular momentum 
equal to 

M = m e + m d . (18.44) 


The z-component of the total angular momentum is the sum of the z-components of 
angular momentum of the parts . 

In the list of (18.42), therefore, the state in the top line has M — +§, the 
two in the second line have M = +J, the next two have M = — and the 
last state has M = —\. We see immediately one possibility for the spin J of the 
combined state (the total angular momentum) must be f, and this will require 
four states with M = +f, +£, — and — f. 

There is only one candidate for M = f , so we know already that 

| 7 = f , Af = +f) = |e,+i;d,+l). (18.45) 

But what is the state \J= f , M — ^)? We have two candidates in the second line 
of (18.42), and, in fact, any linear combination of them would also have M = 

So, in general, we must expect to find that 

\J = IM = +*) - a|e,+i;d,0> + /3|e,-£;d,+ l>, (18.46) 


where a and 0 are two numbers. They are called the Clebsch-Gordon coefficients . 
Our next problem is to find out what they are. 

We can find out easily if we just remember that the deuteron is made up of a 
neutron and a proton, and write the deuteron states out more explicitly using the 
rules of Table 18-3. If we do that, the states listed in (18.42) then look as shown in 
Table 18-4. 

We want to form the four states of J — f , using the states in the table. 
But we already know the answer, because in Table 18-1 we have states of spin 
f formed from three spin one-half particles. The first state in Table 18-1 has 
\ J = f , M = +f) and it is | + + +), which — in our present notation — is the 
same as | e,+^; n,+^, p,+£), or the first state in Table 18-4. But this state is 
also the same as the first in the list of (18.42), confirming our statement in (1 8.45). 
The second line of Table 18-1 says — changing to our present notation — that 

I J = \ ; M = +£> = -2= {| e,+i; n,+£; p.-^) 

\/i 

+ | p.+i) + | e,-*; n,+£; p,+J)}. (18.47) 
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Table 18-4 

Angular momentum states for a deuterium atom 



The right side can evidently be put together from the two entries in the second line 
of Table 18-4 by taking V2/3 of the first term with \/l/3 of the second. That is, 
Eq. (18-47) is equivalent to 

| / = f, M = i) = V2/3 | e,+i; d,0) + VT/3 \ e,-*; d,l). (18.48) 

We have found our two Clebsch-Gordon coefficients a and ft in Eq. (18.46): 

a = V2/3, 0 = VX/3. (18.49) 

Following the same procedure we can find that 

I J = §, M = -i) = y/T/3 I e,+4; d - 1> + y/2 / 3 | e,-£; d,0). (18.50) 

And, also, of course, 

| J - §, M = -§) = |e,-^;d-l>. (18.51) 

These are the rules for the composition of spin 1 and spin \ to make a total / = §. 
We summarize (18.45), (18.48), and (18.50) in Table 18-5. 

We have, however, only four states here while the system we are considering 
has six possible states. Of the two states in the second line of (18.42) we have used 
only one linear combination to form | J = f , M = +£). There is another linear 
combination orthogonal to the one we have taken which also has M = +£, 
namely 

VT/3 I e,+*; d,0> - y/ljl | e d,+ l). (18.52) 

Table 18-5 

The J — f states of the deuterium atom 

| J = f. M = +§> = | e,+i; d,+l) 

| J = £ M = +£> = V2?3 | e,+i; d.0) + Vltt | e,-*; d,l) 

| J = I, M = -i) - VT73 I e,+i; d,-l> + y/ljl \ e d,0> 

| J = §> = |e,-£;d,-l> 
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Similarly, the two states in the third line of (18.42) can be combined to give two 
orthogonal states, each with M = —J. The one orthogonal to (18.52) is 

V273 | e,+i; d - 1) - VW I e,-4; d,0>. (18.53) 

These are the two remaining states. They have M = m e + m t i = =»=£; and 
must be the two states corresponding to / = So we have 

I /= i.M= 4) = a/T 73 I e,+4; d,0> - V2?3 | e,-fc d,+ l), 

(18.54) 

| J = hM= -*> = V273|e,+i;d -1) - Vi/3 | c i; d,0). 

We can verify that these two states do indeed behave like the states of a spin 
one-half object by writing out the deuterium parts in terms of the neutron and 
proton states — using Table 18-3. The first state in (18.53) is 

Vl/6{| e,+J; n,+£; p,—^) + | n,— p,+i)} 

— \/2/3 | e ,— \\ n,+|; p,+^), (18.55) 

which can also be written 

VT/3[\/l/2 {| n,+J; p,— -j) — | e,—^; n,+^; p,+^)} 

+ VV2 {| e,+i;n,-^;p,+|) - | e,-£: n,+£; p, +£>}]. 

(18.56) 

Now look at the terms in the first curly brackets, and think of the e and p taken 
together. Together they form a spin-zero state (see the bottom line of Table 18-3), 
and contribute no angular momentum. Only the neutron is left, so the whole of 
th e first curly bracket of (18.56) behaves under rotations like a neutron, namely 
as a state with J = M = +£. Following the same reasoning, we see that 
in the second curly bracket of (18.56) the electron and neutron team up to produce 
zero angular momentum, and only the proton contribution — with m p = \ — is 
left. The terms behave like an object with / = M = +£. So the whole ex- 
pression of (18.56) transforms like \J = +£, M = +^} as it should. The 
M = — \ state which corresponds to (18.57) can be written down (by changing 
the proper + ^’s to — J’s) to get 

VT/3[VT72 {| e,+£; n,-i; p,-£> - | e,-£; n,-±; p,+i)} 

+ \/l /2 {I e,+ 5 -; n,— \\ p,— 5 ) — | e,— ^;n,+ 5 -;p,— £)}]• 

(18.57) 

You can easily check that this is equal to the second line of (18.54), as it should be 
if the two terms of that pair are to be the two states of a spin one-half system. So our 
results are confirmed. A deuteron and an electron can exist in six spin states, four 
of which act like the states of a spin § object (Table 18-5) and two of which act 
like an object of spin one-half (18.54). 

The results of Table 18-5 and of Eq. (18.54) were obtained by making use of 
the fact that the deuteron is made up of a neutron and a proton. The truth of the 
equations does not depend on that special circumstance. For any spin-one object 
put together with any spin one-half object the composition laws (and the coeffi- 
cients) are the same. The set of equations in Table 18-5 means that if the co- 
ordinates are rotated about, say, the j^-axis — so that the states of the spin one-half 
particle and of the spin-one particle change according to Table 18-1/ and Table 
18-2— the linear combinations on the right-hand side will change in the proper 
way for a spin § object. Under the same rotation the states of (18.54) will 
change as the states of a spin one-half object. The results depend only on the 
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Table 18-6 


Composition of a spin one-half particle ( j a = 
and a spin-one particle (jb = l). 



rotation properties (that is, the spin states) of the two original particles but not 
in any way on the origins of their angular momenta. We have only made use of 
this fact to work out the formulas by choosing a special case in which one of the 
component parts is itself made up of two spin one-half particles in a symmetric 
state. We have put all our results together in Table 18-6, changing the notation 
“e” and “d” to “a” and “A” to emphasize the generality of the conclusions. 

Suppose we have the general problem of finding the states which can be 
formed when two objects of arbitrary spins are combined. Say one has j a (so its 
z-component m a runs over the 2j a + 1 values from — j a to +j a ) and the other has 
j b (with z-component m b running over the values from —j b to +j b ). The combined 
states are | a, m a \ b, m a ), and there are (2j a + 1)(2 j b + 1) different ones. Now 
what states of total spin J can be found? 

The total z-component of angular momentum M is equal to m a + m b , and 
the states can all be listed according to M [as in (18.42)]. The largest M is unique; 
it corresponds to m a = j a and m b = j b , and is, therefore, just j a + jb- That 
means that the largest total spin J is also equal to the sum j a + j b : 

J — (Af) max ~ ja + jb- 

For the first M value smaller than (Af) max , there are two states (either m a or m b 
is one unit less than its maximum). They must contribute one state to the set that 
goes with / = j a + jb , and the one left over will belong to a new set with J = 
ja + j b — 1* The next Af-value — the third from the top of the list — can be formed 
in three ways. (From m a = j a — 2, m b = j b \ from m a = j a — 1, m b = j b — 1; 
and from m a = y a , m b = j b — 2.) Two of these belong to groups already started 
above; the third tells us that states of / — j a + j b — 2 must also be included. 
This argument continues until we reach a stage where in our list we can no longer 
go one more step down in one of the m’s to make new states. 

Let j b be the smaller of j a and j b (if they are equal take either one); then only 
2 j b values of J are required — going in integer steps from j a + jb down to j a — j b . 
That is, when two objects of spin j a and j b are combined, the system can have a 
total angular momentum J equal to any one of the values 


J = 


ja + jb 
ja + jb ~ 1 
ja + jb ~ 2 


[I ja ~ jbl 


(18.58) 


(By writing | j a — j b \ instead of j a — j b we can avoid the extra admonition that 

ja > jb') 

For each of these J values there are the 2J + 1 states of different Af-values — 
with M going from +J to — Each of these is formed from linear combinations 
of the original states \ a, m a l b , m b ) with appropriate factors — the Clebsch-Gordon 
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coefficients for each particular term. We can consider that these coefficients give 
the “amount” of the state | j a , m a ;j b , m b ) which appears in the state | J, M). So 
each of the Clebsch-Gordon coefficients has, if you wish, six indices identifying 
its position in the formulas like those of Tables 18-3 and 18-6. That is, calling 
these coefficients C(J } M;j a , m a ;j b , m b ), we could express the equality of the 
second line of Table 18-6 by writing 

C(l+bb + 2 ; 1,0) = V2?3, 

2> ~b i>+i) = Vi/3. 

We will not calculate here the coefficients for any other special cases.f You 
can, however, find tables in many books. You might wish to try another special 
case for yourself. The next one to do would be the composition of two spin-one 
particles. We give just the final result in Table 18-7. 

These laws of the composition of angular momenta are very important in 
particle physics — where they have innumerable applications. Unfortunately, we 
have no time to look at more examples here. 

Table 18-7 



Added Note 1: Derivation of the rotation matrix! 

For those who would like to see the details, we work out here the general 
rotation matrix for a system with spin (total angular momentum) j. It is really not 
very important to work out the general case; once you have the idea, you can find 
the general results in tables in many books. On the other hand, after coming 
this far you might like to see that you can indeed understand even the very com- 
plicated formulas of quantum mechanics, such as Eq. (18,35), that come into the 
description of angular momentum. 


t A large part of the work is done now that we have the general rotation matrix Eq. 
(18.35). 

X The material of this appendix was originally included in the body of the lecture. 
We now feel that it is unnecessary to include such a detailed treatment of the general case. 
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We extend the arguments of Section 18-4 to a system with spin y, which we 
consider to be made up of 2 j spin one-half objects. The state with m = j would 
be | + + + •*• +) (with j plus signs). For m = j — 1, there will be 2 / terms 

like | + + b H ), | + + • • • H H), and so on. Let’s consider the 

general case in which there are r plusses and s minuses — with r + s = 2 j. Under 
a rotation about the z-axis each of the r plusses will contribute e +%4>j2 . The result 
is a phase change of i(r/2 — s/2)$. You see that 

m = (18.59) 

Just as for / = f , each state of definite m must be the linear combination with 
plus signs of all the states with the same r and s — that is, states corresponding to 
every possible arrangement which has r plusses and s minuses. We assume that 
you can figure out that there are (r + s)!/r!s! such arrangements. To normalize 
each state, we should divide the sum by the square root of this number. We can 
write 


(r + 5)! 

r!5! 



r 


+ 




with 


+ (all rearrangements of order)} 


J = 


r + 5 


m 



j,m) (18.60) 
(18.61) 


It will help our work if we now go to still another notation. Once we have 
defined the states by Eq. (18.60), the two numbers r and s define a state just as 
well as j and m. It will help us keep track of things if we write 

I j, m) = I ;>, (18.62) 


where, using the equalities of (18.67) 


r = j + m 9 s = j - m. 


Next, we would like to write Eq. (18.60) with a new special notation as 


I j> m) 


: ;> = 


(r + s)i 

rlsl 


1 + 1/2 


{|+H -) S }pe rm . 


(18.63) 


Note that we have changed the exponent of the factor in front to plus We do 
that because there are just N — (r + $)!//•!$! terms inside the curly brackets. 
Comparing (18.63) with (18.60) it is clear that 


{| +) r I ) S } perm 

is just a shorthand way of writing 

{! + + ■** ) + all rearrangements} 

_ 


where N is the number of different terms in the bracket. The reason that this 
notation is convenient is that each time we make a rotation, all of the plus signs 
contribute the same factor, so we get this factor to the rth power. Similarly, all 
together the 5 minus terms contribute a factor to the 5th power no matter what the 
sequence of the terms is. 

Now suppose we rotate our system by the angle 6 about the j>-axis. What we 
want is R y (0) \ r s ). When R y (B) operates on each | + ) it gives 

V)|+>= | + >C+|->S, (18.64) 

where C = cos 6/2 and S = sin 6/2. When R y (6) operates on each | — > it gives 


*,(*)!“> = l -> c - \+) s . 
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So what we want is 


*,(«) i :> 


■^T^r 1 ] 12 {(R ^ e) 1 +» r < / w*) I -» ] porm 

{| + )C + \ — )5) r (| — )C — | +)*S S }perm. 


(18.65) 


Now each binomial has to be expanded out to its appropriate power and the two 
expressions multiplied together. There will be terms with | + ) to all powers from 
zero to (r 4- s). Let’s look at all of the terms which have | +) to the r' power. 
They will appear always multiplied with | — ) to the s' power, where s' = 2j — r'. 
Suppose we collect all such terms. For each permutation they will have some 
numerical coefficient involving the factors of the binomial expansion as well as 
the factors Cand S. Suppose we call that factor A r >. Then Eq. (18.65) will look like 

*„(*) I a r > = £ { A r . | +> r ' | -) 8 '} porm . ( 18 . 66 ) 

r'— 0 


Now let’s say that we divide A r > by the factor [(/•' + s')!/r'!s'!] 1/2 and call the 
quotient B r >. Equation (18.66) is then equivalent to 


Ry (6) | 


r-j-s 

I) = E B r 


r'=0 


^ + ^ 1 1/2 
r'ls'l J 11 


+n-n P 


(18.67) 


(We could just say that this equation defines B r > by the requirement that (18.67) 
gives the same expression that appears in (18.65).) 

With this definition of B r > the remaining factors on the right-hand side of 
Eq. (18.67) are just the states | £). So we have that 

Ry(0) I = 2 B r , I s r :>, (18.68) 

r'=0 


with s' always equal to r + s — r'. This means, of course, that the coefficients 
B r are just the matrix elements we want, namely 

(l’\Ry(9)\ r s } = B r , (18.69) 

Now we just have to push through the algebra to find the various B r >. Com- 
paring (18.39) with (18.37) — and remembering that r* + s' = r + s — we see 
that B r > is just the coefficient of a r 'b s ' in the following expression: 

\ 7 i~) (ac + bSY(bC - asy. (18.70) 

It is now only a dirty job to make the expansions by the binomial theorem, and 
collect the terms with the given power of a and b. If you work it all out, you find 
that the coefficient of ci 'b s ' in (18.70) is 

1/2 (— 1 \kvr-r'+2kfis+r'-2k r\ ■?! 

[ r!s! \ } *(/* — /*' + k)\(r f — k)\ ' (s - k)\k\ * 

H8.71) 

The sum is to be taken over all integers k which give terms of zero or greater in the 
factorials. This expression is then the matrix element we wanted. 

Finally, we can return to our original notation in terms of j, m } and m' using 


r = j + m, r' = j + m', s = j - m, s' - j - m'. 


Making these substitutions, we get Eq. (18.34) in Section 18-4. 
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Added Note 2: Conservation of parity in photon emission 

In Section 1 of this chapter we considered the emission of light by an atom 
that goes from an excited state of spin 1 to a ground state of spin 0. If the excited 
state has its spin up (m = +1), it can emit a RHC photon along the 4- 2 -axis or 
a LHC photon along the — z-axis. Let’s call these two states of the photon | R uv ) 
and | L fin ). Neither of these states has a definite parity. Letting P be the parity 
operator, P ) Pup) — j Ldn) and P ] Ldn) = | P n p ). 

What about our earlier proof that an atom in a state of definite energy must 
have a definite parity, and our statement that parity is conserved in atomic proc- 
esses? Shouldn’t the final state in this problem (the state after the emission of a 
photon) have a definite parity? It does if we consider the complete final state 
which contains amplitudes for the emission photons into all sorts of angles. In 
Section 1 we chose to consider only a part of the complete final state. 

If we wish we can look only at final states that do have a definite parity. For 
example, consider a final state | \f/p) which has some amplitude a to be a RHC 
photon going along +z and some amplitude 0 to be a LHC photon going along 
— z. We can write 


| $f) = « i *up> + P I Ldn). (18.72) 

The parity operation on this state gives 

p I = a I Ldn) + 0 I R ap ). (18.73) 

This state will be =*= | \Pf) if j3 = a or if /3 = — a. So a final state of even parity is 

|^> = «{/?up) + | Id.)}, (18.74) 

and a state of odd parity is 

\* 7 ) = ~ |L d n)}. (18.75) 

Next, we wish to consider the decay of an excited state of odd parity to a 
ground state of even parity. If parity is to be conserved, the final state of the 
photon must have odd parity. It must be the state in (18.75). If the amplitude to 
find | R u p ) is a, the amplitude to find | Ldn) is —a. 

Now notice what happens when we perform a rotation of 180° about the 
y-axis. The initial excited state of the atom becomes an m = — 1 state (with no 
change in sign, according to Table 17-2). And the rotation of the final state gives 

*,(180°) [ *F> = a{\ *dn) - ] L up )} . (18.76) 

Comparing this equation with (18.75), you see that for the assumed parity of the 
final state, the amplitude to get a LHC photon along +z from the m = — 1 
initial state is the negative of the amplitude to get a RHC photon from the m = + 1 
initial state. This agrees with the result we found in Section 1. 
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19 


The Hydrogen Atom and 
The Periodic Table 


19-1 Schrodinger’s equation for the hydrogen atom 

The most dramatic success in the history of the quantum mechanics was the 
understanding of the details of the spectra of some simple atoms and the under- 
standing of the periodicities which are found in the table of chemical elements. 
In this chapter we will at last bring our quantum mechanics to the point of this 
important achievement, specifically to an understanding of the spectrum of the 
hydrogen atom. We will at the same time arrive at a qualitative explanation of the 
mysterious properties of the chemical elements. We will do this by studying in 
detail the behavior of the electron in a hydrogen atom — for the first time making 
a detailed calculation of a distribution-in-space according to the ideas we developed 
in Chapter 16. 

For a complete description of the hydrogen atom we should describe the mo- 
tions of both the proton and the electron. It is possible to do this in quantum 
mechanics in a way that is analogous to the classical idea of describing the motion 
of each particle relative to the center of gravity, but we will not do so. We will 
just discuss an approximation in which we consider the proton to be very heavy, 
so we can think of it as fixed at the center of the atom. 

We will make another approximation by forgetting that the electron has a 
spin and should be described by relativistic laws of mechanics. Some small cor- 
rections to our treatment will be required since we will be using the nonrelativistic 
Schrodinger equation and will disregard magnetic effects. Small magnetic effects 
occur because from the electron’s point-of-view the proton is a circulating charge 
which produces a magnetic field. In this field the electron will have a different 
energy with its spin up than with it down. The energy of the atom will be shifted 
a little bit from what we will calculate. We will ignore this small energy shift. 
Also we will imagine that the electron is just like a gyroscope moving around in 
space always keeping the same direction of spin. Since we will be considering a 
free atom in space the total angular momentum will be conserved. In our approxi- 
mation we will assume that the angular momentum of the electron spin stays con- 
stant, so all the rest of the angular momentum of the atom — what is usually called 
“orbital” angular momentum — will also be conserved. To an excellent approxi- 
mation the electron moves in the hydrogen atom like a particle without spin — the 
angular momentum of the motion is a constant. 

With these approximations the amplitude to find the electron at different 
places in space can be represented by a function of position in space and time. 
We let y//(x, y, z, t) be the amplitude to find the electron somewhere at the time t. 
According to the quantum mechanics the rate of change of this amplitude with 
time is given by the Hamiltonian operator working on the same function. From 
Chapter 16, 
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Here, m is the electron mass, and V(r) is the potential energy of the electron in the 
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electrostatic field of the proton. Taking V = 0 at large distances from the proton 
we can writef 


V 


e 

r 


The wave function ^ must then satisfy the equation 


ih 


dif/ 

Tt 



(19.3) 



Fig. 19-1. The spherical polar co- 
ordinates r, 0, 0 of the point P. 


We want to look for definite energy states, so we try to find solutions which 
have the form 

«Kr, 0 = r (, '* )E VW- (19.4) 


The function \p{r) must then be a solution of 

h 2 


2m 






(19.5) 


where E is some constant — the energy of the atom. 

Since the potential energy term depends only on the radius, it turns out to 
be much more convenient to solve this equation in polar coordinates rather than 
rectangular coordinates. The Laplacian is defined in rectangular coordinates by 

V 3 = -^ + il + il. 

dx 2 dy 2 dz 2 

We want to use instead the coordinates r, 0, <j> shown in Fig. 19-1. These 
coordinates are related to x, y , z by 

x — /* sin 0 cos <£; y = r sin 6 sin </>; z = r cos 0. 

It's a rather tedious mess to work through the algebra, but you can eventually 
show that for any function /(r) = /(r, 0, <£), 


V 2 /(r, U) = ~ (rf) + ^ l e (sin 6 |Q 


+ 


1 dj 


sin 2 0 d<t> 2 


(19.6) 


So in terms of the polar coordinates, the equation which is to be satisfied by 
yp(r> 0, <f > ) is 


1 d / . d\f\ 1 d 2 yp\ 2m ( „ e 2 \ . 

r dr* ^ + ?■ 2 (sin d 66 \ Sm 9 6d) + sin^ 6 6 <f> 2 } ~ h* \ r ) 


(19.7) 


19-2 Spherically symmetric solutions 

Let's first try to find some very simple function that satisfies the horrible 
equation in (19.7). Although the wave function ^ will, in general, depend on the 
angles 0 and <f> as well as on the radius /*, we can see whether there might be a special 
situation in which $ does not depend on the angles. For a wave function that 
doesn’t depend on the angles, none of the amplitudes will change in any way if 
you rotate the coordinate system. That means that all of the components of the 
angular momentum are zero. Such a ^ must correspond to a state whose total 
angular momentum is zero. (Actually, it is only the orbital angular momentum 
which is zero because we still have the spin of the electron, but we are ignoring 
that part.) A state with zero orbital angular momentum is called by a special name. 
It is called an “s-state” — you can remember “s for spherically symmetric.”! 


t As usual, e 2 = q 2 / 4 tt eo. 

X Since these special names are part of the common vocabulary of atomic physics, you 
will just have to learn them. We will help out by putting them together in a short “dic- 
tionary” later in the chapter. 
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Now if ^ is not going to depend on 6 and <f> then the entire Laplacian contains 
only the first term and Eq. (19.7) becomes much simpler: 


I i 2 _ 

r dr 2 


W) = - 



(19.8) 


Before you start to work on solving an equation like this, it’s a good idea to get 
rid of all excess constants like e 2 , m, and h 3 by making some scale changes. Then 
the algebra will be easier. If we make the following substitutions: 


r = 




and 


me 


T? me 

E= 2F € ’ 


then Eq. (19.8) becomes (after multiplying through by p) 


d\p*') 

dp 2 



(19.9) 

(19.10) 


(19.11) 


These scale changes mean that we are measuring the distance r and energy E as 
multiples of “natural” atomic units. That is, p = r/r Bi where r B = h 2 /me 2 , 
is called the “Bohr radius” and is about 0.528 angstroms. Similarly, e = E/E B , 
with Ex = me 4 /2h 2 . This energy is called the “Rydberg” and is about 13.6 
electron volts. 

Since the product p\p appears on both sides, it is convenient to work with it 
rather than with \f/ itself. Letting 




(19.12) 


we have the more simple-looking equation 


fl 

dp 2 



(19.13) 


Now we have to find some function / which satisfies Eq. (19.13) — in other 
words, we just have to solve a differential equation. Unfortunately, there is no 
very useful, general method for solving any given differential equation. You just 
have to fiddle around. Our equation is not easy, but people have found that it 
can be solved by the following procedure. First, you replace /, which is some 
function of p, by a product of two functions 


f(p) = e~ ap g(p). 


(19.14) 


This just means that you are factoring e~ ap out of /(p). You can certainly do that 
for any /(p) at all. This just shifts our problem to finding the right function g(p). 
Sticking (19.14) into (19.13), we get the following equation for g: 

0- 2 “! + G +<+ “ 2 )*“°- (i9i5) 

Since we are free to choose a, let's make 

a 2 = -e, (19.16) 

and get 

$- 2 “i < i9 - i7 > 

You may think we are no better off than we were at Eq. (19.13), but the happy 
thing about our new equation is that it can be solved easily in terms of a power 
series in p (It is possible, in principle, to solve (19.13) that way too, but it is 
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much harder.) We are saying that Eq. (19.17) can be satisfied by some g(p) which 
can be written as a series, 

8(p) = E (* 9 - 18 ) 

1 


in which the au are constant coefficients. Now all we have to do is find a suitable 
infinite set of coefficients! Let’s check to see that such a solution will work. The 
first derivative of this g(p) is 


and the second derivative is 


dg 

dp 


= E a k k P \ 

k = 1 


*8 

dp 2 


= E ° kk (k ~ Op 4 2 - 

k= 1 


Using these expressions in (19.17) we have 


E k(k - 1 )a k p k - 2 - E 2«W :_1 + E W _1 = 0. (19.19) 

A = 1 $=1 *=1 

It’s not obvious that we have succeeded; but we forge onward. It will all look 
better if we replace the first sum by an equivalent. Since the first term of the sum 
is zero, we can replace each k by k + 1 without changing anything in the infinite 
series; with this change the first sum can equally well be written as 


E (* + 1 )*<**+ ip* 1 - 


Now we can put all the sums together to get 

OO 

E [(* + l)ka k+1 - 2aka k + 2 a^ 1 = 0. (19.20) 

This power series must vanish for all possible values of p. It can do that only 
if the coefficient of each power of p is separately zero. We will have a solution 
for the hydrogen atom if we can find a set a * for which 


(k + 1 )ka k +! - 2 (ak - 1 )a k - 0 (19.21) 


for all A: > 1. That is certainly easy to arrange. Pick any a x you like. Then 
generate all of the other coefficients from 




2 (ak ~ 1) 
k{k T" 1 ) 


a k . 


(19.22) 


With this you will get a 2 , a 3 , a 4 , and so on, and each pair will certainly satisfy 
(19.21). We get a series for g(p) which satisfies (19.17). With it we can make a 
^ that satisfies Schrodinger’s equation. Notice that the solutions depend on the 
assumed energy (through a), but for each value of e, there is a corresponding series. 

We have a solution, but what does it represent physically? We can get an 
idea by seeing what happens far from the proton — for large values of p. Out there, 
the high-order terms of the series are the most important, so we should look at 
what happens for large k. When k » 1, Eq. (19.22) is approximately the same as 

2 a 

i = a k , 

which means that 

a k+1 ^^~f- ( 19 . 23 ) 


But these are just the coefficients of the series for e +2ap . The function of g is a 
rapidly increasing exponential. Even coupled with e~ ap to produce /(p) — see 
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Eq. (19.14) — it still gives a solution for /(p) which goes like e ap for large p. We 
have found a mathematical solution but not a physical one. It represents a situa- 
tion in which the electron is least likely to be near the proton! It is always more 
likely to be found at a very large radius p. A wave function for a bound electron 
must go to zero for large p. 

We have to think whether there is some way to beat the game, and there is. 
Observe! If it just happened by luck that a were equal to 1//2, where n is any 
integer, then Eq. (19.22) would make a n + 1 = 0. All higher terms would also be 
zero. We wouldn’t have an infinite series but a finite polynomial. Any polynomial 
increases more slowly than e ap , so the term e~ ap will eventually beat it down, and 
the function / will go to zero for large p. The only bound-state solutions are those 
for which a = 1 /«, with n = 1, 2, 3, 4, and so on. 

Looking back to Eq. (19.16), we see that the bound-state solutions to the 
spherically symmetric wave equation can exist only when 

,111 1 

C — I — ) — 5 5 • • < 5 5 • • • 

’4 9 16 ri> 

The allowed energies are just these fractions times the Rydberg, Er = me 4 /2h 2 , 
or the energy of the nth energy level is 

£„=-£*!• ( 19 . 24 ) 


There is, incidentally, nothing mysterious about negative numbers for the energy. 
The energies are negative because when we chose to write V = — e 2 /r y we picked 
our zero point as the energy of an electron located far from the proton. When it 
is close to the proton, its energy is less, so somewhat below zero. The energy is 
lowest (most negative) for n — 1, and increases toward zero with increasing n. 

Before the discovery of quantum mechanics, it was known from experimental 
studies of the spectrum of hydrogen that the energy levels could be described by 
Eq. (19.24), where Er was found from the observations to be about 13.6 electron 
volts. Bohr then devised a model which gave the same equation and predicted 
that E r should be me 4 /2h 2 . But it was the first great success of the Schrodinger 
theory that it could reproduce this result from a basic equation of motion for the 
electron. 

Now that we have solved our first atom, let’s look at the nature of the solution 
we got. Pulling all the pieces together, each solution looks like this: 


where 


and 



—pin 

-r- gn(p), 
H 


Snip) = a k p k 


k = 1 


&k + 1 


2 (k/n - 1) 
k(k + 1) 




(19.25) 

(19.26) 

(19.27) 


So long as we are mainly interested in the relative probabilities of finding the 
electron at various places we can pick any number we wish for ai. We may as well 
set a 1 — 1. (People often choose ai so that the wave function is “normalized,” 
that is, so that the integrated probability of finding the electron anywhere in the 
atom is equal to 1 . We have no need to do that just now.) 

For the lowest energy state, n = 1, and 


*iG>) - (19.28) 

For a hydrogen atom in its ground (lowest-energy) state, the amplitude to find the 
electron at any point drops off exponentially with the distance from the proton. 
It is most likely to be found right at the proton, and the characteristic spreading 
distance is about one unit in p, or about one Bohr radius, 
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Putting n = 2 gives the next higher level. The wave function for this state 
will have two terms. It is 


+2(P) = (l - ()e~ pl2 . 

The wave function for the next level is 

^(p) = (l - y + TfP a ) e ~ m 


(19.29) 


(19.30) 


The wave functions for these first three levels are plotted in Fig. 19-2. You can 
see the general trend All of the wave functions approach zero rapidly for large 
p after oscillating a few times. In fact, the number of “bumps” is just equal to 
n — or, if you prefer, the number of zero-crossings of is n — 1. 



19-3 States with an angular dependence 

In the states described by the ^ n (r) we have found that the probability ampli- 
tude for finding the electron is spherically symmetric — depending only on r, the 
distance for the proton. Such states have zero orbital angular momentum. We 
should now inquire about states which may have some angular dependences. 

We could, if we wished, just investigate the strictly mathematical problem of 
finding the functions of r, 6 , and 4> which satisfy the differential equation (19.7) — 
putting in the additional physical conditions that the only acceptable functions 
are ones which go to zero for large r. You will find this done in many books. 
We are going to take a short cut by using the knowledge we already have about 
how amplitudes depend on angles in space. 

The hydrogen atom in any particular state is a particle with a certain “spin” 
j — the quantum number of the total angular momentum. Part of this spin comes 
from the electron’s intrinsic spin, and part from the electron’s motion. Since 
each of these two components acts independently (to an excellent approximation) 
we will again ignore the spin part and think only about the “orbital” angular 
momentum. This orbital motion behaves, however, just like a spin. For example, 
if the orbital quantum number is /, the z-component of angular momentum can 
be /, / — 1, / — 2, . . . , — (We are, as usual, measuring in units of h.) Also, 
all the rotation matrices and other properties we have worked out still apply 
(From now on we will really ignore the electron’s spin; when we speak of “angular 
momentum” we will mean only the orbital part.) 

Since the potential V in which the electron moves depends only on r and not 
on 0 or </>, the Hamiltonian is symmetric under all rotations. It follows that the 
angular momentum and all its components are conserved. (This is true for motion 
in any “central field” — one which depends only on r — so is not a special feature of 
the Coulomb e 2 /r potential ) 
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Now let’s think of some possible state of the electron; its internal angular 
structure will be characterized by the quantum number /. Depending on the 
“orientation” of the total angular momentum with respect to the z-axis, the 
z-component of angular momentum will be m, which is one of the 21 + 1 possi- 
bilities between +/ and — /. Let’s say m = 1. With what amplitude will the elec- 
tron be found on the z-axis at some distance r? Zero. An electron on the z-axis 
cannot have any orbital angular momentum around that axis. Alright, suppose 
m is zero, then there can be some nonzero amplitude to find the electron at each 
distance from the proton. We’ll call this amplitude Fj(r). It is the amplitude to 
find the electron at the distance r up along the z-axis, when the atom is in the 
state | /, 0), by which we mean orbital spin / and z-component m = 0. 

If we know Fi(r ) everything is known. For any state | /, m), we know the 
amplitude to find the electron anyw/iere in the atom. How? Watch. Suppose 

we have the atom in the state | /, m), what is the amplitude to find the electron at 
the angle 8 , <f> and the distance r from the origin? Put a new z-axis, say z', at that 
angle (see Fig. 19-3), and ask what is the amplitude that the electron will be at 
the distance r along the new axis z'? We know that it cannot be found along z' 
unless its z'-component of angular momentum, say m f , is zero. When m f is zero, 
however, the amplitude to find the electron along z' is Fi(r). Therefore, the result 
is the product of two factors. The first is the amplitude that an atom in the state 
| /, m) along the z-axis will be in the state | /, m f = 0) with respect to the z'- axis. 
Multiply that amplitude by Fi(r) and you have the amplitude ^, m (r) to find the 
electron at (r, 8 , </>) with respect to the original axes. 

Let’s write it out. We have worked out earlier the transformation matrices 
for rotations. To go from the frame x, y , z to the frame x' } /, z' of Fig. 19-3, 
we can rotate first around the z-axis by the angle <£, and then rotate about the new 
y-axis (/) by the angle 8. This combined rotation is the product 

Ry(8)R z (4>). 

The amplitude to find the state /, m' = 0 after the rotation is 

(h 0 | Ry(<t>)R z (<i>) | /, m). (19.31) 

Our result, then, is 

*i.»(r) = (/, 0 | Ry(8)Rz(<f>) I /, m)Fi(r). (19.32) 



Fig. 1 9-3. The point (r, 6 , <p) is on 
the z'-axi$ of the x'yV coordinate frame. 


The orbital motion can have only integral values of /. (If the electron can be 
found anywhere at r ^ 0, there is some amplitude to have m — 0 in that direction. 
And m = 0 states exist only for integral spins.) The rotation matrices for / = 1 
are given in Table 17-2. For larger / you can use the general formulas we worked 
out in Chapter 18. The matrices for R z (<t>) and R y (8) appear separately, but you 
know how to combine them. For the general case you would start with the state 
| /, m) and operate with R z (4>) to get the new state R z (<f>) \ /, m). Then you operate 
on this state with R y (8) to get the state R y (8)R z (<t>) \ /, m) (which is just e im4> \ /, m)). 
Multiplying by (/, 0 | gives the matrix element (19.31). 

The matrix elements of the rotation operation are algebraic functions of 8 
and 4>. The particular functions which appear in (19.31) also show up in many 
kinds of problems which involve waves in spherical geometries and so has been 
given a special name. Not everyone uses the same convention; but one of the most 
common ones is 

</, 0 | R y (8)R z (<t>) \l 9 m) = a Y hm (8> 0). (19.33) 


The functions T^ m (0, <£) are called the spherical harmonics, and a is just a numerical 
factor which depends on the definition chosen for For the usual definition, 

0 - VsTTl ■ <19 M) 

With this notation, the hydrogen wave functions can be written 


( 19 . 35 ) 
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il.mir) = Yi t m(0, <j>)Fi(r). 



The angle functions T/ m (0, ^) are important not only in many quantum- 
mechanical problems, but also in many areas of classical physics in which the V 2 
operator appears, such as electromagnetism. As another example of their use in 
quantum mechanics, consider the disintegration of an excited state of Ne 20 
(such as we discussed in the last chapter) which decays by emitting an a-particle 
and going into O 10 : 


Ne 2u * -> 0 lh + He 4 . 



state of Ne 20 . 


Suppose that the excited state has some spin / (necessarily an integer) and that the 
2 -component of angular momentum is m. We might now ask the following- 
given / and m, what is the amplitude that we will find the a-particle going off in a 
direction which makes the angle 6 with respect to the z-axis and the angle <j> with 
respect to the xz-plane — as shown in Fig. 19-4. 

To solve this problem we make, first, the following observation. A decay in 
which the a-particle goes straight up along z must come from a state with m = 0. 
This is so because both O 10 and the a-particle have spin zero, and because their 
motion cannot have any angular momentum about the z-axis Let's call this 
amplitude a (per unit solid angle). Then, to find the amplitude for a decay at the 
arbitrary angle of Fig. 19-4, all we need to know is what amplitude the given initial 
state has zero angular momentum about the decay direction. The amplitude for 
the decay at d and 4> is then a times the amplitude that a state | /, m) with respect 
to the z-axis will be in the state | /, 0) with respect to z* — the decay direction. This 
latter amplitude is just what we have written in (19.31). The probability to see the 
a-particle at 0, <j> is 

P(d,<j>) = a 2 | </, 0 | Ry(d)R z (<l>) | /, m)\ 2 . 

As an example, consider an initial state with / = 1 and various values of m. 
From Table 17-2 we know the necessary amplitudes. They are 


< 1 , 0 | | 1 ,+ 1 > 
(1,0 I R V (6)U4>) I 1,0) 

<1,0 I RyWRzte) I 1, -1) 


— sin 0<? , 

V2 

cos e , 

— sin de~ t<p 

V2 


(19.36) 


These are the three possible angular distribution amplitudes — depending on the 
m-value of the initial nucleus. 

Amplitudes such as the ones in (19.36) appear so often and are sufficiently 
important that they are given several names. If the angular distribution amplitude 
is proportional to any one of the three functions or any linear combination of them, 
we say, “The system has an orbital angular momentum of one.” Or we may say, 
“The Ne 20 * emits a p - wave a-particle.” Or we say, “The a-particle is emitted in 
an / = 1 state.” Because there are so many ways of saying the same thing it is 
useful to have a dictionary. If you are going to understand what other physicists 
are talking about, you will just have to memorize the language. In Table 19-1 
we give a dictionary of orbital angular momentum. 

If the orbital angular momentum is zero, then there is no change when you 
rotate the coordinate system and there is no variation with angle — the “dependence” 
on angle is as a constant, say 1. This is also called an “s-state”, and there is only 
one such state — as far as angular dependence is concerned. If the orbital angular 
momentum is 1, then the amplitude of the angular variation may be any one of the 
three functions given — depending on the value of m — or it may be a linear combina- 
tion. These are called ‘/-states,” and there are three of them. If the orbital angular 
momentum is 2 then there are the five functions shown. Any linear combination 
is called an ‘7 = 2,” or a “c/-wave” amplitude. Now you can immediately guess 
what the next letter is— what should come after 5 , /?, dl Well, of course,/, g , /?, 
and so on down the alphabet! The letters don’t mean anything. (They did once 
mean something— they meant “sharp” lines, “principal” lines, “diffuse” lines and 
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Table 19-1 


Dictionary of orbital angular momentum 

(/ = j = an integer) 


Orbital 

angular 

momentum, 

/ 


0 


2 - 

component, 


0 

+ 1 
0 

-1 

+ 2 

+ 1 
0 

-1 


Angular dependence 
of amplitudes 


i- sin 0 e 1 

V2 

cos 0 
sin 0 e~‘ 

V2 


sin 2 0 e 2l<i> 
sin 0 cos 0 e i<p 


\/6 
4 

a/6 
2 

4(3 cos 2 0-1) 

— sin 0 cos 0 e~ l<b 
2 


V6 


i 2 0 


I, 0 I /?#)/?.(</>) I I, rn 

= P v l l {co%e)e' m4 ’ 


Name 


Number of 
states 


21 + 1 


Orbital 

parity 


+ 


+ 


(-i y 


“fundamental” lines of the optical spectra of atoms. But those were in the days 
when people did not know where the lines came from. After / there were no 
special names, so we now just continue with g , h, and so on.) 

The angular functions in the table go by several names — and are sometimes 
defined with slightly different conventions about the numerical factors that appear 
out in front. Sometimes they are called “spherical harmonics,” and written as 
Yi >m (0, <t > ). Sometimes they are written Pf 1 (cos 0)e m</) , and if m = 0, simply as 
Pi(cos 0). The functions Pi ( cos 0) are called the “Legendre polynomials" in 
cos 0, and the functions P™(cos 0) are called the “associated Legendre functions." 
You will find tables of these functions in many books. 

Notice, incidentally, that all the functions for a given / have the property that 
that they have the same parity — for odd / they change sign under an inversion and 
for even / they don’t change. So we can write that the parity of a state of orbital 
angular momentum / is (— 1)*. 

As we have seen, these angular distributions may refer to a nuclear disintegra- 
tion or some other process, or to the distribution of the amplitude to find an elec- 
tron at some place in the hydrogen atom. For instance, if an electron is in a /7-state 
(/ = 1) the amplitude to find it can depend on the angle in many possible ways — 
but all are linear combinations of the three functions for / = 1 in Table 19-1. 
Let’s take the case cos 0. That’s interesting. That means that the amplitude is 
positive, say, in the upper part (0 < w/2), is negative in the lower part (0 > x/2), 
and is zero when 0 is 90°. Squaring this amplitude we see that the probability of 
finding the electron varies with 0 as shown in Fig. 19-5 — and is independent of <t> 
This angular distribution is responsible for the fact that in molecular binding the 
attraction of an electron in an / = 1 state for another atom depends on direction — 
it is the origin of the directed valences of chemical attraction. 



Fig. 19-5. A polar graph of cos 2 0, 
which is the relative probability of finding 
an electron at various angles from the 
z-axis (for a given r) in an atomic state 
with / = 1 and m — 0. 
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19-4 The general solution for hydrogen 

In Eq. (19.35) we have written the wave functions for the hydrogen atom as 


h,m(r) = Y hm {6, <t>)Fi(r). 


(19.37) 


These wave functions must be solutions of the differential equation (19.7). Let’s 
see what that means. Put (19.37) into (19.7); you get 



H* W + 


F, 

r 2 sin 0 



Fi d 2 Y t , m 
r 2 sin 2 B d<l> 2 


Now multiply through by r 2 /Fi and rearrange terms. The result is 

1 d 2 Y t , m 


(19.38) 


1 

sin 8 dB \ S 6 dd / 


+ 


sin 2 8 d<t> 2 


1 cP 


Fi [r dr 2 ^ rFl) + h 2 


( £+ t)I. 


(19.39) 


The left-hand side of this equation depends on 0 and but not on r. No matter 
what value we choose for r , the left side doesn’t change. This must also be true 
for the right-hand side. Although the quantity in the square brackets has r’s all 
over the place, the whole quantity cannot depend on r, otherwise we wouldn’t 
have an equation good for all r. As you can see, the bracket also does not depend 
on 0 or <f>. It must be some constant. Its value may well depend on the /-value of 
the state we are studying, since the function F% must be the one appropriate to that 
state; we’ll call the constant A/. Equation (19.35) is therefore equivalent to two 
equations: 


1 d_ 
sin 6 dd 




+ 


1 5 2 

sin 2 8 dd 2 




(19.40) 


Now look at what we’ve done. For any state described by / and m, we know 
the functions Yi tfa ; we can use Eq. (19.40) to determine the constant Ki . Putting 
Ki into Eq. (19.41) we have a differential equation for the function F t (r). If we 
can solve that equation for Fj(r), we have all of the pieces to put into (19.37) to 
give \p(r). 

What is Kil First, notice that it must be the same for all m (which go with a 
particular /), so we can pick any m we want for Yi tm and plug it into (19.40) to 
solve for Ki. Perhaps the easiest one to use is Yij. From Eq. (18.24), 

*.(*) I U> = e*+ 1 1, 1). (19.42) 

The matrix element for R y (6 ) is also quite simple: 

(l,0\R u (6)\l,l) = b (sin 9) 1 , (19.43) 

where b is some number.f Combining the two, we obtain 

Y it i oc e^sin*0. (19.44) 


t You can with some work show that this comes out of Eq. (18.35), but it is also easy 
to work out from first principles following the ideas of Section 18-4. A state | /, /) can 
be made out of 21 spin one-half particles all with spins up; while the state | /, 0) would 
have / up and / down. Under the rotation the amplitude that an up-spin remains up 
is cos 6/2, and that an up-spin goes down is sin 0/2. We are asking for the amplitude 
that l up-spins stay up, while the other / up-spins go down. The amplitude for that is 
(cos 0/2 sin 0/ 2) 1 which is the same as sin 1 0. 
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Putting this function into (19.40) gives 


Ki = /(/+ 1 ). 


(19.45) 


Now that we have determined K u Eq. ( 19 . 41 ) tells us about the radial function 
Fi(r). It is, of course, just the Schrodinger equation with the angular part replaced 
by its equivalent K t Fi/r 2 . Let’s rewrite (19 41) in the form we had in Eq (19.8), 
as follows: 


1 

r dr 2 


(rFd 



/(/ + \)h 2 } 
2 mr 2 I 


(19.46) 


A mysterious term has been added to the potential energy. Although we got this 
term by some mathematical shenanigan, it has a simple physical origin. We can 
give you an idea about where it comes from in terms of a semi-classical argument. 
Then perhaps you will not find it quite so mysterious. 

Think of a classical particle moving around some center of force. The total 
energy is conserved and is the sum of the potential and kinetic energies 

U — V(r) + \mv 2 = constant. 


In general, v can be resolved into a radial component v r and a tangential compo- 
nent rB\ then 


v 2 r + (r6)\ 


Now the angular momentum mr 2 B is also conserved; say it is equal to L. We can 
then write 

mr 2 B = L , or rd = — > 

mr 


and the energy is 


U = %mv 2 r + V(r) + 


L 2 

2mr 2 


If there were no angular momentum we would have just the first two terms. 
Adding the angular momentum L does to the energy just what adding a term 
L 2 /2mr 2 to the potential energy would do. But this is almost exactly the extra 
term in (19 46) The only difference is that /(/ + 1 )h appears for the angular 
momentum instead of l 2 h 2 as we might expect. But we have seen before (for ex- 
ample, Volume II, Section 34-7)f that this is just the substitution that is usually 
required to make a quasi-classical argument agree with a correct quantum- 
mechanical calculation. We can, then, understand the new term as a “pseudo- 
potential” which gives the “centrifugal force” term that appears in the equations 
of radial motion for a rotating system. (See the discussion of “pseudo-forces” in 
Volume I, Section 12—5.) 

We are now ready to solve Eq. (19,46) for Fi(r). It is very much like Eq. 
(19.8), so the same technique will work again. Everything goes as before until 
you get to Eq. (19.19) which will have the additional term 

-/(/ + i) a * p L ~ 2 - 09.47) 

k= i 

This term can also be written as 

— /(/ + l)|y - ^2 (19.48) 


(We have taken out the first term and then shifted the running index k down 
by 1.) Instead of Eq. (19.20) we have 


Y, [{^(^ + 0 — Kl + l)}flfc+i — 2 (ak — l)aif]p k 1 

k=1 - W_±J>! = o 

P 


(19.49) 


t See Appendix to this volume. 
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There is only one term in p~\ so it must be zero. The coefficient ci\ must be zero 
(unless / = 0 and we have our previous solution). Each of the other terms is 
made zero by having the square bracket come out zero for every k . This condition 
replaces Eq. (19.21) by 

2(ak — 1) .... 

° k+1 k(k +1)- /(/ + 1) ° k ' (19 ' 50) 

This is the only significant change from the spherically symmetric case. 

As before the series must terminate if we are to have solutions which can 
represent bound electrons. The series will end at k = n if an = 1. We get again 
the same condition on a, that it must be equal to 1 /«, where n is some integer. 
However, Eq. (19.50) also gives a new restriction. The index k cannot be equal to 
/, the denominator becomes zero and ai + 1 is infinite. That is, since a x = 0, Eq. 
(19.50) implies that all successive a* are zero until we get to a* +1 , which can be 
nonzero. This means that k must start at / + 1 and end at n. 

Our final result is that for any / there are many possible solutions which we 
can call F n j where «>/+!. Each solution has the energy 



The wave function for the state of this energy with the angular quantum numbers 
I and m is 

tn.l.m = Yl,m(e,<t>)Fn,l(p), (19.52) 

with 

pFn.lip ) = e~“ p £ ««>*■ (19.53) 

fc=W+l 



Fig. 1 9-6. Rough sketches showing 
the general nature of some of the hydro- 
gen wave functions. The shaded regions 
show where the amplitudes are large. 
The plus and minus signs show the relative 
sign of the amplitude in each region. 


The coefficients ak are obtained from (19.50). We have, finally, a complete de- 
scription of the states of a hydrogen atom. 

19-5 The hydrogen wave functions 

Let’s review what we have discovered. The states which satisfy Schrodinger’s 
equation for an electron in a Coulomb field are characterized by three quantum 
numbers n , /, m , all integers. The angular distribution of the electron amplitude 
can have only certain forms which we call Yi m . They are labeled by L the quantum 
number of total angular momentum , and m , the “ magnetic ” quantum number , 
which can range from — / to +/. For each angular configuration, various possible 
radial distributions F n ,i{r) of the electron amplitude are possible; they are labeled 
by the principle quantum number n — which can range from / + 1 to cc . The energy 
of the state depends only on n y and increases with increasing n. 

The lowest energy, or ground, state is an s-state. It has l = 0, n = 0, and 
m = 0. It is a “nondegenerate” state — there is only one with this energy, and its 
wave function is spherically symmetric. The amplitude to find the electron is a 
maximum at the center, and falls off mon^tonically with increasing distance from 
the center. We can visualize the electron amplitude as a blob as shown in Fig. 
19-6(a). 

There are other 5-states with higher energies, for n = 2, 3, 4, . . . For each 
energy there is only one version ( m = 0), and they are all spherically symmetric. 
These states have amplitudes which alternate in sign one or more times with 
increasing r. There are n — 1 spherical nodal surfaces — the places where ^ goes 
through zero. The 2s-state (l — 0,n — 2), for example, will look as sketched in 
Fig. 19— 6(b). (The dark areas indicate regions where the amplitude is large, and 
the plus and minus signs indicate the relative phases of the amplitude.) The energy 
levels of the 5-states are shown in the first column of Fig. 19-7, 

Then there are the p-states — with / = 1. For each which must be 2 or 
greater, there are three states of the same energy, one each for m = +1, m = 0, 
and m = — 1. The energy levels are as shown in Fig. 19-7. The angular de- 
pendences of these states are given in Table 19-1. For instance, for m = 0, if the 
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amplitude is positive for 9 near zero, it will be negative for 9 near 180°. There is 
a nodal plane coincident with the xj;-plane. For n > 2 there are also spherical 
nodes. The n = 2, m = 0 amplitude is sketched in Fig. 19— 6(c), and the n = 3, 
m = 0 wave function is sketched in Fig. 1 9— 6(d). 

You might think that since m represents a kind of “orientation” in space, 
there should be similar distributions with the peaks of amplitude along the x-axis 
or along the ,y-axis. Are these perhaps the m = +1 and m — —1 states? No. 
But since we have three states with equal energies, any linear combinations of the 
three will also be stationary states of the same energy. It turns out that the “x”- 
state — which corresponds to the “z”-state, or m ~ 0 state, of Fig. 19-6(c) — is 
a linear combination of the m = +1 and m = — 1 states. The corresponding 
“y’-state is another combination. Specifically, we mean that 


VET 


0 


And so on 


4s 


4p Ad 


4f 


-4 


“z” = I 1,0), 

t t x i9 _ I 1> ~f~ 1 ) + | 1) — 1) s 
\/2 

« » = | 1> +1) ~ | 1> — 1) 
iV 2 


3s 


3p 3d 


2s 2p 


These states all look the same when referred to their particular axes. 

The ^-states (/ — 2) have five possible values of m for each energy, the lowest 
energy has n = 3. The levels go as shown in Fig. 19-7. The angular dependences 
get more complicated. For instance the m = 0 states have two conical nodes, so 
the wave function reverses phase from + , to — , to + as you go around from the 
north pole to the south pole. The rough form of the amplitude is sketched in (e) 
and (f) of Fig. 19-6 for the m = 0 states with n — 3 and n — 4. Again, the 
larger «’s have spherical nodes. 

We will not try to describe any more of the possible states. You will find the 
hydrogen wave functions described in more detail in many books. Two good 
references are L. Pauling and E. B. Wilson, Introduction to Quantum Mechanics , 
McGraw-Hill (1935); and R. B. Leighton, Principles of Modern Physics , McGraw- 
Hill (1959). You will find in them graphs of some of the functions and pictorial 
representations of many states. 

We would like to mention one particular feature of the wave functions for 
higher /: for / > 0 the amplitudes are zero at the center. That is not surprising, 
since it’s hard for an electron to have angular momentum when its radius arm is 
very small. For this reason, the higher the /, the more the amplitudes are “pushed 
away” from the center. If you look at the way the radial functions F(r) vary for 
small r, you find from (19.53) that 
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Fig. 19-7. The e n ergy level diagram 
for hydrogen. 


F n ,i(r) « r*. 

Such a dependence on r means that for larger /’ s you have to go farther from r — 0 
before you get an appreciable amplitude. This behavior is, incidentally, determined 
by the centrifugal force term in the radial equation, so the same thing will apply 
for any potential that varies slower than 1/r 2 for small r — which most atomic 
potentials do. 


19-6 The periodic table 

We would like now to apply the theory of the hydrogen atom in an approxi- 
mate way to get some understanding of the chemist’s periodic table of the elements. 
For an element with atomic number Z there are Z electrons held together by the 
electric attraction of the nucleus but with mutual repulsion of the electrons. To 
get an exact solution we would have to solve Schrodinger’s equation for Z electrons 
in a Coulomb field. For helium the equation is 
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where V? is a Laplacian which operates on r u the coordinate of one electron; 
V| operates on r 2 ; and r i 2 = K — r 2 |- (We are again neglecting the spin of the 
electrons.) To find the stationary states and energy levels we would have to find 
solutions of the form 

lA = f(rur 2 )e- umist . 

The geometrical dependence is contained in /, which is a function of six variables 
— the simultaneous positions of the two electrons. No one has found an analytic 
solution, although solutions for the lowest energy states have been obtained by 
numerical methods. 

With 3, 4, or 5 electrons it is hopeless to try to obtain exact solutions, and it is 
going too far to say that quantum mechanics has given a precise understanding of 
the periodic table. It is possible, however, even with a sloppy approximation— and 
some fixing — to understand, at least qualitatively, many chemical properties 
which show up in the periodic table. 

The chemical properties of atoms are determined primarily by their lowest 
energy states. We can use the following approximate theory to find these states 
and their energies. First, we neglect the electron spin, except that we adopt the 
exclusion principle and say that any particular electronic state can be occupied 
by only one electron. This means that any particular orbital configuration can 
have up to two electrons — one with spin up, the other with spin down. Next we 
disregard the details of the interactions between the electrons in our first approxi- 
mation, and say that each electron moves in a central field which is the combined 
field of the nucleus and all the other electrons. For neon, which has 10 electrons, 
we say that one electron sees an average potential due to the nucleus plus the other 
nine electrons. We imagine then that in the Schrodinger equation for each electron 
we put a V(r ) which is a 1 /r field modified by a spherically symmetric charge 
density coming from the other electrons. 

In this model each electron acts like an independent particle. The angular 
dependence of its wave function will be just the same as the ones we had for the 
hydrogen atom. There will be s-states, /?- states, and so on; and they will have the 
various possible ra-values. Since V(r) no longer goes as 1/r, the radial part of the 
wave functions will be somewhat different, but it will be qualitatively the same, so 
we will have the same radial quantum numbers, n. The energies of the states will 
also be somewhat different. 

H 


With these ideas, let’s see what we get. The ground state of hydrogen has 
l — m = 0 and n — 1 ; we say the electron configuration is Is. The energy is 
— 13.6 ev. This means that it takes 13.6 electron volts to pull the electron off the 
atom. We call this the “ionization energy”, W T . A large ionization energy means 
that it is harder to pull the electron off and, in general, that the material is chem- 
ically less active. 

He 


Now take helium. Both electrons can be in the same lowest state (one spin 
up and the other spin down). In this lowest state the electron moves in a potential 
which is for small r like a Coulomb field for z — 2 and for large r like a Coulomb 
field for z = 1. The result is a “hydrogen-like” Is state with a somewhat lower 
energy. Both electrons occupy identical Is states (/ = 0, m = 0). The observed 
ionization energy (to remove one electron) is 24.6 electron volts. Since the Is 
“shell” is now filled — we allow only two electrons — there is practically no tendency 
for an electron to be attracted from another atom. Helium is chemically inert. 

Li 


The lithium nucleus has a charge of 3. The electron states will again be hy- 
drogen-like, and the three electrons will occupy the lowest three energy levels. 
Two will go into Is states and the third will go into an n = 2 state. But with / = 0 
or / = 1 ? In hydrogen these states have the same energy, but in other atoms they 
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don’t, for the following reason. Remember that a 2s state has some amplitude to 
be near the nucleus while the 2 p state does not. That means that a 2s electron will 
feel some of the triple electric charge of the Li nucleus, but that a 2 p electron will 
stay out where the field looks like the Coulomb field of a single charge. The extra 
attraction lowers the energy of the 2s state relative to the 2 p state. The energy 
levels will be roughly as shown in Fig. 19-8 — which you should compare with the 
corresponding diagram for hydrogen in Fig. 19-7. So the lithium atom will have 
two electrons in Is states and one in a 2s. Since the 2s electron has a higher energy 
than a Is electron it is relatively easily removed. The ionization energy of lithium 
is only 5.4 electron volts, and it is quite active chemically. 

So you can see the patterns which develop; we have given in Table 19-2 a 
list of the first 36 elements, showing the states occupied by the electrons in the 
ground state of each atom. The Table gives the ionization energy for the most 
loosely bound electron, and the number of electrons occupying each “shell” — 
by which we mean states with the same n. Since the different /-states have different 
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The electron configurations of the first 36 elements 


z 

Element 

fVj(ev) 

Electron Configuration 

Is 

2 s 2 p 

3s 3 p 3d 

4 s 4 p 4 d 4/ 

1 

H hydrogen 

13.6 

1 




2 

He helium 

24.6 

2 




3 

Li lithium 

5.4 


1 



4 

Be beryllium 

9.3 


2 



5 

B boron 

8.3 


2 1 



6 

C carbon 

11.3 

FILLED 

2 2 

Number of electrons 

7 

N nitrogen 

14.5 

(2) 

2 3 

in each state 

8 

O oxygen 

13.6 


2 4 



9 

F fluorine 

17.4 


2 5 



10 

Ne neon 

21.6 


2 6 



11 

Na sodium 

5.1 



1 


12 

Mg magnesium 

7.6 



2 


13 

A1 aluminum 

6.0 



2 1 


14 

Si silicon 

8.1 

— FILLED — 

2 2 


15 

P phosphorus 

10.5 



2 3 


16 

S sulfur 

10.4 

(2) 

(8) 

2 4 


17 

Cl chlorine 

13.0 



2 5 


18 

A argon 

15.8 



2 6 


19 

K potassium 

4.3 



i 

i 

i 

1 

20 

Ca calcium 

6.1 



i 

i 

2 

21 

Sc scandium 

6.5 



j 1 

2 

22 

Ti titanium 

6.8 



! 2 

2 

23 

V vanadium 

6.7 

— 

-FILLED 1 3 

2 

24 

Cr chromium 

6.8 



] 5 

1 

25 

Mn manganese 

7.4 

(2) 

(8) 

(8) j 5 

2 

26 

Fe iron 

7.9 



6 

2 

27 

Co cobalt 

7.9 



1 7 

2 

28 

Ni nickel 

7.6 



I 8 

2 

29 

Cu copper 

7.7 



i 10 

1 

30 

Zn zinc 

9.4 



1 10 

i 

2 

31 

Ga gallium 

6.0 




2 1 

32 

Ge germanium 

7.9 



FILLED 

2 2 

33 

As arsenic 

9.8 




2 3 

34 

Se selenium 

9.7 

(2) 

(8) 

(18) 

2 4 

35 

Br bromine 

11.8 




2 5 

36 

Kr krypton 

14.0 




2 6 


Is p d f 

Fig. 1 9-8. Schematic energy level 
diagram for an atomic electron with other 
electrons present. (The scale is not the 
same as Fig. 19-7.) 
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energies, each /-value corresponds to a sub-shell of 2(2/ +1) possible states (of 
different m and electron spin). These all have the same energy — except for some 
very small effects we are neglecting. 

Be 


Beryllium is like lithium except that it has two electrons in the 2s state as 
well as two in the filled Is shell. 

B to Ne 

Boron has 5 electrons. The fifth must go into a 2 p state. There are 2 X 3 = 6 
different 2 p states, so we can keep adding electrons until we get to a total of 8. 
This takes us to neon. As we add these electrons we are also increasing Z, so the 
whole electron distribution gets pulled in closer and closer to the nucleus and the 
energy of the 2 p states goes down. By thg_time we get to neon the ionization energy 
is up to 21.6 volts. Neon does not easily give up an electron. Also there are no 
more low-energy slots to be filled, so it won’t try to grab an extra electron. Neon 
is chemically inert. Fluorine, on the other hand, does have an empty position where 
an electron can drop into a state of low energy, so it is quite active in chemical 
reactions. 

Na to A 

With sodium the eleventh electron must start a new shell — going into a 3s 
state. The energy level of this state is much higher ; the ionization energy jumps 
down; and sodium is an active chemical. From sodium to argon the s and p states 
with n = 3 are occupied in exactly the same sequence as for lithium to neon. 
Angular configurations of the electrons in the outer unfilled shell have the same 
sequence, and the progression of ionization energies is quite similar. You can see 
why the chemical properties repeat with increasing atomic number. Magnesium 
acts chemically much like beryllium, silicon like carbon, and chlorine like fluorine. 
Argon is inert like neon. 

You may have noticed that there is a slight peculiarity in the sequence of 
ionization energies between lithium and neon, and a similar one between sodium 
and argon. The last electron is bound to the oxygen atom somewhat less than 
we might expect. And sulphur is similar. Why should that be? We can under- 
stand it if we put in just a little bit of the effects of the interactions between in- 
dividual electrons. Think of what happens when we put the first 2 p electron onto 
the boron atom. It has six possibilities — three possible /7-states, each with two 
spins. Imagine that the electron goes with spin up into the m — 0 state, which 
we have also called the “z” state because it hugs the z-axis. Now what will happen 
in carbon? There are now two 2 p electrons. If one of them goes into the “z” 
state, where will the second one go? It will have lower energy if it stays away from 
the first electron, which it can do by going into, say, the "x” state of the 2p shell. 
(This state is, remember, just a linear combination of the m = +1 and m = — 1 
states.) Next, when we go to nitrogen, the three 2 p electrons will have the lowest 
energy of mutual repulsion if they go one each into the “x,” “y,” and “z” con- 
figurations. For oxygen, however, the jig is up. The fourth electron must go into 
one of the filled states — with opposite spin. It is strongly repelled by the electron 
already in that state, so its energy will not be as low as it might otherwise be, and 
it is more easily removed. That explains the break in the sequence of binding 
energies which appears between nitrogen and oxygen, and between phosphorus 
and silicon. 

K to Zn 

After argon, you would, at first, think that the new electrons would start to 
fill up the 3 d states But they don’t. As we described earlier — and illustrated in 
Fig. 19-7 — the higher angular momentum states get pushed up in energy. By the 
time we get to the 2d states they are pushed to an energy a little bit above the energy 
of the 45 state. So in potassium the last electron goes into the 45 state. After this 
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shell is filled (with two electrons) at calcium, the 3 d states begin to be filled for 
scandium, titanium, and vanadium. 

The energies of the 3 p and 4s states are so close together that small effects 
can shift the balance either way. By the time we get to put four electrons into the 
3d states, their repulsion raises the energy of the 4s state just enough that its energy 
is slightly above the 3c/ energy, so one electron shifts over. For chromium we don’t 
get a 4, 2 combination as we would have expected, but instead a 5, 1 combination. 
The new electron added to get manganese fills up the 4s shell again, and the states 
of the 3d shell are then occupied one by one until we reach copper. 

Since the outermost shell of manganese, iron, cobalt, and nickel have the same 
configurations, however, they all tend to have similar chemical properties. (This 
effect is mtteh more pronounced in the rare-earth elements which all ha_ve the same 
outer shell but a progressively filling inner shell which has much less influence on 
their chemical properties.) 

In copper an electron is robbed from the 4s shell, finally completing the 3d 
shell. The energy of the 10, 1 combination is, however, so close to the 9, 2 con- 
figuration for copper that just the presence of another atom nearby can shift the 
balance. For this reason the two last electrons of copper are nearly equivalent, 
and copper can have a valence of either 1 or 2 (It sometimes acts as though its 
electrons were in the 9, 2 combination.) Similar things happen at other places and 
account for the fact that other metals, such as iron, combine chemically with either 
of two valences. By zinc, both the 3d and 4s shells are filled once and for all. 

Ga to Kr 

From gallium to krypton the sequence proceeds normally again, filling the 
4 p shell. The outer shells, the energies, and the chemical properties repeat the 
pattern of boron to neon and aluminum to argon. 

Krypton, like argon and neon, is known as “noble” gas. All three are chem- 
ically “inert.” This means only that, having filled shells of relatively low energy, 
there are few situations in which there is an energy advantage for them to join in a 
simple combination with other elements. Having a filled shell is not enough. 
Beryllium and magnesium have filled s-shells, but the energy of these shells is too 
high to lead to stability. Similarly, one would have expected another “noble” 
element at nickel, if the energy of the 3d shell had been lower (or the 4s, higher). 
On the other hand, krypton is not completely inert; it will form a weakly-bound 
compound with chlorine. 

Since our sample has turned up most of the main features of the periodic 
table, we stop our examination at element number 36 — there are still seventy or 
so more! 

We would like to bring up only one more point — that we not only can under- 
stand the valences to some extent but also can say something about the directional 
properties of the chemical bonds. Take an atom like oxygen which has four 2 p 
electrons. The first three go into “x,” “y” and “z” states and the fourth will 
double one of these states, leaving two — say “x” and — vacant. Consider then 
what happens in H 2 0. Each of the two hydrogens are willing to share an electron 
with the oxygen, helping the oxygen to fill a shell. These electrons will tend to go 
into the “x” and “_y” vacancies. So the water molecule should have the two hy- 
drogen atoms making a right angle with respect to the center of the oxygen. The 
angle is actually 105°. We can even understand why the angle is larger than 90°. 
In sharing their electrons the hydrogens end up with a net positive charge. The 
electric repulsion “strains” the wave functions and pushes the angle out to 105°. 
The same situation occurs in H 2 S. But because the sulphur atom is larger, the 
two hydrogen atoms are farther apart, there is less repulsion, and the angle is 
only pushed out to about 93°. Selenium is even larger, so in H 2 Se the angle is 
very nearly 90°. 

We can use the same arguments to understand the geometry of ammonia, 
H 3 N. Nitrogen has room for three more 2 p electrons, on each for the “x,” “y,” 
and “z” type states. The three hydrogens should join on at right angles to each 
other. The angles come out a little larger than 90° — again from the electric repul- 
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sion — but at least we see why the molecule of H 3 N is not flat. The angles in 
phosphene, H 3 P, are close to 90°, and in H 3 As are still closer. We assumed that 
NH 3 was not flat when we described it as a two-state system. And the nonflatness 
is what makes the ammonia maser possible. Now we see that also that shape can 
be understood from our quantum mechanics. 

The Schrodinger equation has been one of the great triumphs of physics. By 
providing the key to the underlying machinery of atomic structure it has given 
an explanation for atomic spectra, for chemistry, and for the nature of matter. 
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20 


Operators 


20-1 Operations and operators 

All the things we have done so far in quantum mechanics could be handled 
with ordinary algebra, although we did from time to time show you some special 
ways of writing quantum-mechanical quantities and equations. We would like 
now to talk some more about some interesting and useful mathematical ways of 
describing quantum-mechanical things. There are many ways of approaching the 
subject of quantum mechanics, and most books use a different approach from the 
one we have taken. As you go on to read other books you might not see right 
away the connections of what you will find in them to what we have been doing. 
Although we will also be able to get a few useful results, the main purpose of this 
chapter is to tell you about some of the different ways of writing the same physics. 
Knowing them you should be able to understand better what other people are 
saying. When people were first working out classical mechanics they always wrote 
all the equations in terms of x-, y-, and z-components. Then someone came along 
and pointed out that all of the writing could be made much simpler by inventing 
the vector notation. It’s true that when you come down to figuring something 
out you often have to convert the vectors back to their components. But it’s 
generally much easier to see what’s going on when you work with vectors and also 
easier to do many of the calculations. In quantum mechanics we were able to 
write many things in a simpler way by using the idea of the “state vector.” The 
state vector | has, of course, nothing to do with geometric vectors in three 
dimensions but is an abstract symbol that stands for a physical state , identified 
by the “label,” or “name,” The idea is useful because the laws of quantum 
mechanics can be written as algebraic equations in terms of these symbols. For 
instance, our fundamental law that any state can be made up from a linear com- 
bination of base states is written as 


20-1 Operations and operators 

20-2 Average energies 

20-3 The average energy of an 
atom 

20-4 The position operator 

20-5 The momentum operator 

20-6 Angular momentum 

20-7 The change of averages with time 
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( 20 . 1 ) 


where the C t are a set of ordinary (complex) numbers — the amplitudes C t = (/ 1 \p) 
— while | 1), [ 2), | 3), and so on, stand for the base states in some base, or repre- 
sentation. 

If you take some physical state and do something to it — like rotating it, or 
like waiting for the time At — you get a different state. We say, “performing 
an operation on a state produces a new state.” We can express the same idea by 
an equation: 

I*) = A\*). (20.2) 

An operation on a state produces another state. The operator A stands for some 
particular operation. When this operation is performed on any state, say | ^), it 
produces some other state | <j>). 

What does Eq. (20.2) mean? We define it this way. If you multiply the 
equation by (/ 1 and expand | \p) according to Eq. (20.1), you get 

(i I <*>) = X) <' I ^ I J>(J I ^>- (20.3) 

3 

(The states | j) are from the same set as | /).) This is now just an algebraic equation. 
The numbers (/ 1 <t>) give the amount of each base state you will find in \<j>) 9 and 
it is given in terms of a linear superposition of the amplitudes (j \ $ ) that you find 
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| \f/) in each base state. The numbers (i \A\j) are just the coefficients which tell 
how much of ( j \ $) goes into each sum. The operator A is described numerically 
by the set of numbers, or “matrix,” 

= (20.4) 


So Eq. (20.2) is a high-class way of writing Eq. (20.3). Actually it is a little 
more than that; something more is implied. In Eq. (20.2) we do not make any 
reference to a set of base states. Equation (20.3) is an image of Eq. (20.2) in 
terms of some set of base states. But, as you know, you may use any set you wish. 
And this idea is implied in Eq. (20.2). The operator way of writing avoids making 
any particular choice. Of course, when you want to get definite you have to choose 
some set. When you make your choice, you use Eq. (20.3). So the operator 
equation (20.2) is a more abstract way of writing the algebraic equation (20.3). 
It’s similar to the difference between writing 


c = a X b 

instead of 

Cx ~ &ybz &zby, 
Cy ~ a z b X 0 X by , 

C Z — a x by ' Oyb X * 


The first way is much handier. When you want results , however, you will eventually 
have to give the components with respect to some set of axes. Similarly, if you 
want to be able to say what you really mean by A , you will have to be ready to 
give the matrix A l0 in terms of some set of base states. So long as you have in 
mind some set Ai 3 , Eq. (20.2) means just the same as Eq. (20.3). (You should 
remember also that once you know a matrix for one particular set of base states 
you can always calculate the corresponding matrix that goes with any other base. 
You can transform the matrix from one “representation” to another.) 

The operator equation in (20.2) also allows a new way of thinking. If we 
imagine some operator A, we can use it with any state | V ') to create a new state 
A | \p). Sometimes a “state” we get this way may be very peculiar — it may not 
represent any physical situation we are likely to encounter in nature. (For instance, 
we may get a state that is not normalized to represent one electron.) In other 
words, we may at times get “states” that are mathematically artificial. Such 
artificial “states” may still be useful, perhaps as the mid-point of some calculation. 

We have already shown you many examples of quantum-mechanical op- 
erators. We have had the rotation operator P y (8) which takes a state | and 
produces a new state, which is the old state as seen in a rotated coordinate system. 
We have had the parity (or inversion) operator P, which makes a new state by 
reversing all coordinates. We have had the operators & Xi a y , and & z for spin one- 
half particles. 

The operator J z was defined in Chapter 17 in terms of the rotation operator 
for a small angle e. 


Pz(*) — 1 fa € *^ z * (20.5) 

This just means, of course, that 

A z (e) | i) = \t) + ^ej z \t). (20.6) 

In this example, j z | is h/ie times the state you get if yon rotate | by the small 
angle € and then subtract the original state. It represents a “state” which is the 
difference of two states. 

One more example. We had an operator p x — called the momentum operator 
(x-component) defined in an equation like (20.6). If 3 X (L) is the operator which 
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displaces a state along x by the distance L, then p z is defined by 


4(S) = 1 + l h hp x , (20.7) 

where 5 is a small displacement. Displacing the state [ yp) along x by a small dis- 
tance 8 gives a new state | yp'). We are saying that this new state is the old state 
plus a small new piece 

jr Sfix I *)• 

The operators we are talking about work on a state vector like | yp) 9 which is 
an abstract description of a physical situation. They are quite different from 
algebraic operators which work on mathematical functions. For instance, d/dx 
is an “operator” that works on /(x) by changing it to a new function /'(x ) = 
df/dx. Another example is the algebraic operator V 2 . You can see why the same 
word is used in both cases, but you should keep in mind that the two kinds of 
operators are different. A quantum-mechanical operator A does not work on an 
algebraic function, but on a state vector like | yp). Both kinds of operators are 
used in quantum mechanics and often in similar kinds of equations, as you will 
see a little later. When you are first learning the subject it is well to keep the 
distinction always in mind. Later on, when you are more familiar with the subject, 
you will find that it is less important to keep any sharp distinction between the 
two kinds of operators. You will, indeed, find that most books generally use the 
same notation for both! 

We’ll go on now and look at some useful things you can do with operators. 
But first, one special remark. Suppose we have an operator A whose matrix in 
some base is A tJ = (i \ A |y). The amplitude that the state A \ yp) is also in some 
other state [ <p) is (<j> | A | yp). Is there some meaning to the complex conjugate of 
this amplitude? You should be able to show that 

(4> | A | yp)* = (yp | A f | <t>), (20.8) 

where A* (read “A dagger”) is an operator whose matrix elements are 

4 = (A n )*. (20.9) 

To get the ij element of A f you go to the j, i element of A (the indexes are reversed) 
and take its complex conjugate. The amplitude that the state | <f>) is in | yp) is 
the complex conjugate of the amplitude that A \ yp) is in | <p). The operator A* is 
called the “Hermitian adjoint” of A . Many important operators of quantum 
mechanics have the special property that when you take the Hermitian adjoint, 
you get the same operator back. If B is such an operator, then 

& = B , 

and it is called a “self-adjoint” or “Hermitian,” operator. 


20-2 Average energies 

So far we have reminded you mainly of what you already know. Now we 
would like to discuss a new question. How would you find the average energy of 
a system — say, an atom? If an atom is in a particular state of definite energy and 
you measure the energy, you will find a certain energy E . If you keep repeating 
the measurement on each one of a whole series of atoms which are all selected to 
be in the same state, all the measurements will give E , and the “average” of your 
measurements will, of course, be just E, 

Now, however, what happens if you make the measurement on some state 
| yp) which is not a stationary state? Since the system does not have a definite 
energy, one measurement would give one energy, the same measurement on another 
atom in the same state would give a different energy, and so on. What would you 
get for the average of a whole series of energy measurements? 
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We can answer the question by projecting the state | \p) onto the set of states 
of definite energy. To remind you that this is a special base set, we’ll call the states 
\ iu). Each of the states | rj l ) has a definite energy E t . In this representation, 

I *> = £ c, I if.). (20.10) 

? 

When you make an energy measurement and get some number £ z , you have found 
that the system was in the state Vi. But you may get a different number for each 
measurement Sometimes you will get sometimes E 2 , sometimes £ 3 , and so 
on. The probability that you observe the energy E x is just the probability of finding 
the system in the state \rn), which is, of course, just the absolute square of the 
amplitude C x = (771 | \p). The probability of finding each of the possible energies 
E x is 

P x = | C t | 2 . ( 20 . 11 ) 

How are these probabilities related to the mean value of a whole sequence 
of energy measurements? Let’s imagine that we get a series of measurements like 
this: Eu E 7 , E lu £ 9 , £ 1} E 10i E 7 , £ 2 , £3, £9, £4, and so on. We continue 

for, say, a thousand measurements. When we are finished we add all the energies 
and divide by one thousand. That’s what we mean by the average. There’s also 
a short-cut to adding all the numbers. You can count up how many times you get 
£1, say that is N u and then count up the number of times you get £ 2 , call that 
N 2 , and so on. The sum of all the energies is certainly just 

N X E X + N 2 E 2 + N Z E Z + * * * - X) N * E '- 


The average energy is this sum divided by the total number of measurements which 
is just the sum of all the N t \ which we can call N\ 

£,v = ■ (20.12) 

We are almost there. What we mean by the probability of something happen- 
ing is just the number of times we expect it to happen divided by the total number 
of tries. The ratio NJN should — for large N — be very near to P x , the probability 
of finding the state | rj x ), although it will not be exactly P t because of the statistical 
fluctuations. Let’s write the predicted (or “expected”) average energy as (£) av ; 
then we can say that 

(£>.v = (20.13) 

% 

The same arguments apply for any measurement. The average value of a measured 
quantity A should be equal to 

<4>.v = 

l 

where A t are the various possible values of the observed quantity, and P t is the 
probability of getting that value. 

Let’s go back to our quantum-mechanical state | ^). It’s average energy is 

(£>av = £ \C l \ 2 E l = X) cT C t E t . (20.14) 

l l 

Now watch this trickery! First, we write the sum as 

(f I Vi)Et(vt I '!')■ (20.15) 

l 

Next we treat the left-hand (\p \ as a common “factor.” We can take this factor 
out of the sum, and write it as 

(' P I I i ^)J- 
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This expression has the form 


I <#>}> 

where | <j>) is some “cooked-up” state defined by 

I <#*) = I 7 >i)E i{vx | <P)- (20.16) 

l 

It is, in other words, the state you get if you take each base state | rj t ) in the amount 
Ei(n% I ^>* 

Now remember what we mean by the states \r) l ). They are supposed to be 
the stationary states — by which we mean that for each one, 

H | 7} t ) = E % | 7l t ). 

Since E % is just a number, the right-hand side is the same as | r n l )E u and the sum 
in Eq. (20.16) is the same as 


X) H | | yp). 

i 

Now i appears only in the famous combination that contracts to unity, so 
H I I = H’Yj | ViXvi 1 1 1s) = H 1 1)- 

l l 

Magic! Equation (20.16) is the same as 

\<t>) = H\t). (20.17) 

The average energy of the state | yp) can be written very prettily as 

<£>av = <*|#|*>. (20.18) 

To get the average energy you operate on | \p) with //, and then multiply by (\p ]. 
A simple result. 

Our new formula for the average energy is not only pretty. It is also useful, 
because now we don’t need to say anything about any particular set of base 
states. We don’t even have to know all of the possible energy levels. When we go 
to calculate, we’ll need to describe our state in terms of some set of base states, 
but if we know the Hamiltonian matrix H tJ for that set we can get the average 
energy. Equation (19.18) says that for any set of base states | /), the average 
energy can be calculated from 

{E) „v = 'E(+\i)(i\H\j)(j\+), (20.19) 

1J 

where the amplitudes (i \H\j) are just the elements of the matrix H iy 

Let’s check this result for the special case that the states | /) are the definite 
energy states. For them, H\j) = E, |y), so (i \ H | j) = E 3 8 tJ and 

<£>nv = E <* I I *> = Z I I *>, 

i 

which is right. 

Equation (20.19) can, incidentally, be extended to other physical measure- 
ments which you can express as an operator. For instance, L z is the operator of 
the r-component of the angular momentum L. The average of the z-component 
for the state | \p) is 


(L z ) av = | L z | $). 

One way to prove it is to think of some situation in which the energy is proportional 
to the angular momentum. Then all the arguments go through in the same way. 
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In summary, if a physical observable A is related to 

a suitable quantum- 

mechanical operator A , the average value of A for the state 

| \fr) is given by 

{A) av = | A | *>. 

(20.20) 

By this we mean that 


^av ~ (lA 1 <A)> 

(20.21) 

with 


1 4>) = A\ f). 

(20.22) 


?0-3 The average energy of an atom 

Suppose we want the average energy of an atom in a state described by a 
wave function yp{r)\ How do we find it? Let’s first think of a one-dimensional 
situation with a state | defined by the amplitude (x \ \p) — $(x), We are asking 
for the special case of Eq. (20.19) applied to the coordinate representation. Follow- 
ing our usual procedure, we replace the states 1 1 ) and |y) by | x) and | x f ), and 
change the sums to integrals. We get 

(£>av = f jit | x)(x I A I x')(x' I #) dx dx'. (20.23) 

This integral can, if we wish, be written in the following way: 

J I x){x I <t>) dx, (20.24) 

with 

<*!♦> = f(x\A\ x')(x' I 4>) dx'. (20.25) 


The integral over x ( in (20.25) is the same one we had in Chapter 16~see Eq. 
(16.50) and Eq. (16.52) — and is equal to 




We can therefore write 


(x I <t>) 


-S3&+ 


(20.26) 


Remember that (\p | x) = (x\ $)* = ^*(.x); using this equality, the average 
energy in Eq. (20.23) can be written as 

(E) av = f +*(x) {- £ £ + V \ dx. (20.27) 


Given a wave function yp(x), you can get the average energy by doing this integral. 
You can begin to see how we can go back and forth from the state-vector ideas 
to the wave-function ideas. 

The quantity in the braces of Eq. (20.27) is an algebraic operator.;): We will 
write it as 3C 


3C = 


*L£_, 

2m dx 2 


V . 


With this notation Eq. (20.23) becomes 

(E)av = j$*(x)3bl/(x)dx. (20.28) 

The algebraic operator 3C defined here is, of course, not identical to the 
quantum-mechanical operator H. The new operator works on a function of 
position yp(x) = (x | to give a new function of x, <j>(x ) = (x \ 0); while H 


J The “operator” V(x) means “multiply by K(jc).” 
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operates on a state vector | \p) to give another state vector | <f>), without implying 
the coordinate representation or any particular representation at all. Nor is 3C 
strictly the same as H even in the coordinate representation. If we choose to 
work in the coordinate representation, we would interpret H in terms of a matrix 
(x\H\ x ') which depends somehow on the two "indices** x and x'; that is, we 
expect — according to Eq. (20.25) — that (x | <f>) is related to all the amplitudes 
(x | \p) by an integration. On the other hand, we find that 3C is a differential op- 
erator. We have already worked out in Section 16-5 the connection between 
{x\H\ x') and the algebraic operator 3C. 

We should make one qualification on our results. We have been assuming 
that the amplitude ^(x) = (x | \p) is normalized. By this we mean that the scale 
has been chosen so that 


f\+(x)\ 2 dx = 1; 

so the probability of finding the electron somewhere is unity. If you should choose 
to work with a ^(x) which is not normalized you should write 

f \p*(x)X\p(x) dx 

(EU = - J (20.29) 

I ^*(x>Kx) dx 

It’s the same thing. 

Notice the similarity in form between Eq. (20.28) and Eq. (20.18). These 
two ways of writing the same result appear often when you work with the x-repre- 
sentation. You can go from the first form to the second with any^4 which is a 
local operator, where a local operator is one which in the integral 

J (x | A | x')(x f | \p) dx f 

can be written as & ^(x), where d is a differential algebraic operator. There are, 
however, operators for which this is not true. For them you must work with 
the basic equations in (20.21) and (20.22). 

You can easily extend the derivation to three dimensions. The result is that£ 

<£>.,v = j Vol, (20.30) 

with 

3C = - ^ V 2 + V(r), (20.31) 

and with the understanding that 

j | \p | 2 c?Vol = 1. (20.32) 

The same equations can be extended to systems with several electrons in a fairly 
obvious way, but we won’t bother to write down the results. 

With Eq. (20.30) we can calculate the average energy of an atomic state 
even without knowing its energy levels. All we need is the wave function. It’s 
an important law. We’ll tell you about one interesting application. Suppose you 
want to know the ground-state energy of some system— say the helium atom, but 
it’s too hard to solve Schrodinger’s equation for the wave function, because there 
are too many variables. Suppose, however, that you take a guess at the wave 
function — pick any function you like— and calculate the average energy. That is, 
you use Eq. (20.29) — generalized to three dimensions — to find what the average 
energy would be if the atom were really in the state described by this wave function. 
This energy will certainly be higher than the ground-state energy which is the lowest 


X We write d Vol for the element of volume. It is, of course, just dx dy dz % and the 
integral goes from — x to +x in all three coordinates. 
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possible energy the atom can have.J Now pick another function and calculate its 
average energy. If it is lower than your first choice you are getting closer to the 
true ground-state energy. If you keep on trying all sorts of artificial states you 
will be able to get lower and lower energies, which come closer and closer to the 
ground-state energy. If you are clever, you will try some functions which have a 
few adjustable parameters. When you calculate the energy it will be expressed 
in terms of these parameters. By varying the parameters to give the lowest possible 
energy, you are trying out a whole class of functions at once. Eventually you will 
find that it is harder and harder to get lower energies and you will begin to be 
convinced that you are fairly close to the lowest possible energy. The helium atom 
has been solved in just this way — not by solving a differential equation, but by 
making up a special function with a lot of adjustable parameters which are eventu- 
ally chosen to give the lowest possible value for the average energy. 



Fig. 20-1. A curve of probability 
density representing a localized particle. 


20-4 The position operator 

What is the average value of the position of an electron in an atom? For any 
particular state | what is the average value of the coordinate x? We’ll work in 
one dimension and let you extend the ideas to three dimensions or to systems with 
more than one parucic. 'we nave a state described oy \f/{x), and we keep measuring 
x over and over again. What is the average? It is 

j xP(x) dx, 

where P{x) is the probability of finding the electron in a little element dx at x. 
Suppose the probability density jP(x) varies with x as shown in Fig. 20-1. The 
electron is most likely to be found near the peak of the curve. The average value 
of x is also somewhere near the peak. It is, in fact, just the center of gravity of 
the area under the curve. 

We have seen earlier that P(x) is just | ^(x) | 2 = ^*(x)^(x), so we can write 
the average of x as 

(x) av = J^*(x)x^(x) dx. (20.33) 


Our equation for (x) av has the same form as Eq. (20.33). For the average 
energy, the energy operator 3C appears between the two ^’s, for the average position 
there is just x. (If you wish you can consider x to be the algebraic operator “multi- 
ply by x.”) We can carry the parallelism still further, expressing the average posi- 
tion in a form which corresponds to Eq. (20.18). Suppose we just write 

<x)av =<*!«> (20.34) 

with 

I a) = *|*>, (20.35) 

and then see if we can find the operator x which generates the state | a), which 
will make Eq. (20.34) agree with Eq. (20.33). That is, we must find a | a), so that 

<*!«>- <*>.v = /<* | x)x(x | *> dx. (20.36) 


First, let’s expand | <f>) in the x-representation. It is 


| «) = fit | x)(x | a) dx. 


(20.37) 


Now compare the integrals in the last two equations. You see that in the x-repre- 
sentation 


<x | 0 £> = x(x I *>. 


(20.38) 


t You can also look at it this way. Any function (that is, state) you choose can be 
written as a linear combination of the base states which are definite energy states. Since 
m this combination there is a mixture of higher energy states in with the lowest energy 
state, the average energy will be higher than the ground-state energy. 
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Operating on | \[/) with x to get | a) is equivalent to multiplying yf/(x) — (x \ \p) 
by x to get a(x) = (x \ a ). We have a definition of x in the coordinate representa- 
tion.t 

[We have not bothered to try to get the ^-representation of the matrix of the 
operator x. If you are ambitious you can try to show that 

(x | x | x') = x 5(x — x '). (20.39) 

You can then work out the amusing result that 

Jc I JC> = JC I x>. (20.40) 

The operator x has the interesting property that when it works on the base states 
[ x) it is equivalent to multiplying by x.] 

Do you want to know the average value of x 2 ? It is 

(x 2 )av = J\[/*(x)x 2 \^(x) dx, (20.41) 

Or, if you prefer you can write 


<* 2 > av = <* | a') 

with 

I «') = * 2 I *>. (20.42) 

By Jt 2 we mean xJc — the two operators are used one after the other. With the 
second form you can calculate {x 2 ) av using any representation (base-states) you 
wish. If you want the average of x n , or of any polynomial in x , you can see how 
to get it. 


20-5 The momentum operator 

Now we would like to calculate the mean momentum of an electron — again, 
we’ll stick to one dimension. Let P(p) dp be the probability that a measurement 
will give a momentum between p and p + dp. Then 

(P) av = jp P(P) dp. (20.43) 

Now we let (p \ yp) be the amplitude that the state | is in a definite momentum 
state | p). This is the same amplitude we called (mom p \ \p) in Section 16—3 and 
is a function of p just as (x \ $) is a function of x. There we chose to normalize 
the amplitude so that 

p (p) = 2k\(p\ *>i 2 - 

We have, then, 

<P ) av = /(^ I P)P(P I t) ^ • 

The form is quite similar to what we had for (x) ay . 

If we want, we can play exactly the same game we did with (x) av . First, we 
can write the integral above as 

f(t\p)(p 1/3)^- (20.46) 

You should now recognize this equation as just the expanded form of the amplitude 
I fi ) — expanded in terms of the base states of definite momentum. From Eq. 


t Equation (20.38) does not mean that \ct) = x\\p). You cannot “factor out” the 
(a; |, because the multiplier x in front of {x | is a number which is different for each 
state {x |. It is the value of the coordinate of the electron in the state | x). See Eq. (20.40). 
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(20.44) 

(20.45) 



(20.45) the state | j8) is defined in the momentum representation by 


(p\P) = P(P I *> (20.47) 

That is, we can now write 

(p) av - I 0) (20.48) 

with 

\0) = P\ *>, (20.49) 


where the operator p is defined in terms of the /^-representation by Eq. (20.47). 
[Again, you can if you wish show that the matrix form of p is 

(p\p\p f ) = pKp - p'\ (20.50) 

and that 

P\p) = P\P). (20.51) 


It works out the same as for x.] 

Now comes an interesting question. We can write (p), u as we have done in 
Eqs. (20.45) and (20.48), and we know the meaning of the operator p in the mo- 
mentum representation. But how should we interpret p in the coordinate representa- 
tion? That is what we will need to know if we have some wave function ^(x), 
and we want to compute its average momentum. Let's make clear what we mean. 
If we start by saying that (p) av is given by Eq. (20.48), we can expand that equation 
in terms of the ^-representation to get back to Eq. (20.45). If we are given the 
p-description of the state — namely the amplitude (p \ $), which is an algebraic 
function of the momentum p — we can get (p \ <j>) from Eq. (20.47) and proceed 
to evaluate the integral. The question now is: What do we do if we are given a 
description of the state in the ^-representation, namely the wave function \p(x) = 

(x I 

Well, let's start by expanding Eq. (20.48) in the x-representation. It is 

ip) av = /<* | x)(x | 0) dx. (20.52) 

Now, however, we need to know what the state | 0) is in the x-representation. 
If we can find it, we can carry out the integral. So our problem is to find the 
function 0(x) = (x | 0). 

We can find it in the following way. In Section 16-3 we saw how (p | 0) was 
related to (x \ 0). According to Eq. (16.24), 

(p\P) = fe~ ,pxl \x\fi)dx. (20.53) 

If we know (p | 0) we can solve this equation for (x \ 0). What we want, of course, 

is to express the result somehow in terms of \p(x) = (x \ \p)> which we are assuming 

to be known. Suppose we start with Eq. (20.47) and again use Eq. (16.24) to write 

(p I fi) = p(p 1 = pj e~ ,pzin t(x) dx. (20.54) 

Since the integral is over x we can put the p inside the integral and write 

<p|0> = fe-'^ptWdx. (20.55) 

Compare this with (20.53). You would say that {x \ 0) is equal to p\p(x). No, No! 
The wave function (x\0) = 0(x) can depend only on x — not on p. That’s the 
whole problem. 

However, some ingenious fellow discovered that the integral in (20.55) could 
be integrated by parts. The derivative of e ~ ipxfx with respect to x is (—i/h)pe~ ipxir \ 
so the integral in (20.55) is equivalent to 

-7 /s 
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If we integrate by parts, it becomes 

- h -. + ~J e- ipxl * f x dx. 

So long as we are considering bound states, so that 1 p{x) goes to zero at x = ± ac , 
the bracket is zero and we have 


<Pl0) = *j e-' PIlK dx. (20.56) 

Now compare this result with Eq. (20.53). You see that 

{x ^ ) = lTx ^ x) • (20 ' 57) 

We have the necessary piece to be able to complete Eq. (20.52). The answer is 


(p) av = j **(*) * ~ fix) dx. (20.58) 

We have found how Eq. (20.48) looks in the coordinate representation. 

Now you should begin to see an interesting pattern developing. When we 
asked for the average energy of the state | \p) we said it was 

<£) av = | <#>), with \<t> x ) = H\f). 


The same thing is written in the coordinate world as 

(2s) av = j (x)<f>(x) dx with <f>(x) = 3Qp(x). 

Here 3C is an algebraic operator which works a function of x. When we asked 
about the average value of x, we found that it could also be written 

<*>av = I <*)> with I a) = * I *>. 

In the coordinate world the corresponding equations are 

(jc)av = J ^*(x)a(x) dx, with a(x) = x\f/(x). 


When we asked about the average value of p, we wrote 

(p) av = I £>, with I /3) = / 1 *>. 

In the coordinate world the equivalent equations were 

(p) av = J iK*)/3(*) dx, with P(x) = J ~ $(x). 

In each of our three examples we start with the state | and produce another 
(hypothetical) state by a quantum-mechanical operator. In the coordinate repre- 
sentation we generate the corresponding wave function by operating on the wave 
function \p(x) with an algebraic operator. There are the following one-to-one 
correspondences (for one-dimensional problems): 


h 

& — > x, 

Px * &x 


2m dx* + 


h d_' 

i dx 


(20.59) 
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Table 20-1 


Physical Quantity 

Operator 

Coordinate Form 

Energy 

H 

2 

3C = - V 2 + V(r) 

2m 

Position 

5c 

X 


? 

y 


z 

z 

Momentum i 

Px 

* h d 

tP X . rj 

i dx 


P„ 

* h d 

<P» = - 
/ dy 


P. 

** 

li 

1 


In this list, we have introduced the symbol (9 X for the algebraic operator (h/i)d/dx: 

} -7S' (2 ° 

and we have inserted the x subscript on (P to remind you that we have been working 
only with the x-component of momentum. 

You can easily extend the results to three dimensions. For the other com- 
ponents of the momentum, 


Pv-> = 

Pz > S*z ~ 


h d^ 
i dy 

h d 
i dz 


If you want, you can even think of an operator of the vector momentum and write 

A h ( d . a . d\ 

p <p = _ - + e v - + e z -) > 

where e x , e y , and e z are the unit vectors in the three directions. It looks even more 
elegant if we write 


<P = - V. 

I 


(20.61) 


Our general result' is that for at least some quantum-mechanical operators, 
there are corresponding algebraic operators in the coordinate representation. 
We summarize our results so far — extended to three dimensions — in Table 20-1. 
For each operator we have the two equivalent forms :{ 

\<f>) = A | *> (20.62) 

or 

*(r) = <fy(r). (20.63) 


We will now give a few illustrations of the use of these ideas. The first one is 
just to point out the relation between (P and 3C. If we use (9 X twice, we get 


ffxVx = -h 2 


dx 2 


I In many books the same symbol is used for A and <5, because they both stand for the 
same physics, and because it is convenient not to have to write different kinds of letters. 
You can usually tell which one is intended by the context 
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This means that we can write the equality 

3C = ^ + Vy&y + ®z$z) + V(r). 

Or, using the vector notation, 

X = 2^ <P • <P + K ( r } . (20.64) 

(In an algebraic operator, any term without the operator symbol ( * ) means just a 
straight multiplication.) This equation is nice because it’s easy to remember if 
you haven’t forgotten your classical physics. Everyone knows that the energy is 
(nonrelativistically) just the kinetic energy p 2 /2m plus the potential energy, and 
3C is the operator of the total energy. 

This result has impressed people so much that they try to teach students all 
about classical physics before quantum mechanics. (We think differently!) But 
such parallels are often misleading. For one thing, when you have operators, the 
order of various factors is important; but that is not true for the factors in a 
classical equation. 

In Chapter 17 we defined an operator p x in terms of the displacement operator 
by [see Eq. (17.27)] 

| V) = 3 X (&) | *> - (l + j p x i) | *>, (20.65) 

where 8 is a small displacement. We should show you that this is equivalent to 
our new definition. According to what we have just worked out, this equation 
should mean the same as 

'p'(x) = i(x) + ^ 5 

But the right-hand side is just the Taylor expansion of \p(x + 6), which is certainly 
what you get if you displace the state to the left by 8 (or shift the coordinates to 
the right by the same amount). Our two definitions of p agree! 

Let’s use this fact to show something else. Suppose we have a bunch of parti- 
cles which we label 1, 2, 3, . . . , in some complicated system. (To keep things simple 
we’ll stick to one dimension.) The wave function describing the state is a function 
of all the coordinates x u x 2 , x Zi . . . We can write it as \p(x u x 2 , x 2 , . . .). Now 
displace the system (to the left) by 8. The new wave function 


^(* 1 , x 2 , *3, - . -) = + 8, x 2 + 8, *3 + 5, . . .) 


can be written as 


&(xu x 2> x 3i . . .) = Ux u x 2i x 3 > . . •) 


+ 


, ^ + , S + , S + -)- (2066) 


According to Eq. (20.65) the operator of the momentum of the state | yp) (let’s 
call it the total momentum) is equal to 


^ total = iw 1 + ^r 2 + w s + 


But this is just the same as 


<?total = + <Px2 "t~ <?x3 + • ' * . (20.67) 

The operators of momentum obey the rule that the total momentum is the sum of 
the momenta of all the parts. Everything holds together nicely, and many of the 
things we have been saying are consistent with each other. 
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20-6 Angular momentum 

Let’s for fun look at another operation — the operation of orbital angular 
momentum. In Chapter 17 we defined an operator J z in terms of R z (<p ), the operator 
of a rotation by the angle <p about the z-axis. We consider here a system described 
simply by a single wave function \p(r), which is a function of coordinates only, 
and does not take into account the fact that the electron may have its spin either 
up or down. That is, we want for the moment to disregard intrinsic angular 
momentum and think about only the orbital part. To keep the distinction clear, 
we’ll call the orbital operator L z , and define it in terms of the operator of a rotation 
by an infinitesimal angle e by 

&(e)|*> = (l + 3 « £.)n->. 



(Remember, this definition applies only to a state | \p) which has no internal spin 
variables, but depends only on the coordinates r = x, y, x ) If we look at the 
state | if') in a new coordinate system, rotated about the z-axis by the small angle 
e, we see a new state 

| r> = &(e)| i). 

If we choose to describe the state | \ p) in the coordinate representation — that 
is, by its wave function ^(r), we would expect to be able to write 

no = 0 + i e£ *)^ (x) - ( 20 - 68 ) 

What is £? Well, a point P at x and y in the new coordinate system (really x' 
and y, but we will drop the primes) was formerly at x — ey and y + ex, as you 
can see from Fig. 20-2. Since the amplitude for the electron to be at P isn’t changed 
by the rotation of the coordinates we can write 

ddf d\b 

y (x, y, z) - i P(x + ey, y - ex, z) = $(x, y, z) + ey — - ex — 
(remembering that e is a small angle). This means that 


Fig. 20-2. Rotation of the axes 
around the z-axis by the small angle €. 

That’s our answer. But notice. It is equivalent to 

<&Z ~ y y(^X> 



(20.69) 


(20.70) 


Returning to our quantum-mechanical operators, we can write 

L = xp u - yp x . 


(20.71) 


This formula is easy to remember because it looks like the familiar formula of 
classical mechanics; it is the z-component of 


L - rXp. (20.72) 

One of the fun parts of this operator business is that many classical equations 
get carried over into a quantum-mechanical form. Which ones don’t? There 
had better be some that don’t come out right, because if everything did, then 
there would be nothing different about quantum mechanics. There would be no 
new physics. Here is one equation which is different. In classical physics 


XPx ~ PxX = 0 . 


What is it in quantum mechanics? 
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xp x ~ Px% = ? 



Let’s work it out in the x-representation. So that we’ll know what we are doing 
we put in some wave function yp(x). We have 


or 


x(P»yKx) - $> x xt(x), 


Remember now that the derivatives operate on everything to the right. We get 


i dx 



h dt 
i X dx 


7 *(*)• 


(20.73) 


The answer is not zero. The whole operation is equivalent simply to multiplication 
by —h/i: 

$Px “ p x X = ~~ j * (20.74) 


If Plank’s constant were zero, the classical and quantum results would be the same, 
and there would be no quantum mechanics to learn! 

Incidentally, if any two operators A and B , when taken together like this: 

A6 - £a. 


do not give zero, we say that “the operators do not commute.” And an equation 
such as (20.74) is called a “commutation rule.” You can see that the commutation 
rule for p x and y is 

Pxy - ypx = o. 

There is another very important commutation rule that has to do with angular 
momenta. It is 

L x L y — L y L x = iht z . (20.75) 

You can get some practice with x and p operators by proving it for yourself. 

It is interesting to notice that operators which do not commute can also occur 
in classical physics. We have already seen this when we have talked about rotation 
in space. If you rotate something, such as a book, by 90° around x and then 90° 
around y> you get something different from rotating first by 90° around y and then 
by 90° around x. It is, in fact, just this property of space that is responsible for 
Eq. (20.75). 


20-7 The change of averages with time 

Now we want to show you something else. How do averages change with 
time? Suppose for the moment that we have an operator A , which does not itself 
have time in it in any obvious way. We mean an operator like x or p. (We exclude 
things like, say, the operator of some external potential that was being varied with 
time, such as K(x, t ).) Now suppose we calculate (A) av , in some state | ^), which is 

<^>* v = l*>- (20.76) 

How will (A) av depend on time? Why should it? One reason might be that the 
operator itself depended explicitly on time — for instance, if it had to do with a 
time-varying potential like V(x, /). But even if the operator does not depend on 
/, say, for example, the operator A = x, the corresponding average may depend 
on time. Certainly the average position of a panicle could be moving. How does 
such a motion come out of Eq. (20.76) if A has no time dependence? Well, the 
state | 4 /) might be changing with time. For nonstationary states we have often 
shown a time dependence explicitly by writing a state as | \p(t)). We wajit to show 
that the rate of change of (A) av is given by a new operator we will call A. Remem- 
ber that A is an operator, so that putting a dot over the A does not here mean taking 
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the time derivative, but is just a way of writing a new operator A which is defined by 

J t Wav = (20.77) 

Our problem is to find the operator A, 

First, we know that the rate of change of a state is given by the Hamiltonian. 
Specifically, 

= (20.78) 

This is just the abstract way of writing our original definition of the Hamiltonian: 

ih = £ (20 - 79) 

13 

If we take the complex conjugate of this equation, it is equivalent to 

~ih j t (HO I = <*(0 I 77. (20.80) 

Next, see what happens if we take the derivatives with respect to t of Eq. (20.76). 
Since each ^ depends on /, we have 

j t Wav ={jt<*\)*\++<'l'\j(j t \ *>) ■ ( 20.8 1 ) 

Finally, using the two equations in (20.78) and (20.79) to replace the derivatives, 
we get 

Jt Wav = {{(* 1 ft A I f) - | Aft I *)) . 

This equation is the same as 

J ( Wav = g <* | (HA - Aft) | *>. 

Comparing this equation with Eq. (20.77), you see that 

A = ^ (HA - AH). (20.82) 

That is our interesting proposition, and it is true for any operator A. 

Incidentally, if the operator A should itself be time dependent, we would have 

had 

A = ^ {ft A - AH)+™- (20.83) 

Let us try out Eq. (20.82) on some example to see whether it really makes 
sense. For instance, what operator corresponds to ;£? We say it should be 

k = ^(Hk - X H). (20.84) 

What is this? One way to find out is to work it through in the coordinate repre- 
sentation using the algebraic operator for 3C. In this representation the commutator 
is 

JCx — x3C = 

If you operate with this 
rivatives where you can, 

2m dx 


£ +»'»)* 


I h 2 d 2 , w , 
X (2 m dx 2 + 


or any wave function and work out all of the de- 
you end up after a little work with 
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But this is just the same as 


so we find that 



or that 

Hk =* xH — — / — p x 
m 

(20.85) 


x=^- 

m 

(20.86) 

A pretty result. It means that if the mean value of x is changing with time the 
drift of the center of gravity is the same as the mean momentum divided by m. 
Exactly like classical mechanics. 

Another example. What is the rate of change of the average momentum of a 
state? Same game. Its operator is 


i 

i-s 

ii 

(20.87) 


Again you can work it out in the x representation. Remember that p becomes 
d/dx , and this means that you will be taking the derivative of the potential energy 
V (in the 3C) — but only in the second term. It turns out that it is the only term 
which does not cancel, and you find that 


^ ^ ~ ~ , a y 

3C(P — (P3C = ~~lh j- 
dx 

or that 

HV 

P=-- d ~- ( 20 . 88 ) 

Again the classical result. The right-hand side is the force, so we have derived 
Newton’s law! But remember — these are the laws for the operators which give 
the average quantities. They do not describe what goes on in detail inside an 
atom. 

Quantum mechanics has the essential difference that px is not equal to kp . 
They differ by a little bit — by the small number h . But the whole wondrous compli- 
cations of interference, waves, and all, result from the little fact that kp — px is 
not quite zero. 

The history of this idea is also interesting. Within a period of a few months in 
1926, Heisenberg and Schrodinger independently found correct laws to describe 
atomic mechanics. Schrodinger invented his wave function ^(x) and found his 
equation. Heisenberg, on the other hand, found that nature could be described 
by classical equations, except that xp — px should be equal to h/i, which he could 
make happen by defining them in terms of special kinds of matrices. In our lan- 
guage he was using the energy-representation, with its matrices. Both Heisenberg’s 
matrix algebra and Schrodinger’s differential equation explained the hydrogen 
atom. A few months later Schrodinger was able to show that the two theories 
were equivalent — as we have seen here. But the two different mathematical forms 
of quantum mechanics were discovered independently. 


20-17 



2/ 


The Schrodinger Equation in a Classical 
Context: A Seminar on Superconductivity 


21-1 Schrodinger’s equation in a magnetic field 

This lecture is only for entertainment. I would like to give the lecture in a 
somewhat different style — just to see how it works out. It’s not a part of the course 
—in the sense that it is not supposed to be a last minute effort to teach you some- 
thing new. But, rather, I imagine that I’m giving a seminar or research report on 
the subject to a more advanced audience, to people who have already been educated 
in quantum mechanics. The main difference between a seminar and a regular 
lecture is that the seminar speaker does not carry out all the steps, or all the 
algebra. He says: ‘Tf you do such and such, this is what comes out,” instead 
of showing all of the details. So in this lecture I’ll describe the ideas all the way 
along but just give you the results of the computations. You should realize that 
you’re not supposed to understand everything immediately, but believe (more or 
less) that things would come out if you went through the steps. 

All that aside, this is a subject I want to talk about. It is recent and modern 
and would be a perfectly legitimate talk to give at a research seminar. My subject 
is the Schrodinger equation in a classical setting — the case of superconductivity. 

Ordinarily, the wave function which appears in the Schrodinger equation 
applies to only one or two particles. And the wave function itself is not some- 
thing that has a classical meaning — unlike the electric field, or the vector potential, 
or things of that kind. The wave function for a single particle is a “field”— in 
the sense that it is a function of position — but it does not generally have a classical 
significance. Nevertheless, there are some situations in which a quantum me- 
chanical wave function does have classical significance, and they are the ones I 
would like to take up. The peculiar quantum mechanical behavior of matter on 
a small scale doesn’t usually make itself felt on a large scale except in the standard 
way that it produces Newton’s laws — the laws of the so-called classical mechanics. 
But there are certain situations in which the peculiarities of quantum mechanics 
can come out in a special way on a large scale. 

At low temperatures, when the energy of a system has been reduced very, 
very low, instead of a large number of states being involved, only a very, very 
small number of states near the ground state are involved. Under those circum- 
stances the quantum mechanical character of that ground state can appear on a 
macroscopic scale. It is the purpose of this lecture to show a connection between 
quantum mechanics and large-scale effects — not the usual discussion of the way 
that quantum mechanics reproduces Newtonian mechanics on the average, but a 
special situation in which quantum mechanics will produce its own characteristic 
effects on a large or “macroscopic” scale. 

I will begin by reminding you of some of the properties of the Schrodinger 
equation.! I want to describe the behavior of a particle in a magnetic field using 
the Schrodinger equation, because the supercolnductive phenomena are involved 
with magnetic fields. An external magnetic field is described by a vector potential, 
and the problem is: what are the laws of quantum mechanics in a vector potential? 
The principle that describes the behavior of quantum mechanics in a vector 
potential is very simple. The amplitude that a particle goes from one place to 
another along a certain route when there’s a field present is the same as the ampli- 


t I’m not really reminding you, because I haven’t shown you some of these equations 
before; but remember the spirit of this seminar. 
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Fig. 21^—1. The amplitude to go from 
a to b along the path V is proportional ro 
exp (iq/fi) J^A - c/s. 


tude that it would go along the same route when there's no field, multiplied by the 
exponential of the line integral of the vector potential, times the electric charge 
divided by Planck's constant 1 (see Fig. 21-1): 

{b | a) m A = (b | a). 4 = 0 • exp ^ A • rfsj • (21.1) 

It is a basic statement of quantum mechanics. 

Now without the vector potential the Schrodmger equation of a charged 
particle (nonrelativistic, no spin) is 

~ 7 ft = i (7 v ) ‘ (7 v )* + q *+> (2L2) 

where <j> is the electric potential so that q<f> is the potential energy. f Equation (21.1) 
is equivalent to the statement that in a magnetic field the gradients in the Hamilton- 
ian are replaced in each case by the gradient minus qA, so that Eq. (21.2) becomes 

“ 7 ft = ** = fn (t v “ qA ) ‘ (? V _ ^ + qH ' (2L3) 

This is the Schrodinger equation for a particle with charge q moving in an elec- 
tromagnetic field A , <f> (nonrelativistic, no spin). 

To show that this is true I’d like to illustrate by a simple example in which 
instead of having a continuous situation we have a line of atoms along the x-axis 
with the spacing b and we have an amplitude —K for an electron to jump from 
one atom to another when there is no field. J Now according to Eq. (21.1) if 
there’s a vector potential in the x-direction A x (x, /), the amplitude to jump will 
be altered from what it was before by a factor exp ( iq/hA x b ), the exponent being 
iq/h times the vector potential integrated from one atom to the next. For simplicity 
we will write (qfti)A x == /(x), since A x will, in general, depend on x. If the ampli- 
tude to find the electron at the atom “/?” located at x is called C(x) = C„, then 
the rate of change of that amplitude is given by the following equation- 

- * ~ C(x) = E 0 C(x) - Ke- ,b/u+bl2) C(x + b ) 

_ Ke+ ,„nr-hi 2) C{X _ b y ( 21 . 4 ) 

There are three pieces. First, there’s some energy E {) if the electron is located 
at x. As usual, that gives the term E^C{x). Next, there is the term — KC{x + £), 
which is the amplitude for the electron to have jumped backwards one step from 
atom + 1,” located at x + b. However, in doing so in a vector potential, the 
phase of the amplitude must be shifted according to the rule in Eq. (21.1). If A x 
is not changing appreciably in one atomic spacing, the integral can be written as 
just the value of A x at the midpoint, times the spacing b. So (iq/h) times the integral 
is just bf(x + b/2 ). Since the electron is jumping backwards, I showed this 
phase shift with a minus sign. That gives the second piece. In the same manner 
there’s a certain amplitude to have jumped from the other side, but this time we 
need the vector potential at a distance (b/2) on the other side of x, times the dis- 
tance b. That gives the third piece. The sum gives the equation for the amplitude 
to be at x in a vector potential 

Now we know that if the function C(x) is smooth enough (we take the long 
wavelength limit), and if we let the atoms get closer together, Eq. (16 4) will 
approach the behavior of an electron in free space. So the next step is to expand 
both sides of (21.4) in powers of 6, assuming b is very small. For example, if b 
Is zero the right-hand side is just (E 0 — 2 K)C(x), so in the zeroth approximation 


1 Volume II, Section 15-5. 

f Not to be confused with our earlier use of for a state label 1 

% K is the same quantity that was called A in the problem of a linear lattice with no 
magnetic field See Chapter 13. 
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the energy is E 0 — IK . Next comes the terms in b . But because the two ex- 
ponentials have opposite signs, only even powers of b remain. So if you make a 
Taylor expansion of C(x), of /(x), and of the exponentials, and then collect the 
terms in b 2 , you get 

_ _ Eo C(x) - 2KC(x) 

l dt 

- Kb 2 {C"(x) - 2//(x)C'(x) - //'(x)C(x) - f 2 (x)C(x)) . (21.5) 

(The “primes” mean differentiation with respect to x.) 

Now this horrible combination of things looks quite complicated. But 
mathematically it’s exactly the same as 

- * ^ = (£o - 2K)C(x) - Kb 2 [A _ if(x ) 

The second bracket operating on C(x) gives C'(x) plus if(x)C(x). The first bracket 
operating on these two terms gives the C" term and terms in the first derivative 
of /(x) and the first derivative of C(x). Now remember that the solutions for zero 
magnetic field 2 represent a particle with an effective mass given by 


If you then set E 0 = — 2K , and put back f(x) = {q/h)A Xi you can easily check 
that Eq. (21.6) is the same as the first part of Eq. (21.3). (The origin of the potential 
energy term is well known, so I haven’t bothered to include it in this discussion.) 
The proposition of Eq. (21.1) that the vector potential changes all the amplitudes 
by the exponential factor is the same as the rule that the momentum operator, 
( h/i)V gets replaced by 


T* ~ im 


C(x). (21.6) 


7 v - qA, 

as you see in the Schrodinger equation of (21.3). 

21-2 The equation of continuity for probabilities 

Now I turn to a second point. An important part of the Schrodinger equation 
for a single particle is the idea that the probability to find the particle at a position 
is given by the absolute square of the wave function. It is also characteristic of 
the quantum mechanics that probability is conserved in a local sense. When the 
probability of finding the electron somewhere decreases, while the probability of 
the electron being elsewhere increases (keeping the total probability unchanged), 
something must be going on in between. In other words, the electron has a con- 
tinuity in the sense that if the probability decreases at one place and builds up 
at another place, there must be some kind of flow between. If you put a wall, for 
example, in the way, it will have an influence and the probabilities will not be the 
same. So the conservation of probability alone is not the complete statement of 
the conservation law, just as the conservation of energy alone is not as deep and 
important as the local conservation of energy. 3 If energy is disappearing, there 
must be a flow of energy to correspond. In the same way, we would like to find a 
“current” of probability such that if there is any change in the probability density 
(the probability of being found in a unit volume), it can be considered as coming 
from an inflow or an outflow due to some current. This current would be a vector 
which could be interpreted this way — the x component would be the net prob- 
ability per second and per unit area that a particle passes in the x direction across 
a plane parallel to the y-z plane. Passage toward +x is considered a positive 
flow, and passage in the opposite direction, a negative flow. 


2 Section 13-3. 

3 Volume II, Section 27-1. 
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Is there such a current? Well, you know that the probability density P(r, t) 
is given in terms of the wave function by 

P(M) = ^(r,r>Kr,0. (21.7) 

1 am asking: Is there a current J such that 

r) P 

Tt = -V-/? (21.8) 

If 1 take the time derivative of Eq. (21.7), I get two terms: 


dP 

dt 




d\j/ 

~dt 


+ f 


dt 


(21.9) 


Now use the Schrodinger equation — Eq. (21.3) — for d\p/dt ; and take the complex 
conjugate of it to get d\p*/dt — each / gets its sign reversed. You get 

Tt = ~ l h r i (t v _ qA ) ' (t v - qA )* + ***** 

— ^ Jm 0 V * (l V + qA^rfr* - e<f>\p\p*. 

The potential terms and a lot of other stuff cancel out. And it turns out that what 
is left can indeed be written as a perfect divergence. The whole equation is equiva- 
lent to 

T, - - T T “ I')* + 7 v - ( 2UI » 


It is really not as complicated as it seems. It is a symmetrical combination of 
\p* times a certain operation on \p, plus \p* times the complex conjugate operation 
on It is some quantity plus its own complex conjugate, so the whole thing is 
real — as it ought to be. The operation can be remembered this way: it is just the 
momentum operator & minus qA. I could write the current in Eq. (21.8) as 


J 


(9 


qA 


% + ** 


§> 



( 21 . 12 ) 


There is then a current / which completes Eq. (21.8). 

Equation (21.10) shows that the probability is conserved locally. If a particle 
disappears from one region it cannot appear in another without something going 
on in between. Imagine that the first region is surrounded by a closed surface far 
enough out that there is zero probability to find the electron at the surface The 
total probability to find the electron somewhere inside the surface is the volume 
integral of P. But according to Gauss’s theorem the volume integral of the di- 
vergence / is equal to ihe surface integral of /. If \p is zero at the surface, Eq. 
(21.10) says that / is zero, so the total probability to find the particle inside can’t 
change. Only if some of the probability approaches the boundary can some of it 
leak out. We can say that it only gets out by moving through the surface — and 
that is local conservation. 


21-3 Two kinds of momentum 


The equation for the current is rather interesting, and sometimes causes a 
certain amount of worry. You would think the current would be something like 
the density of particles times the velocity. The density should be something like 
which is o.k. And each term in Eq. (21.12) looks like the typical form for the 
average-value of the operator 


<P — qA ^ 


(21.13) 


21-4 


m 



so maybe we should think of it as the velocity of flow. It looks as though we have 
two suggestions for relations of velocity to momentum, because we would also 
think that momentum divided by mass, (P/m, should be a velocity. The two possi- 
bilities differ by the vector potential. 

It happens that these two possibilities were also discovered in classical physics, 
when it was found that momentum could be defined in two ways, 4 One of them 
is called “kinematic momentum,” but for absolute clarity I will in this lecture call 
it the “mu-momentum.” This is the momentum obtained by multiplying mass 
by velocity. The other is a more mathematical, more abstract momentum, some- 
times called the “dynamical momentum,” which I’ll call “p-momentum.” The 
two possibilities are 


mu-momentum = mv , 


(21.14) 


/7-momentum = mv + qA. 


(21.15) 


It turns out that in quantum mechanics with magnetic fields it is the p-momentum 
which is connected to the gradient operator $>, so it follows that (21.13) is the 
operator of a velocity. 

I’d like to make a brief digression to show you what this is all about — why 
there must be something like Eq. (21.15) in the quantum mechanics. The wave 
function changes with time according to the Schrodinger equation in Eq. (21.3). 
If I would suddenly change the vector potential, the wave function wouldn't 
change at the first instant; only its rate of change changes. Now think of what 
would happen in the following circumstance. Suppose I have a long solenoid, in 
which I can produce a flux of magnetic field (5-field), as shown in Fig. 21-2. And 
there is a charged particle sitting nearby. Suppose this flux nearly instantaneously 
builds up from zero to something. I start with zero vector potential and then 1 
turn on a vector potential. That means that I produce suddenly a circumferential 
vector potential A . You’ll remember that the line integral of A around a loop is 
the same as the flux of B through the loop. 5 Now what happens if I suddenly turn 
on a vector potential? According to the quantum mechanical equation the sudden 
change of A does not make a sudden change of \j / ; the wave function is still the 
same. So the gradient is also unchanged. 

But remember what happens electrically when I suddenly turn on a flux. 
During the short time that the flux is rising, there’s an electric field generated 
whose line integral is the rate of change of the flux with time: 

E = -§£■ (21. 16) 

That electric field is enormous if the flux is changing rapidly, and it gives a force 
on the particle. The force is the charge times the electric field, and so during the 
build up of the flux the particle obtains a total impulse (that is, a change in mv) 
equal to — qA, In other words, if you suddenly turn on a vector potential at a 
charge, this charge immediately picks up an “mu” momentum equal to —qA. 
But there is something that isn’t changed immediately and that’s the difference 
between mv and —qA. And so the sum p — mv + qA is something which is not 
changed when you make a sudden change in the vector potential. This quantity 
p is what we have called the p-momentum and is of importance in classical me- 
chanics in the theory of dynamics, but it also has a direct significance in quantum 
mechanics. It depends on the character of the wave function, and it is the one to 
be identified with the operator 


B 



Fig. 21-2. The electric field outside 
a solenoid with an increasing current. 



V. 


4 See, for example, J. D Jackson, Classical Electrodynamics , John Wiley and Sons, Inc. 
New York (1962), p. 408. 

3 Volume II, Chapter 14, Section 14-1. 
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21-4 The meaning of the wave function 


When Schrodinger first discovered his equation he discovered the conservation 
law of Eq. (21.9) as a consequence of his equation. But he imagined incorrectly 
that P was the electric charge density of the electron and that J was the electric 
current density, so he thought that the electrons interacted with the electromagnetic 
field through these charges and currents. When he solved his equations for the 
hydrogen atom and calculated he wasn't calculating the probability of anything 
— there were no amplitudes at that time — the interpretation was completely differ- 
ent. The atomic nucleus was stationary but there were currents moving around; 
the charges P and currents / would generate electromagnetic fields and the thing 
would radiate light. He soon found on doing a number of problems that it didn’t 
work out quite right. It was at this point that Born made an essential contribution 
to our ideas regarding quantum mechanics. It was Born who correctly (as far 
as we know) interpreted the ^ of the Schrodinger equation in terms of a probability 
amplitude — that very difficult idea that the square of the amplitude is not the 
charge density but is only the probability per unit volume of finding an electron 
there, and that when you do find the electron some place the entire charge is there. 
That whole idea is due to Born. 

The wave function \k(r) for an electron in an atom does not, then, describe 
a smeared-out electron with a smooth charge density. The electron is either here, 
or there, or somewhere else, but wherever it is, it is a point charge. On the other 
hand, think of a situation in which there are an enormous number of particles in 
exactly the same state, a very large number of them with' exactly the same wave 
function. Then what? One of them is here and one of them is there, and the 
probability of finding any one of them at a given place is proportional to 
But since there are so many particles, if I look in any volume dx dy dz I will 
generally find a number close to \p\p* dx dy dz. So in a situation in which ^ is the 
wave function for each of an enormous number of particles which are all in the 
same state, W* can be interpreted as the density of particles. If, under these 
circumstances, each particle carries the same charge q , we can, in fact, go further 
and interpret as the density of electricity. Normally, is given the dimen- 
sions of a probability density, then ^ should be multiplied by q to give the dimen- 
sions of a charge density. For our present purposes we can put this constant 
factor into and take itself as the electric charge density. With this under- 

standing, J (the current of probability I have calculated) becomes directly the 
electric current density. 

So in the situation in which we can have very many particles in exactly the 
same state, there is possible a new physical interpretation of the wave functions. 
The charge density and the electric current can be calculated directly from the 
wave functions and the wave functions take on a physical meaning which extends 
into classical, macroscopic situations. 

Something similar can happen with neutral particles. When we have the 
wave function of a single photon, it is the amplitude to find a photon somewhere. 
Although we haven’t ever written it down there is an equation for the photon wave 
function analogous to the Schrodinger equation for the electron. The photon 
equation is just the same as Maxwell’s equations for the electromagnetic field, 
and the wave function is the same as the vector potential A. The wave function 
turns out to be just the vector potential. The quantum physics is the same thing 
as the classical physics because photons are noninteracting Bose particles and 
many of them can be in the same state — as you know, they like to be in the same 
state. The moment that you have billions in the same state (that is, in the same 
electromagnetic wave), you can measure the wave function, which is the vector 
potential, directly. Of course, it worked historically the other way. The first ob- 
servations were on situations with many photons in the same state, and so we were 
able to discover the correct equation for a single photon by observing directly 
with our hands on a macroscopic level the nature of wave function. 

Now the trouble with the electron is that you cannot put more than one in 
the same state. Therefore, it was long believed that the wave function of the 
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Schrodinger equation would never have a macroscopic representation analogous 
to the macroscopic representation of the amplitude for photons. On the other 
hand, it is now realized that the phenomena of superconductivity presents us with 
just this situation. 


21-5 Superconductivity 

As you know, very many metals become superconducting below a certain 
temperature — the temperature is different for different metals. When you reduce 
the temperature sufficiently the metals conduct electricity without any resistance 
This phenomenon has been observed for a very large number of metals but not for 
all, and the theory of this phenomenon has caused a great deal of difficulty. It 
took a very long time to understand what was going on inside of superconductors, 
and I will only describe enough of it for our present purposes. It turns out that 
due to the interactions of the electrons with the vibrations of the atoms in the 
lattice, there is a small net effective attraction between the electrons. The result 
is that the electrons form together, if I may speak very qualitatively and crudely, 
bound pairs. 

Now you know that a single electron is a Fermi particle. But a bound pair 
would act as a Bose particle, because if I exchange both electrons in a pair I change 
the sign of the wave function twice, and that means that I don’t change anything. 
A pair is a Bose particle. 

The energy of pairing — that is, the net attraction — is very, very weak. Only 
a tiny temperature is needed to throw the electrons apart by thermal agitation, 
and convert them back to “normal” electrons. But when you make the tempera- 
ture sufficiently low that they have to do their very best to get into the absolutely 
lowest state; then they do collect in pairs. 

I don’t wish you to imagine that the pairs are really held together very closely 
like a point particle. As a matter of fact, one of the great difficulties of under- 
standing this phenomena originally was that that is not the way things are. The 
two electrons which form the pair are really spread over a considerable distance; 
and the mean distance between pairs is relatively smaller than the size of a single 
pair. Several pairs are occupying the same space at the same time. Both the reason 
why electrons in a metal form pairs and an estimate of the energy given up in 
forming a pair have been a triumph of recent times. This fundamental point in the 
theory of superconductivity was first explained in the theory of Bardeen, Cooper, 
and Schneffer, 7 but that is not the subject of this seminar. We will accept, however, 
the idea that the electrons do, in some manner or other, work in pairs, that we 
can think of these pairs as behaving more or less like particles, and that we can 
therefore talk about the wave function for a “pair.” 

Now the Schrodinger equation for the pair will be more or less like Eq. (21.3). 
There will be one difference in that the charge q will be twice the charge of an elec- 
tron. Also, we don’t know the inertia — or effective mass — for the pair in the crystal 
lattice, so we don’t know what number to put in for m. Nor should we think that 
if we go to very high frequencies (or short wavelengths), this is exactly the right 
form, because the kinetic energy that corresponds to very rapidly varying wave 
functions may be so great as to break up the pairs. At finite temperatures there 
are always a few pairs which are broken up according to the usual Boltzmann 
theory. The probability that a pair is broken is proportional to exp (—E mlT /kT). 
The electrons that are not bound in pairs are called “normal” electrons and will 
move around in the crystal in the ordinary way. I will, however, consider only 
the situation at essentially zero temperature — or, m any case, I will disregard the 
complications produced by those electrons which are not in pairs. 


First discovered by Onnes in 1911; H. K. Onnes, Comm. Phys. Lab , Umv. Leyden, 
Nos. 119, 120, 122 (1911). You will find a nice up-to-date discussion of the subject in 
E. A. Lynton, Superconductivity , John Wiley and Sons, Inc., New York, 1962. 

7 J. Bardeen, L. N. Cooper, and J. R. Schrieffer, Phys. Rev. 108, 1175 (1957). 
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Since electron pairs are bosons, when there are a lot of them in a given state 
there is an especially large amplitude for other pairs to go to the same state. So 
nearly all of the pairs will be locked down at the lowest energy in exactly the same 
state — it won’t be easy to get one of them into another state. There’s more ampli- 
tude to go into the same state than into an unoccupied state by the famous factor 
\/n, where n is the occupancy of the lowest state. So we would expect all the pairs 
to be moving in the same state. 

What then will our theory look like? I’ll call ^ the wave function of a pair 
in the lowest energy state. However, since is going to be proportional to the 
charge density p, I can just as well write ^ as the square root of the charge density 
times some phase factor: 

Kr) = p(r)e' e(r) , (21.17) 

where p and 6 are real functions of #*. (Any complex function can, of course, be 
written this way.) It’s clear what 1 we mean when we talk about the charge density, 
but what is the physical meaning of the phase 6 of the wave function? Well, let’s 
see what happens if we substitute \p(r) into Eq. (21.12), and express the current 
density in terms of these new variables p and 0. It’s just a change of variables and 
I won’t go through all the algebra, but it comes out 



Since both the current density and the charge density have a direct physical meaning 
for the superconducting electron gas, both p and 0 are real things. The phase is 
just as observable as p; it is a piece of the current density /. The absolute phase is 
not observable, but if the gradient of the phase is known everywhere, the phase is 
known except for a constant. You can define the phase at one point, and then the 
phase everywhere is determined. 

Incidentally, the equation for the current can be analyzed a little nicer, when 
you think that the current density / is in fact the charge density times the velocity 
of motion of the fluid of electrons, or pv. Equation (21,18) is then equivalent to 

mv = h V6 — qA. (21.19) 

Notice that there are two pieces in the mf-momentum; one is a contribution from 
the vector potential, and the other, a contribution from the behavior of the 
wave function. In other words, the quantity h Y0 is just what we have called the 
p-momentum. 


21-6 The Meissner effect 

Now we can describe some of the phenomena of superconductivity. First, 
there is no electrical resistance. There’s no resistance because all the electrons are 
collectively in the same state. In the ordinary flow of current you knock one 
electron or the other out of the regular flow, gradually deteriorating the general 
momentum. But here to get one electron away from what all the others are doing 
is very hard because of the tendency of all Bose particles to go in the same state. 
A current once started, just keeps on going forever. 

It’s also easy to understand that if you have a piece of metal in the super- 
conducting state and turn on a magnetic field which isn’t too strong (we won’t 
go into the details of how strong), the magnetic field can’t penetrate the metal. If, as 
you build up the magnetic field, any of it were to build up inside the metal, there 
would be a rate of change of flux which would produce an electric field, and an 
electric field would immediately generate a current which, by Lenz’s law, would 
oppose the flux. Since all the electrons will move together, an infinitesimal electric 
field will generate enough current to oppose completely any applied magnetic field. 
So if you turn the field on after you’ve cooled a metal to the superconducting state, 
it will be excluded. 
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Even more interesting is a related phenomenon discovered experimentally 
by Meissner. 8 If you have a piece of the metal at a high temperature (so that it is a 
normal conductor) and establish a magnetic field through it, and then you lower 
the temperature below the critical temperature (where the metal becomes a super- 
conductor), the field is expelled. In other words, it starts up its own current — and 
in just the right amount to push the field out. 

We can see the reason for that in the equations, and I’d like to explain how. 
Suppose that we take a piece of superconducting material which is in one lump. 
Then in a steady situation of any kind the divergence of the current must be zero 
because there’s no place for it to go. It is convenient to choose to make the 
divergence of A equal to zero. (I should explain why choosing this convention 
doesn’t mean any loss of generality, but I don’t want to take the time.) Taking 
the divergence of Eq. (21.18), then gives that the Lapiacian of 0 is equal to zero. 
One moment. What about the variation of p? I forgot to mention an important 
point. There is a background of positive charge in this metal due to the atomic 
ions of the lattice. If the charge density p is uniform there is no net charge and no 
electric field. If there would be any accumulation of electrons in one region the 
charge wouldn’t be neutralized and there would be a terrific repulsion pushing the 
electrons apart.f So in ordinary circumstances the charge density of the electrons 
in the superconductor is almost perfectly uniform — I can take p as a constant. 
Now the only way that V 2 0 can be zero everywhere inside the lump of metal is 
for Ho be a constant. And that means that there is no contribution to J from 
p-momentum. Equation (21,18) then says that the current is proportional to p 
times A. So everywhere in a lump of superconducting material the current is 
necessarily proportional to the vector potential: 

J = — p — A. (21.20) 

m 

Since p and q have the same (negative) sign, and since p is a constant, I can set 
pq/m = —(some constant); then 

/ = —(some constant)/*. (21.21) 

This equation was originally proposed by London and London 9 to explain the 
experimental observations of superconductivity — long before the quantum me- 
chanical origin of the effect was understood. 

Now we can use Eq. (21.20) in the equations of electromagnetism to solve 
for the fields. The vector potential is related to the current density by 


v 2 a = — K, j. 
e 0 c 2 

(21.22) 

If I use Eq. (21.21) for /, I have 


V 2 A = A 2 /*, 

(21.23) 

where X 2 is just a new constant; 


X 2 = p 9 ,• 

e 0 mc 2 

(21.24) 


We can now try to solve this equation for A and see what happens in detail. 
For example, in one dimension Eq. (21.23) has exponential solutions of the form 
e“ Xx and e +Xx . These solutions mean that the vector potential must decrease 
exponentially as you go from the surface into the material. (It can’t increase 


8 W. Meissner and R. Ochsenfeld, Naturwiss. 21, 787 (1933). 

9 H. London and F. London, Proc. Roy. Soc (London) A149, 71 (1935); Physica 2, 
341 (1935). 

t Actually if the electric field were too strong, pairs would be broken up and the 
“normal” electrons created would move in to help neutralize any excess of positive charge. 
Still, it takes energy to make these normal electrons, so the main point is that a nearly 
uniform density p is highly favored energetically. 
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because there would be a blow up.) If the piece of metal is very large compared 
to 1/X, the field only penetrates to a thin layer at the surface — a layer about 1/X 
in thickness. The entire remainder of the interior is free of field, as sketched in 
Fig. 21-3. This is the explanation of the Meissner effect. 

How big is the distance X? Well, remember that r 0 , the “electromagnetic 
radius” of the electron (2.8 X 10~ 18 cm), is given by 


2 

me 




2 

c 


4*re 0 ro 


Also, remember that q in Eq. (21.24) is twice the charge on an electron, so 

q = 87rr () ^ 
e 0 mc 2 q e 

Writing p as q e N, where N is the number of electrons per cubic centimeter, we have 

X 2 = 8xjVr 0 . (21.25) 


For a metal such as lead there are about 3 X 10 22 atoms per cm 3 , so if each one 
contributed only one conduction electron, 1/X would be about 2 X 10~ 5 cm. 
That gives you the order of magnitude 



Fig 21-3 (a) A superconducting cyl- 

inder is a magnetic field; (b) the magnetic 
field 6 as a function of r. 


21-7 Flux quantization 

The London equation (21.21) was proposed to account for the observed 
facts of superconductivity including the Meissner effect. In recent times, however, 
there have been some even more dramatic predictions. One prediction made by 
London was so peculiar that nobody paid much attention to it until recently. 
I will now discuss it. This time instead of taking a single lump, suppose we take 
a ring whose thickness is large compared to 1/X, and try to see what would happen 
if we started with a magnetic field through the ring, then cooled it to the super- 
conducting state, and afterward removed the original source of B. The sequence of 
events is sketched in Fig 21-4. In the normal state there will be a field in the body 
of the ring as sketched in part (a) of the figure. When the ring is made super- 
conducting, the field is forced outside of the material (as we have just seen). 
There will then be some flux through the hole of the ring as sketched in part (b). 
If the external field is now removed, the lines of field going through the hole are 
“trapped” as shown in part (c). The flux <£ through the center can’t decrease 
because d$/dt must be equal to the line integral of E around the ring, which is 
zero in a superconductor. As the external field is removed a super current starts 
flowing around the ring to keep the flux through the ring a constant. (It’s the 
old eddy-current idea, only with zero resistance.) These currents will, however, 
all flow near the surface (down to a depth 1/X), as can be shown by the same kind 
of analysis that I made for the solid block. These currents can keep the magnetic 
field out of the body of the ring, and produce the permanently trapped magnetic 
field as well. 

Now, however, there is an essential difference, and our equations predict a 
surprising effect. The argument I made above that 6 must be a constant in a solid 
block does not apply for a ring, as you can see from the following arguments. 

Well inside the body of the ring the current density J is zero; so Eq. (21.18) 
gives 

hve = qA. (21.26) 


Now consider what we get if we take the line integral of A around a curve T, 
which goes around the ring near the center of its cross-section so that it never 
gets near the surface, as drawn in Fig. 21-5. From Eq, (21.26), 


h 


V0- ds 


q f A * ds 


(21.27) 
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Now you know that the line integral of A around any loop is equal to the flux 
of B through the loop 


ds — <$>. 


Equation (21.27) the becomes 


/ 


V0 - ds = f <£. 
n 


(a) 


(21.28) 


The line integral of a gradient from one point to another (say from point 1 to point 
2) is the difference of the values of the function at the two points. Namely, 


t 


Vd-ds = do - 0i. 


If we let the two end points 1 and 2 come together to make a closed loop you might 
at first think that 0 2 would equal 0 X , so that the integral in Eq. (21.28) would be 
zero. That would be true for a closed loop in a simply-connected piece of super- 
conductor, but it is not necessarily true for a ring-shaped piece. The only physical 
requirement we can make is that there can be only one value of the wave function 
for each point . Whatever 0 does as you go around the ring, when you get back to 
the starting point the 0 you get must give the same value for the wave function 


4* = Vpe td . 

This will happen if 0 changes by 2m, where n is any integer. So if we make one 
complete turn around the ring the left-hand side of Eq. (21.27) must be h * 2irn. 
Using Eq. (21.28), I get that 


27 rnh — q$. 


(21.29) 



The trapped flux must always be an integer times 2i rh/q! If you would think of the 
ring as a classical object with an ideally perfect (that is, infinite) conductivity, 
you would think that whatever flux was initially found through it would just stay 
there — any amount of flux at all could be trapped. But the quantum-mechanical 
theory of superconductivity says that the flux can be zero, or 27 rh/q, or Anrh/q, 
or 6wh/q t and so on, but no value in between. It must be a multiple of a basic 
quantum mechanical unit. 

London 10 predicted that the flux trapped by a superconducting ring would 
be quantized and said that the possible values of the flux would be given by Eq. 
(21.29) with q equal to the electronic charge. According to London the basic 
unit of flux should be 2Trh/q e , which is about 4 X 10” 7 gauss = cm 2 . To visual- 
ize such a flux, think of a tiny cylinder a tenth of a millimeter in diameter; the 
magnetic field inside it when it contains this amount of flux is about one percent 
of the earth’s magnetic field. It should be possible to observe such a flux by a 
sensitive magnetic measurement. 

In 1961 such a quantized flux was looked for and found by Deaver and 
Fairbank 11 at Stanford University and at about the same time by Doll and 
Nabauer 12 in Germany. 

In the experiment of Deaver and Fairbank, a tiny cylinder of superconductor 
was made by electroplating a thin layer of tin on a one-centimeter length of No. 
56 (1.3 X 10~ 3 cm diameter) copper wire. The tin becomes superconducting 
below 3.8°K, while the copper remains a normal metal. The wire was put in a 
small controlled magnetic field, and the temperature reduced until the tin became 
superconducting. Then the exeternal source of field was removed. You would 


10 F. London, Superfluids; John Wiley and Sons, Inc., New York, 1950, Vol. I, p. 152 

11 B. S. Deaver, Jr., and W. M. Fairbank, Phys. Rev. Letters 7, 43 (1961), 

12 R. Doll and M. Nabauer, Phys. Rev. Letters 7, 51 (1961). 



Fig. 21-4. A ring in a magnetic 
field: (a) in the normal state; (b) in the 
superconducting state; (c) after the ex- 
ternal field is removed. 



Fig. 21-5. The curve V inside a 
superconducting ring 
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expect this to generate a current by Lenz’s law so that the flux inside would not 
change. The little cylinder should now have magnetic moment proportional to the 
flux inside. The magnetic moment was measured by jiggling the wire up and down 
(like the needle on a sewing machine, but at the rate of 100 cycles per second) 
inside a pair of little coils at the ends of the tin cylinder. The induced voltage in 
the coils was then a measure of the magnetic moment. 

When the experiment was done by Deaver and Fairbank, they found that the 
flux was quantized, but that the basic unit was only one-half as large as London 
had predicted. Doll and Nabauer got the same result. At first this was quite mys- 
terious^ but we now understand why it should be so. According to the Bardeen, 
Cooper, and Schrieffer theory of superconductivity, the q which appears in Eq. 
(21.29) is the charge of a pair of electrons and so is equal to 2 q t . The basic flux 
unit is 


$ 0 = ~ 2 X 10 7 gauss-cm (21.30) 

(/e 

or one-half the amount predicted by London. Everything, now fits together, and 
the measurements show the existence of the predicted purely quantum-mechanical 
effect on a large scale. 


21-8 The dynamics of superconductivity 

The Meissner effect and the flux quantization are two confirmations of our 
general ideas. Just for the sake of completeness I would like to show you what 
the complete equations of a superconducting fluid would be from this point of 
view — it is rather interesting. Up to this point I have only put the expression for 
\p into equations for charge density and current. If I put it into the complete 
Schrodinger equation I get equations for p and 0. It should be interesting to see 
what develops, because here we have a “fluid” of electron pairs with a charge 
density p and a mysterious 0 — we can try to see what kind of equations we get for 
such a “fluid”! So we substitute the wave function of Eq. (21.17) into the Schro- 
dinger equation (21.3) and remember that p and 0 are real functions of x, and 
z. If we separate real and imaginary parts we obtain then two equations. To 
write them in a shorter form I will — following Eq. (21.19) — write 

-V6 - S- A = v. (21.31) 

m m 

One of the equations 1 get is then 

f t = V-pv. (21.32) 

Since pv is first 7, this is just the continuity equation once more. The other equation 
I obtain tells how 0 varies; it is 

-£{-L,> (v ^)}. (21.33) 

Those who are thoroughly familiar with hydrodynamics (of which I’m sure few 
of you are) will recognize this as the equation of motion for an electrically charged 
fluid if we identify he as the “velocity potential” — except that the last term, which 
should be the energy of compression of the fluid, has a rather strange dependence 
on the density p. In any case, the equation says that rate of change of the quantity 
he is given by a kinetic energy term, \mv 2 , plus a potential energy term, q<f>, with 
an additional term, containing the factor h 2 , which we could call a “quantum 
mechanical energy.” We have seen that inside a superconductor p is kept very 


t It has once been suggested by Onsager that this might happen (see F. London, Ref. 
10), although no one else ever understood why. 
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uniform by the electrostatic forces, so this term can almost certainly be neglected 
in every practical application provided we have only one superconducting region. 
If we have a boundary between two superconductors (or other circumstances in 
which the value of p may change rapidly) this term can become important. 

For those who are not so familiar with the equations of hydrodynamics, 
I can rewrite Eq. (21.33) in a form that makes the physics more apparent by using 
Eq. (21.31) to express 6 in terms of v. Taking the gradient of the whole of Eq. 
(21.33) and expressing V0 in terms of A and v by using (21.31), I get 

57 - i(~ v * - *4) - v X < T x »> - (» X v),- 

(21.34) 

What does this equation mean? First, remember that 

r)A 

~ = E (21.35) 

Next, notice that if I take the curl of Eq. (21.19), I get 

VXt> = - — V X A, (21.36) 

m v ' 


since the curl of a gradient is always zero. But V X A is the magnetic field B , 
so the first two terms can be written as 

— (E + v X B). 
m 


Finally, you should understand that dv/dt stands for the rate of change of the 
velocity of the fluid at a point. If you concentrate on a particular particle, its 
acceleration is the total derivative of v (or, as it is sometimes called in fluid dy- 
namics, the “comoving acceleration”), which is related to dv/dt by 13 


dv 

It 


eomoving 


dv , 

a7+ (v ' v)v - 


(21.37) 


This extra term also appears as the third term on the right side of Eq. (21.25). 
Taking it to the left side, I can write Eq. (21.25) in the following way: 


m 


dv 

~di 


-*£+.X10— *£(- LvVp) 


(21.38) 


We also have from Eq. (21.36) that 

V X V = - 2 - B. 

m 


(21.39) 


These two equations are the equations of motion of the superconducting 
electron fluid. The first equation is just Newton’s law for a charged fluid in an 
electromagnetic field. It says that the acceleration of each particle of the fluid 
whose charge is q comes from the ordinary Lorentz force q(E + v X B) plus an 
additional force, which is the gradient of some mystical quantum mechanical 
potential — a force which is not very big except at the junction between two super- 
conductors. The second equation says that the fluid is “ideal” — the curl of v has 
zero divergence (the divergence of B is always zero). That means that the velocity 
can be expressed in terms of velocity potential. Ordinarily one writes that V X 
v = 0 for an ideal fluid, but for an ideal charged fluid m a magnetic field , this gets 
modified to Eq. (21.40). ' 

So, Schrodinger’s equation for the electron pairs in a superconductor gives 
us the equations of motion of an electrically charged ideal fluid. Superconductivity 
is the same as the problem of the hydrodynamics of a charged liquid. If you want 


13 See Volume II, Section 40-2. 
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Fig. 21-6. Two superconductors sep 
orated by a thin insulator. 


to solve any problem about superconductors you take these equations for the 
fluid [or the equivalent pair, Eqs. (21.32) and (21.33)], and combine them with 
Maxwell’s equations to get the fields. (The charges and currents you use to get 
the fields must, of course, include the ones from the superconductor as well as 
from the external sources.) 

Incidentally, I believe that Eq. (21.38) is not quite correct, but ought to have 
an additional term involving the density. This new term does not depend on 
quantum mechanics, but comes from the ordinary energy associated with varia- 
tions of density. Just as in an ordinary fluid there should be a potential energy 
density proportional to the square of the deviation of p from p 0 , the undisturbed 
density (which is, here, also equal to the charge density of the crystal lattice). 
Since there will be forces proportional to the gradient of this energy, there should 
be another term in Eq. (21.38) of the form: (const) V(p — p 0 ) 2 . This term did 
not appear from the analysis because it comes from the interactions between parti- 
cles, which I neglected in using an independent-particle approximation. It is, 
however, just the force I referred to when I made the qualitative statement that 
electrostatic forces would tend to keep p nearly constant inside a superconductor. 


21-9 The Josephson junction 

I would like to discuss next a very interesting situation that was noticed by 
Josephson 14 while analyzing what might happen at a junction between two super- 
conductors. Suppose we have two superconductors which are connected by a 
thin layer of insulating material as in Fig. 21-6. Such an arrangement is now 
called a “Josephson junction.” If the insulating layer is thick, the electrons can’t 
get through; but if the layer is thin enough, there can be an appreciable quantum 
mechanical amplitude for electrons to jump across. This is just another example 
of the quantum-mechanical penetration of a barrier. Josephson analyzed this 
situation and discovered that a number of strange phenomenon should occur. 

In order to analyze such a junction I’ll call the amplitude to find an electron 
on one side, ^ x , and the amplitude to find it on the other, In the superconduct- 
ing state the wave function, is the common wave function of all the electrons 
on one side, and \p 2 is the corresponding function on the other side. I could do 
this problem for different kinds of superconductors, but let us take a very simple 
situation in which the material is the same on both sides so that the junction is 
symmetrical and simple. Also, for a moment let there be no magnetic field. Then 
the two amplitudes should be related in the following way : 

ih ^ = Urfi + Ki 2 , 

ih = U $ 2 + K\pi. 


The constant AT is a characteristic of the junction. If K were zero, these two 
equations would just describe the lowest energy state — with energy U — of each 
superconductor. But there is coupling between the two sides by the amplitude K 
that there may be leakage from one side to the other. (It is just the “flip-flop” 
amplitude of a two-state system.) If the two sides are identical, U i would equal 
U 2 and I could just subtract them off. But now suppose that we connect the two 
superconducting regions to the two terminals of a battery so that there is a po- 
tential difference V across the junction. Then l/ x — U 2 = qV, I can, for con- 
venience, define the zero of energy to be halfway between, then the two equations 
are 


= ¥*• + **„ 


(21.40) 


14 B. D. Josephson, Physics Leiters 1, 251 (1962). 
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These are the standard equations for two quantum mechanical states coupled 
together. This time, let’s analyze these equations in another way. Let’s make the 
substitutions 


*1 = Vple’*, 

$2 = Vfhe 102 , 


(21.41) 


where 6 X and d 2 are the phases on the two sides of the junction and p x and p 2 
are the density of electrons at those two points. Remember that in actual practice 
pi and p 2 are almost exactly the same and are equal to p 0 , the normal density of 
electrons in the superconducting material. Now if you substitute these equations 
for and \ p 2 into (21 .40), you get four equations by equating the real and imaginary 
parts in each case. Letting ( $ 2 — 6 x ) — 5, for short, the result is 


Pi = 

h = 

*i - 


v 2 = 


+ ^ KVp 2 pi sin 8, 


- ft sin 8 , 




+ 


K K 

* VP2 


C0S 5 + ~Th ' 


(21.42) 


(21.43) 


The first two equations say that pi = — p 2 . “Bui,” you say, “they must 
both be zero if pi and p 2 are both constant and equal to zero.” Not quite. These 
equations are not the whole story. They say what pi and p 2 would be if there 
were no extra electric forces due to an unbalance between the electron fluid and 
the background of positive ions. They tell how the densities would start to change, 
and therefore describe the kind of current that would begin to flow. This current 
from side 1 to side 2 would be just pi(or — p 2 ), or 


J 


2 K 

h 


VP 1 P 2 sin 


8 . 


(21.44) 


Such a current would soon charge up side 2, except that we have forgotten that 
the two sides are connected by wires to the battery. The current that flows will 
not charge up region 2 (or discharge region 1) because currents will flow to keep 
the potential constant. These currents from the battery have not been included 
in our equations. When they are included, p x and p 2 do not in fact change, but 
the current across the junction is still given by Eq. (21.44). 

Since pi and p 2 do remain constant and equal to p 0) let’s set 2Kp 0 /h = J 0 , 
and write 

J — J 0 sin 5. (21.45) 

7 0 , like K , is then a number which is a characteristic of the particular junction. 

The other pair of equations (21.43) tells us about 6 X and 0 2 . We are interested 
in the difference 8 = 0 2 — 61 to use Eq. (21.45); what we get is 

J = * a _ii = iT (21.46) 

That means that we can write 

S(t) = S tl + % J V(t) dt, (21.47) 


where 5 0 is the value of 5 at / = 0. Remember also that q is the charge of a pair, 
namely, q = 2q e . In Eqs. (21.45) and (21.47) we have an important result, the 
general theory of the Josephson junction. 
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Now what are the consequences? First, put on a dc voltage. Jf you put on a 
dc voltage, F 0j the argument of the sine becomes (5 0 + (#/^)L 0 /) Since h is a 
small number (compared to ordinary voltage and times), the sine oscillates rather 
rapidly and the net current is nothing. (In practice, since the temperature is not 
zero, you would get a small current due to the conduction by “normal” electrons.) 
On the other hand if you have zero voltage across the junction, you can get a cur- 
rent! With no voltage the current can be any amount between +J 0 and — / 0 
(depending on the value of 5 0 ). But try to put a voltage across it and the current 
goes to zero. This strange behavior has recently been observed experimentally. 15 

There is another way of getting a current — by applying a voltage at a very 
high frequency in addition to a dc voltage. Let 

V ~ V 0 + v cos c of, 

where v « V. Then b(t) is 

$o T~ f J'V “Tv -- sin o)t. 
n n a; 

Now for Ax small, 

sin (x + Ax) ~ sin x + Ax cos x. 

Using this approximation for sin 5, I get 

J = Jo [^sin (« 0 + | Vot'j + | ~ sin M cos (X + | V () t^ j • 

The first term is zero on the average, but the second term is not if 



There should be a current if the ac voltage has just this frequency. Shapiro 15 
claims to have observed such a resonance effect. 

If you look up papers on the subject you will find that they often write the 
formula for the current as 


J = sin (T, + ^ / A- ds)’ (21.48) 

where the integral is to be taken across the junction. The reason for this is that 
when there’s a vector potential across the junction the flip-flop amplitude is 
modified in phase in the way that we explained earlier If you chase that extra 
phase through, it comes out as given above. 

Finally, I would like to describe a very dramatic and interesting experiment 
which has recently been made on the interference of the currents from each of 
two junctions. In quantum mechanics we’re used to the interference between 
amplitudes from two different slits. Now we’re going to do the interference be- 
tween two junctions caused by the difference in the phase of the arrival of the 
currents through two different paths. In Fig. 21-7, 1 show two different junctions, 
“a” and “b”, connected in parallel. The ends, P and g, are connected to our elec- 
trical intruments which measure any current flow. The external current, ./total, 
will be the sum of the currents through the two junctions. Let J A and J b be the 
currents through the two junctions, and let their phases be 5 a and <5 b . Now the 
phase difference of the wave functions between P and g must be the same whether 
you go on one route or the other. Along the route through junction “a”, the phase 
difference between P and g is 5 a plus the line integral of the vector potential along 
the upper route: 

APhasej^Q = 5,. + ^ / A-ds. (21.49) 

™ J upper 

15 P. W. Anderson and J. M. Rowell, Phys Rev Letters 10, 230 (1963). 

16 S. Shapiro, Phys. Rev. Letters 11, 80 (1963). 
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Fig. 21-7. 
in parallel. 


Two Josephson junctions 


Why 9 Because the phase 0 is related to A by Eq. (21.26). If you integrate that 
equation along some path, the left-hand side gives the phase change, which is then 
just proportional to the line integral of A , as we have written here. The phase 
change along the lower route can be written similarly 

APhasep^g = 5b H~ I A * ds . (21.50) 

ft J lower 

These two must be equal; and if 1 subtract them I get that the difference of the 
deltas must be the line integral of A around the circuit: 

«h - 5a = ^ f A- ds. 

Here the integral is around the closed loop T of Fig. 21-7 which circles through 
both junctions. The integral over A is the magnetic flux 3> through the loop. So 
the two 6’s are going to differ by 2 q e /h times the magnetic flux <t> which passes 
between the two branches of the circuit: 


5b — 5 a 



(21.51) 


1 dan control this phase difference by changing the magnetic field on the circuit, 
so I can adjust the differences in phases and see whether or not the total current 
that flows through the two junctions shows any interference of the two parts. 
The total current will be the sum of J a and Jb. For convenience, I will write 

«o = So + #, S h = So - f *. 

Then, 

./total = do jsm (s„ + ^ <i>) + sin (s 0 - ~ 3>)J 

= J 0 sin So cos ^ • (21.52) 


Now we don’t know anything about S 0 , and nature can adjust that anyway 
she wants depending on the circumstances. In particular, it will depend on the 
external voltage we apply to the junction. No matter what we do, however, sin5 () 
can never get bigger than 1. So the maximum current for any given $ is given by 


J \ 1 1 ax J 0 


COS 


h 


This maximum current will vary with and will itself have maxima whenever 


T = n — ’ 

with n some integer. That is to say that the current takes on its maximum values 
where the flux linkage has just those quantized values we found in Eq. (21.30)' 
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Fig. 21-8. A recording of the current 
through a pair of Josephson junctions as a 
function of the magnetic field in the region 
between the two junctions (see Fig, 21-7). 
[This recording was provided by R. C. 
Jaklevic, J. Lambe, A. H. Silver, and J. E. 
Mercereau of the Scientific Laboratory, 
Ford Motor Company ] 
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The Josephson current through a double junction was recently measured 1 ' 
as a function of the magnetic field in the area between the junctions. The results 
are shown in Fig. 21-8. There is a general background of current from various 
effects we have neglected, but the rapid oscillations of the current with changes in 
the magnetic field are due to the interference term cos q e $/h of Eq, (21.52). 

One of the intriguing questions about quantum mechanics is the question of 
whether the vector potential exists in a place where there's no field. 1 8 This experi- 
ment I have just described has also been done with a tiny solenoid between the 
two junctions so that the only significant magnetic B field is inside the solenoid 
and a negligible amount is on the superconducting wires themselves. Yet it is 
reported that the amount of current depends oscillatorily on the flux of magnetic 
field inside that solenoid even though that field never touches the wires — another 
demonstration of the “physical reality” of the vector potential. 19 

1 don’t know what will come next. But look what can be done. First, notice 
that the interference between two junctions can be used to make a sensitive mag- 
netometer. If a pair of junctions is made with an enclosed area of, say, 1 mm 2 , 
the maxima in the curve of Fig. 21-8 would be separated by 2 X 10 _0 gauss. It 
is certainly possible to tell when you are 1/10 of the way between two peaks; so 
it should be possible to use such a junction to measure magnetic fields as small as 
2 X 10~ 7 gauss — or to measure larger fields to such a precision. One should be 
able to go even farther. Suppose for example we put a set of 10 or 20 junctions 
close together and equally spaced. Then we can have the interference between 
10 or 20 slits and as we change the magnetic field we will get very sharp maxima 
and minima. Instead of a 2-slit interference we can have a 20- or perhaps even a 
100-slit interferometer for measuring the magnetic field. Perhaps we can predict 
that the measurement of magnetic fields will — by using the effects of quantum- 
mechanical interference — eventually become almost as precise as the measurement 
of wavelength of light. 

These then are some illustrations of things that are happening in modem 
times — the transistor, the laser, and now these junctions, whose ultimate practical 
applications are still not known. The quantum mechanics which was discovered 
in 1926 has had nearly 40 years of development, and rather suddenly it has begun 
to be exploited in many practical and real ways. We are really getting control of 
nature on a very delicate and beautiful level. 

I am sorry to say, gentlemen, that to participate in this adventure it is ab- 
solutely imperative that you learn quantum mechanics as soon as possible. It was 
our hope that in this course we would find a way to make comprehensible to you 
at the earliest possible moment the mysteries of this part of physics 


17 Jaklevic, Lambe, Silver, and Mercereau, Phys. Rev Letters 12, 159 (1964). 

18 Jaklevic, Lambe, Silva, and Mercereau, Phys Rev. Letters 12, 274 (1964). 

19 See Volume II, Chapter 15, Section 15-5. 
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Feynman 9 8 Epilogue 


Well, I’ve been talking to you for two years and now I’m going to quit. In 
some ways I would like to apologize, and other ways not. I hope — in fact, I know — 
that two or three dozen of you have been able to follow everything with great 
excitement, and have had a good time with it. But I also know that “the powers of 
instruction are of very little efficacy except in those happy circumstances in which 
they are practically superfluous.” So, for the two or three dozen who have under- 
stood everything, may I say I have done nothing but shown you the things. For 
the others, if I have made you hate the subject. I’m sorry. I never taught elementary 
physics before, and I apologize. I just hope that I haven’t caused a serious trouble 
to you, and that you do not leave this exciting business. I hope that someone else 
can teach it to you in a way that doesn’t give you indigestion, and that you will 
find someday that, after all, it isn’t as horrible as it looks. 

Finally, may I add that the main purpose of my teaching has not been to 
prepare you for some examination — it was not even to prepare you to serve in- 
dustry or the military. I wanted most to give you some appreciation of the wonder- 
ful world and the physicist’s way of looking at it, which, I believe, is a major part 
of the true culture of modern times. (There are probably professors of other sub- 
jects who would object, but I believe that they are completely wrong.) 

Perhaps you will not only have some appreciation of this culture; it is even 
possible that you may want to join in the greatest adventure that the human mind 
has ever begun. 
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